Voice analysis or voice comparison

Voice analysis (voice comparison report)

Human speech consists of sound waves, i.e. density (longitudinal) waves that propagate through a medium such as air. These sound waves vary in amplitude and frequency: the amplitude determines the volume and the frequency the pitch of the voice. These values and their changes over time form combinations of parameters that are specific to each individual. Among other things, they reveal properties of the vocal cords and of the entire vocal tract that are characteristic of each voice. From a mathematical point of view, speech arises from the convolution of two signals: the excitation produced by the vocal cords and the response of the vocal tract. Psychoacoustic theory deals with how speech is perceived and with the functional model of human hearing. Knowledge gained from psychoacoustic theory and from real structure research is taken into account when calculating voice characteristics. The advantage is that these calculations closely follow the actual functional model of human hearing and can differentiate voices far more accurately than even an unimpaired human ear. Unlike the phonetic method, such calculations are not influenced by subjective impressions or human error (see examples).
The procedure for evaluating voices is as follows:
The sound waves (speech signals) are digitized and displayed as a computer graphic or structure. This representation is on a time scale, which makes it possible to observe the shape of the wave train (pitch and duration). Conclusions about the frequency range, or about how often specific frequencies occur, can only be drawn after transforming the signal from the time domain into the frequency domain. Mathematically, this is done on a computer with the discrete Fourier transform (DFT). Further mathematical operations extract several voice parameters from the frequency spectrum, for example correlation vectors and the division of the spectrum into critical frequency bands. Finally, a voice print that is characteristic of the speaker's voice is calculated from the extracted parameters.
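The steps above, together with the convolution model of speech production mentioned earlier, can be illustrated with a short sketch. It is not the method of any particular voice comparison software, and everything in it is an assumption chosen for illustration: the 8 kHz sampling rate, the 125 Hz pulse train standing in for the vocal-cord excitation, the decaying 700 Hz oscillation standing in for the vocal-tract impulse response, and the hypothetical function `critical_band_profile`, which groups the spectral power into 1-Bark critical bands using the Zwicker and Terhardt approximation of the Bark scale. A direct DFT is used for readability; a real system would use a fast Fourier transform (FFT). The correlation vectors mentioned above are not modeled.

```python
import cmath
import math


def convolve(excitation, impulse_response):
    """Discrete linear convolution s[n] = sum_k e[k] * h[n - k]."""
    speech = [0.0] * (len(excitation) + len(impulse_response) - 1)
    for k, e in enumerate(excitation):
        for j, h in enumerate(impulse_response):
            speech[k + j] += e * h
    return speech


def dft_magnitudes(signal):
    """|X[k]| for k = 0..N/2, computed with a direct O(N^2) DFT."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(signal)))
        for k in range(n // 2 + 1)
    ]


def hz_to_bark(freq):
    """Zwicker and Terhardt approximation of the critical-band rate in Bark."""
    return 13.0 * math.atan(0.00076 * freq) + 3.5 * math.atan((freq / 7500.0) ** 2)


def critical_band_profile(signal, sample_rate, n_bands=24):
    """Hypothetical voice print: spectral power per 1-Bark band, normalized to sum to 1."""
    n = len(signal)
    bands = [0.0] * n_bands
    for k, mag in enumerate(dft_magnitudes(signal)):
        # Map DFT bin k to its frequency in Hz, then to a critical band index.
        band = min(int(hz_to_bark(k * sample_rate / n)), n_bands - 1)
        bands[band] += mag * mag
    total = sum(bands) or 1.0
    return [b / total for b in bands]


fs = 8000  # assumed sampling rate in Hz

# Toy "speech": a 125 Hz pulse train (hypothetical vocal-cord excitation)
# convolved with a made-up vocal-tract impulse response.
excitation = [1.0 if i % 64 == 0 else 0.0 for i in range(256)]
vocal_tract = [math.exp(-i / 8.0) * math.cos(2 * math.pi * 700 * i / fs) for i in range(40)]
speech = convolve(excitation, vocal_tract)[:256]
print([round(p, 3) for p in critical_band_profile(speech, fs)])

# Sanity check with a pure 1000 Hz tone (about 8.5 Bark):
tone = [math.sin(2 * math.pi * 1000 * i / fs) for i in range(256)]
tone_profile = critical_band_profile(tone, fs)
print(max(range(24), key=tone_profile.__getitem__))  # prints 8
```

The pure tone puts almost all of its power into band 8, while the toy speech signal spreads its power over several bands, concentrated around the assumed 700 Hz resonance. In this sketch the normalized band profile plays the role of the voice print.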

A voice comparison report usually starts from two voice samples, voice 1 and voice 2, where voice 1 is the blank sample and voice 2 is the target sample. Both speech signals are analyzed as described above and their voice profiles are calculated. To present the results graphically, the voice profiles can be displayed together with time, frequency and amplitude diagrams. By analyzing the curves of the profiles of voice 1 and voice 2, a percentage of correlation (matching percentage) between the two can be calculated; the scale theoretically ranges from 0% to 100%. This calculation also takes into account the transmission quality of the speech signals (the transmission or transfer function). The same principle of matching percentages is used, for example, in determining the atomic structure of DNA by fine-structure X-ray analysis, where it expresses how accurately the structural data agree with the theoretical model.
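The matching percentage can be sketched as a correlation between two voice profiles. The text does not specify the formula, so this example assumes the Pearson correlation coefficient between two equally long profiles, with negative correlations counted as 0% so that the result falls on the 0% to 100% scale. The function name `matching_percentage` is hypothetical, the profile values are made up, and compensation for the transmission (transfer) function is not included.

```python
import math


def matching_percentage(profile_1, profile_2):
    """Hypothetical matching score: Pearson correlation of two profiles, clipped to 0..100 %."""
    if len(profile_1) != len(profile_2) or len(profile_1) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    n = len(profile_1)
    mean_1 = sum(profile_1) / n
    mean_2 = sum(profile_2) / n
    cov = sum((a - mean_1) * (b - mean_2) for a, b in zip(profile_1, profile_2))
    var_1 = sum((a - mean_1) ** 2 for a in profile_1)
    var_2 = sum((b - mean_2) ** 2 for b in profile_2)
    if var_1 == 0 or var_2 == 0:
        return 0.0  # a flat profile has no curve shape to compare
    r = cov / math.sqrt(var_1 * var_2)
    # Negative correlation is treated as "no match", giving the 0 % to 100 % scale.
    return 100.0 * max(0.0, r)


voice_1 = [0.05, 0.20, 0.35, 0.25, 0.10, 0.05]  # made-up profile of voice 1
voice_2 = [0.06, 0.18, 0.33, 0.27, 0.11, 0.05]  # made-up profile of voice 2
print(round(matching_percentage(voice_1, voice_2), 1))
```

Because Pearson correlation compares the shape of the two curves, it ignores a constant offset or scaling between them; identical profiles score 100%.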
If the same sentences or words occur in both voice samples, two further parameters, the loudness and the roughness of the voice, can be calculated and included in the overall analysis. In some cases, several recordings of a certain minimum length exist that presumably come from the same person. These recordings can be analyzed and compared as described above.
In general, this technique does not depend on which language is spoken, and the two samples do not need to contain the same text. The voice signals are treated purely as structures, which allows established methods from real structure research, spectral analysis and psychoacoustics to be applied.