Voice signals and real structures


My approach of viewing digitized speech as a structure, or more precisely a real structure, can be outlined as follows. In solid state physics, solid state chemistry and especially crystallography, we distinguish between ideal structures and real structures (in this context, "structure" means the arrangement of atoms, or their positions, in a crystalline solid). An ideal structure is a periodic arrangement of atoms (elementary cells) in all directions in space (x,y,z). A 3-D elementary cell of copper-gold (Cu3Au) is shown in fig.1. For simplicity, and for the discussion that follows, the structures are displayed as two-dimensional projections onto the XY-plane (fig.2). An ideal structure of Cu3Au is shown schematically in fig.3. This structure is built up identically in all three directions in space. At higher temperatures (420°C), the Cu atoms and the Au atoms exchange sites in the structure by diffusion via vacancies, and different structural configurations occur, as shown in fig.2. Once this site exchange has finished, the result is a structure that incorporates all the structural variants from fig.2. Such structures are called disordered structures or real structures (fig.4).


A one-dimensional Fourier transform (frequency domain) of the ideal Cu3Au structure from fig.3 is shown in fig.5; below it, fig.6 shows the corresponding Fourier transform of the disordered, or real, structure of Cu3Au. The maxima of the ideal structure are discrete. In contrast, the maxima of the real structure are broadened or diffuse. This diffuseness is caused by disturbances in the periodicity of the atomic positions relative to the ideal structure.
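The contrast between discrete and diffuse maxima can be illustrated with a toy calculation. The following is only a minimal sketch under stated assumptions, not the computation behind fig.5 and fig.6: it uses a hypothetical one-dimensional chain with the repeat unit Au-Cu-Cu-Cu, takes the atomic numbers as crude scattering factors, and models disorder as random site exchanges. With such uncorrelated exchanges, the disorder shows up mainly as diffuse intensity between the sharp maxima.

```python
import numpy as np

def power_spectrum(chain):
    """Power spectrum |F(k)|^2 of a 1-D chain of scattering factors."""
    # Subtracting the mean removes the trivial k = 0 term.
    amplitudes = np.fft.rfft(chain - chain.mean())
    return np.abs(amplitudes) ** 2

def diffuse_fraction(chain, period):
    """Fraction of spectral power lying outside the sharp peak positions."""
    spectrum = power_spectrum(chain)
    # A perfectly periodic chain only has power at multiples of n / period.
    peak_bins = np.arange(0, len(spectrum), len(chain) // period)
    return 1.0 - spectrum[peak_bins].sum() / spectrum.sum()

rng = np.random.default_rng(0)
period, n_cells = 4, 256
f_au, f_cu = 79.0, 29.0  # assumption: atomic numbers as scattering factors

# Ideal structure: the same unit Au-Cu-Cu-Cu repeated without any fault.
ideal = np.tile([f_au, f_cu, f_cu, f_cu], n_cells)

# Real structure: exchange randomly chosen pairs of sites (toy disorder model).
real = ideal.copy()
for _ in range(200):
    i, j = rng.integers(0, len(real), size=2)
    real[i], real[j] = real[j], real[i]

ideal_diffuse = diffuse_fraction(ideal, period)
real_diffuse = diffuse_fraction(real, period)
print("diffuse fraction, ideal structure:", ideal_diffuse)
print("diffuse fraction, real structure: ", real_diffuse)
```

For the ideal chain essentially all power sits in the discrete maxima, whereas the disordered chain carries a clearly non-zero share of its power between them.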


We now turn to the approach of viewing speech as a real structure. A speech signal was recorded with a high-quality microphone, digitized on a computer and displayed in fig.7a. Such a speech signal is regarded as a real structure. From the viewpoint of real structure research, this speech signal, or structure, incorporates all the characteristics of a specific human voice. These characteristics (e.g. frequency positions, diffuseness, signal changes over time, correlations within the signal (vectors)) yield a voice profile of the speaker, provided they are extracted from the speech signal in a mathematically correct way. To assess all characteristics of the voice, the signal must have a representative length, which is reached with a speech duration of 40-80 seconds. To compare the speech signal with a real structure, it is Fourier transformed into the frequency domain (fig.7a to fig.7b). Here the similarity to real structures is clearly visible: the maxima are partly diffuse, and the diffuse components in particular deliver valuable information on the voice characteristics.
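The step from the time signal to the frequency domain can be sketched in a few lines. This is a hedged illustration only: no recording is available here, so a synthetic stand-in is used (three harmonics of a 120 Hz fundamental whose pitch drifts slightly, plus weak noise), and the sample rate, duration and window are all assumptions rather than the settings used for fig.7a and fig.7b. A real analysis would use a recording of the representative length given in the text.

```python
import numpy as np

def magnitude_spectrum(signal, sample_rate):
    """Magnitude spectrum of a real-valued signal, with a Hann window applied."""
    window = np.hanning(len(signal))
    spectrum = np.abs(np.fft.rfft(signal * window))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return freqs, spectrum

sample_rate = 8000   # Hz, hypothetical
duration = 2.0       # s, kept short for the sketch
t = np.arange(int(sample_rate * duration)) / sample_rate

# Stand-in for a voiced sound: the pitch wanders by +/- 3 Hz around 120 Hz,
# which disturbs the strict periodicity of the signal.
pitch = 120.0 + 3.0 * np.sin(2.0 * np.pi * 0.5 * t)
phase = 2.0 * np.pi * np.cumsum(pitch) / sample_rate

rng = np.random.default_rng(1)
signal = sum(np.sin(h * phase) / h for h in (1, 2, 3))
signal = signal + 0.05 * rng.standard_normal(len(t))

freqs, spectrum = magnitude_spectrum(signal, sample_rate)
dominant_frequency = freqs[np.argmax(spectrum)]
print("dominant frequency in Hz:", dominant_frequency)
```

Because the pitch is not constant, the maximum near 120 Hz is spread over several neighbouring frequency bins instead of forming a single sharp line, which is the kind of diffuseness described above.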