Text-independent methods

The text-independent method does not rely on a specific text being spoken. Text-independent systems are typically used for speaker identification because they require no cooperation from the speaker. The text spoken during enrollment and verification can differ, and enrollment may even happen without the user's knowledge.

Since text-independent systems do not know the text being spoken, only general speaker-specific properties of the speaker's voice can be used. This limits the accuracy of the recognition.

For this approach the following four kinds of methods have been investigated:

• Long-Term-Statistics-Based Methods

• VQ-Based Methods

• Ergodic-HMM-Based Methods

• Speech-Recognition-Based Methods

In long-term-statistics-based methods, a speaker is characterized by long-term sample statistics, such as the mean and variance of spectral features computed over a series of utterances.
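
As a rough illustration of the long-term-statistics idea, the sketch below computes the per-dimension mean and variance of a set of feature frames and picks the reference speaker whose long-term mean is closest to that of a test utterance. The function names, the variance-weighted distance, and the toy two-dimensional "spectral" frames are assumptions made for this example; they are not taken from any particular system.

```python
def long_term_stats(frames):
    """Return (mean, variance) per feature dimension over all frames."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    var = [sum((f[d] - mean[d]) ** 2 for f in frames) / n for d in range(dim)]
    return mean, var


def stats_distance(test_frames, ref_mean, ref_var, eps=1e-9):
    """Variance-weighted squared distance between long-term means.

    The weighting is one plausible choice (an assumption), not the only one.
    """
    test_mean, _ = long_term_stats(test_frames)
    return sum((t - m) ** 2 / (v + eps)
               for t, m, v in zip(test_mean, ref_mean, ref_var))


def identify(test_frames, models):
    """Pick the enrolled speaker whose statistics are closest to the test data."""
    return min(models, key=lambda name: stats_distance(test_frames, *models[name]))


# Toy enrollment data: each frame is a hypothetical 2-D spectral feature vector.
speaker_a = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2]]
speaker_b = [[3.0, 0.5], [3.2, 0.7], [2.8, 0.3]]
models = {"A": long_term_stats(speaker_a), "B": long_term_stats(speaker_b)}

# The test utterance contains different frames than enrollment (text-independent).
test_utterance = [[1.1, 2.1], [0.9, 1.9]]
best_match = identify(test_utterance, models)
```

Because only averages over the whole utterance are kept, all information about the order of the frames is discarded, which is what makes the approach independent of the spoken text.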

In the VQ (vector quantization)-based method, speaker-specific features are characterized by means of VQ codebooks consisting of a small number of representative feature vectors. A speaker-specific codebook is generated by clustering the training feature vectors of each speaker. In the testing phase, an input utterance is vector-quantized using the codebook of each reference speaker. The VQ distortion accumulated over the entire input utterance is used to make the recognition decision.
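
The VQ procedure described above can be sketched as follows. This is a minimal example under stated assumptions: the codebook is trained with plain k-means (Lloyd iterations) using a deterministic initialization, the distortion measure is squared Euclidean distance, and all function names and toy vectors are hypothetical. Real systems typically use larger codebooks and cepstral features.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_codebook(vectors, size, iterations=20):
    """Cluster one speaker's training vectors into `size` codewords (k-means)."""
    # Deterministic initialization: evenly spaced training vectors.
    step = len(vectors) / size
    codebook = [list(vectors[int(i * step)]) for i in range(size)]
    for _ in range(iterations):
        cells = [[] for _ in codebook]
        for v in vectors:
            nearest = min(range(size), key=lambda k: sq_dist(v, codebook[k]))
            cells[nearest].append(v)
        for k, cell in enumerate(cells):
            if cell:  # keep the old codeword if no vector was assigned to it
                codebook[k] = [sum(col) / len(cell) for col in zip(*cell)]
    return codebook


def accumulated_distortion(utterance, codebook):
    """Sum, over all input frames, of the distance to the nearest codeword."""
    return sum(min(sq_dist(v, c) for c in codebook) for v in utterance)


def identify_speaker(utterance, codebooks):
    """Choose the reference speaker whose codebook gives the lowest distortion."""
    return min(codebooks, key=lambda s: accumulated_distortion(utterance, codebooks[s]))


# Toy training data for two hypothetical speakers.
train_a = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [1.0, 1.0], [1.1, 1.0], [1.0, 1.1]]
train_b = [[5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [6.0, 6.0], [6.1, 6.0], [6.0, 6.1]]
codebooks = {"A": train_codebook(train_a, 2), "B": train_codebook(train_b, 2)}

decision = identify_speaker([[0.05, 0.05], [1.05, 1.0]], codebooks)
```

For identification, the utterance is quantized against every reference codebook and the smallest accumulated distortion wins. For verification, the distortion against the claimed speaker's codebook would instead be compared with a threshold.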

In the ergodic-HMM-based method, the temporal variation in speech signal parameters is represented by stochastic Markovian transitions between states. The method uses a multiple-state ergodic HMM, that is, one in which all possible transitions between states are allowed, to classify speech segments into one of the broad phonetic categories corresponding to the HMM states.
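
To make the ergodic structure concrete, the sketch below scores an observation sequence against a small fully connected HMM using the scaled forward algorithm. It assumes discrete observation symbols (for example, VQ indices) and hand-picked toy probabilities; practical systems usually use continuous emission densities and train the parameters from data. All names and numbers here are illustrative assumptions.

```python
import math


def is_fully_connected(trans):
    """True if every state-to-state transition has nonzero probability."""
    return all(p > 0 for row in trans for p in row)


def log_likelihood(obs, start, trans, emit):
    """Log P(obs | model) via the scaled forward algorithm.

    obs   : non-empty list of discrete symbol indices
    start : start[i]    = P(first state is i)
    trans : trans[i][j] = P(next state is j | current state is i)
    emit  : emit[i][k]  = P(symbol k | state i)
    """
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Every state j can be reached from every state i (ergodic topology).
            alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                     for j in range(n)]
        scale = sum(alpha)
        log_prob += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid numerical underflow
    return log_prob


# Two hypothetical speaker models sharing a 2-state ergodic topology.
start = [0.5, 0.5]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit_a = [[0.9, 0.1], [0.2, 0.8]]
emit_b = [[0.1, 0.9], [0.3, 0.7]]

observed = [0, 0, 0, 1]
score_a = log_likelihood(observed, start, trans, emit_a)
score_b = log_likelihood(observed, start, trans, emit_b)
```

The speaker whose model yields the higher log-likelihood would be selected. Because no transition is forbidden, the model imposes no fixed ordering on the broad phonetic categories, which suits text-independent input.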

In speech-recognition-based methods, phonemes or phoneme classes are recognized, and then each phoneme or phoneme-class segment in the input speech is compared with the speaker templates corresponding to that phoneme or phoneme class.
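
The comparison step of this approach might look like the sketch below. It assumes that a separate phoneme recognizer (not shown, and hypothetical here) has already labeled each input segment, that each segment is summarized by a single feature vector, and that each speaker template is a per-phoneme reference vector. These representations are simplifications chosen for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def template_score(segments, templates):
    """Average distance between labeled segments and matching phoneme templates.

    segments  : list of (phoneme_label, feature_vector) pairs, assumed to come
                from an external phoneme recognizer
    templates : dict mapping phoneme_label -> reference vector for one speaker
    """
    dists = [sq_dist(vec, templates[ph]) for ph, vec in segments if ph in templates]
    # If no phoneme in the input has a template, the speaker cannot be scored.
    return sum(dists) / len(dists) if dists else float("inf")


def identify_by_phonemes(segments, speaker_templates):
    """Pick the speaker whose phoneme templates best match the input segments."""
    return min(speaker_templates,
               key=lambda s: template_score(segments, speaker_templates[s]))


# Hypothetical per-phoneme templates for two enrolled speakers.
speaker_templates = {
    "A": {"a": [1.0, 0.0], "s": [0.0, 1.0]},
    "B": {"a": [2.0, 1.0], "s": [1.0, 2.0]},
}

# Recognized segments from an input utterance (labels assumed given).
input_segments = [("a", [1.1, 0.1]), ("s", [0.1, 0.9]), ("a", [0.9, 0.0])]
chosen = identify_by_phonemes(input_segments, speaker_templates)
```

Comparing like with like, phoneme by phoneme, removes much of the variation caused by differing text, at the cost of depending on the accuracy of the phoneme recognizer.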


