Text-dependent methods

Text-dependent methods are typically based on template-matching techniques. During the enrollment phase, the individual speaks a keyword or phrase, which is captured by a microphone that can be as simple as a telephone. The voice sample is then converted to digital form, and a model is built from the extracted voice features.
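As a rough illustration of template matching, the sketch below compares an enrolled template with a test utterance using dynamic time warping (DTW), one common way to align two utterances spoken at different speeds. It assumes that feature extraction has already produced a sequence of numeric feature vectors per utterance. The function names dtw_distance and verify, and the length-normalised threshold, are illustrative assumptions rather than part of any particular system.

```python
import math

def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of feature vectors."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative distance when aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean distance between frames
            cost[i][j] = d + min(cost[i - 1][j],      # frame of a repeated
                                 cost[i][j - 1],      # frame of b repeated
                                 cost[i - 1][j - 1])  # both sequences advance
    return cost[n][m]

def verify(template, sample, threshold):
    """Accept the sample if its length-normalised DTW distance is small enough."""
    # Dividing by the combined length keeps the threshold roughly independent
    # of utterance duration (an assumed, simple normalisation).
    score = dtw_distance(template, sample) / (len(template) + len(sample))
    return score <= threshold

# Toy two-dimensional "features": the same phrase spoken more slowly still matches.
enrolled = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
slower = [(0.0, 0.0), (0.0, 0.0), (1.0, 1.0), (1.0, 1.0), (2.0, 2.0)]
accepted = verify(enrolled, slower, threshold=0.1)
```

In practice the threshold would be tuned on enrollment data to balance false acceptances against false rejections.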

Text-dependent recognition systems can be divided into two types:

• systems that use a fixed text to be spoken.

The highest accuracies can be achieved with this type of system, because the system designer can devise a text that highlights differences between speakers. The disadvantage is that such systems are vulnerable to impostors, because the text is always the same. They are also not very user-friendly, since all users have to remember some complex text.

• systems that use pass phrases.

The user picks a phrase during the enrollment phase and must use the same phrase during the test phase. Most speaker verification applications use this type of text-dependent input. It has the advantage that an impostor must know the pass phrase, which adds a level of security. The disadvantage is that these systems are vulnerable to tape-recorder (replay) attacks. Many systems allow several pass phrases to be specified, for example as answers to questions. In the verification phase, the system asks the user one of the questions at random, and the user must answer correctly. However, a patient attacker could still attempt a tape-recorder attack, because the number of different questions is usually limited.
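The question-based scheme can be sketched as follows. This is a minimal illustration, not a description of any specific product: pick_challenge and replay_success_probability are hypothetical helper names, and the second function assumes that each challenge is drawn uniformly at random from the enrolled questions.

```python
import secrets

def pick_challenge(questions):
    """Pick one enrolled question uniformly at random using a secure RNG."""
    return secrets.choice(list(questions))

def replay_success_probability(num_questions, num_recorded):
    """Chance that a single random challenge hits a question whose spoken
    answer the attacker has already recorded (uniform selection assumed)."""
    if num_questions <= 0:
        raise ValueError("at least one enrolled question is required")
    return min(num_recorded, num_questions) / num_questions

# Hypothetical enrolled questions for one user.
enrolled_questions = ["favourite colour?", "first pet?", "home town?",
                      "mother's first name?", "first school?"]
challenge = pick_challenge(enrolled_questions)
```

With five enrolled questions, an attacker holding recordings of two answers passes the challenge stage in two out of five attempts on average, and one who has recorded all five always does. This is why a small question pool limits the protection the scheme offers.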

Most text-dependent speaker verification systems are based on Hidden Markov Models (HMMs). These are stochastic models that provide a statistical representation of the sounds produced by the individual. An HMM represents the variations over time found in the speech states, using the dynamics of quality, duration, and intensity.
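To make the idea of scoring an utterance against an HMM concrete, here is a minimal sketch of the forward algorithm, which computes the likelihood of an observation sequence under a given model. It makes simplifying assumptions: the observations are discrete symbols (for instance, indices of quantised feature vectors), whereas practical systems usually model continuous acoustic features, and the model parameters are supplied directly rather than trained. The name forward_log_likelihood and the toy parameters are illustrative only.

```python
import math

def forward_log_likelihood(obs, start, trans, emit):
    """Log-likelihood of a discrete observation sequence under an HMM,
    computed with the scaled forward algorithm.

    start[s]    : probability of starting in state s
    trans[p][s] : probability of moving from state p to state s
    emit[s][o]  : probability of emitting symbol o in state s
    """
    if not obs:
        raise ValueError("observation sequence must not be empty")
    n = len(start)
    # alpha[s] is proportional to P(observations so far, current state = s)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[p] * trans[p][s] for p in range(n)) * emit[s][obs[t]]
                for s in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        # Rescale at every step to avoid underflow on long utterances.
        alpha = [a / scale for a in alpha]
        log_lik += math.log(scale)
    return log_lik

# Toy left-to-right model with two states and two symbols.
start = [1.0, 0.0]
trans = [[0.5, 0.5],
         [0.0, 1.0]]
emit = [[0.9, 0.1],
        [0.1, 0.9]]
matching = forward_log_likelihood([0, 0, 1, 1], start, trans, emit)
mismatched = forward_log_likelihood([1, 1, 0, 0], start, trans, emit)
```

In this toy model the sequence that follows the expected order of sounds receives a higher log-likelihood than the reversed one. A verifier could compare such a score with a threshold, or with the score from a background model, before accepting the claimed identity.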

