TIMIT (TIMIT Acoustic-Phonetic Continuous Speech Corpus)

The TIMIT Acoustic-Phonetic Continuous Speech Corpus is a standard dataset for evaluating automatic speech recognition systems. It consists of recordings of 630 speakers from eight major dialect regions of American English, each reading 10 phonetically rich sentences, for 6,300 utterances in total. It also includes time-aligned word-level and phone-level transcriptions of the speech.
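As a rough illustration of how the phone-level transcriptions might be consumed, the sketch below parses a transcription in which each line holds a start sample, an end sample, and a label. This layout and the 16 kHz sample rate are assumptions about the distributed files rather than details stated on this page. The `Segment` type, the `parse_transcription` helper, and the example text are hypothetical and made up for illustration.

```python
from typing import List, NamedTuple

# Assumption: audio is sampled at 16 kHz, so sample offsets convert to seconds.
SAMPLE_RATE_HZ = 16_000


class Segment(NamedTuple):
    """One labelled span of speech (hypothetical helper type)."""
    start_s: float
    end_s: float
    label: str


def parse_transcription(text: str, sample_rate: int = SAMPLE_RATE_HZ) -> List[Segment]:
    """Parse lines of the assumed form '<start_sample> <end_sample> <label>'."""
    segments = []
    for line in text.splitlines():
        parts = line.split(maxsplit=2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        start, end, label = parts
        segments.append(
            Segment(int(start) / sample_rate, int(end) / sample_rate, label.strip())
        )
    return segments


# Made-up transcription text, not taken from the corpus.
EXAMPLE_PHN = """0 3200 h#
3200 4800 sh
4800 8000 iy
"""

segments = parse_transcription(EXAMPLE_PHN)

# Corpus size implied by the description: 630 speakers x 10 sentences each.
TOTAL_UTTERANCES = 630 * 10

if __name__ == "__main__":
    for seg in segments:
        print(f"{seg.label}: {seg.start_s:.3f}s to {seg.end_s:.3f}s")
    print("utterances:", TOTAL_UTTERANCES)
```

The same parser would apply to word-level transcriptions if they follow the same three-column layout, since the label is taken as everything after the two sample offsets.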

Source: Improving neural networks by preventing co-adaptation of feature detectors

Benchmarks

Task	Dataset Variant	Best Model
Speech Recognition	TIMIT	wav2vec 2.0
Speech Separation	TCD-TIMIT corpus (mixed-speech)	Audio-Visual concat-ref
Speech Enhancement	TCD-TIMIT corpus (mixed-speech)	SEMamba
Speaker-Specific Lip to Speech Synthesis	TCD-TIMIT corpus (mixed-speech)	Lip2Wav
Lip Reading	TCD-TIMIT corpus (mixed-speech)	Lip2Wav