
Automatic Speech Recognition (ASR)

What is Automatic Speech Recognition?

Automatic Speech Recognition (ASR) is the term given to the technology used to transcribe spoken words into written text.

Ubiqus uses one form of ASR – Large Vocabulary Continuous Speech Recognition (LVCSR) – based on the automatic identification of very short audio sequences. This technology makes it possible to produce an extremely high-quality transcription, provided that the recording has been made correctly. ASR has developed significantly in recent years, and our R&D team is contributing to its continued progress.

Our working method means that we can handle not only recordings with non-specialised vocabulary, but also those that include more specific terms (technical, legal, medical, etc.).

Producing a final transcription involves a four-step process:

1 | Voice Activity Detection

First, the machine identifies when speech is present in the recording, so that the soundtrack can be cut into segments. Each of the following steps then works on these segments.
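To make this first step more concrete, here is a minimal sketch of a simple energy-based voice activity detector in Python. It is an illustrative assumption, not Ubiqus's implementation: it treats the recording as a list of audio samples, measures the root-mean-square (RMS) energy of short frames, and marks a span as speech while that energy stays above a threshold. The function name, frame length, threshold and number of quiet frames needed to close a segment are all hypothetical choices; production systems usually rely on trained models rather than a fixed threshold.

```python
import math
from typing import List, Tuple


def detect_speech_segments(
    samples: List[float],
    sample_rate: int,
    frame_ms: int = 30,
    energy_threshold: float = 0.01,
    min_silence_frames: int = 3,
) -> List[Tuple[float, float]]:
    """Return (start_s, end_s) spans where short-frame RMS energy exceeds a threshold.

    Hypothetical toy VAD: the parameters are illustrative, not tuned values.
    """
    frame_len = max(1, sample_rate * frame_ms // 1000)  # samples per frame
    n_frames = math.ceil(len(samples) / frame_len)
    segments: List[Tuple[float, float]] = []
    start = None      # first frame of the current speech segment, if any
    silence_run = 0   # consecutive quiet frames seen since speech started

    def to_seconds(frame_index: int) -> float:
        # Convert a frame boundary to seconds, clamped to the recording length.
        return min(frame_index * frame_len, len(samples)) / sample_rate

    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        if rms >= energy_threshold:
            if start is None:
                start = i
            silence_run = 0
        elif start is not None:
            silence_run += 1
            if silence_run >= min_silence_frames:
                # Close the segment at the first quiet frame after the speech.
                segments.append((to_seconds(start), to_seconds(i - silence_run + 1)))
                start, silence_run = None, 0

    if start is not None:  # speech continued until (or close to) the end
        segments.append((to_seconds(start), to_seconds(n_frames - silence_run)))
    return segments
```

Requiring several consecutive quiet frames before closing a segment means that a brief pause between words does not split one utterance in two.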

2 | Diarization

Next, the different speakers in each recording are identified and the segments are grouped according to who is speaking, solving the problem of ‘who speaks when?’ For this, the machine uses different models containing specific data (languages, voices), which allow it to distinguish the subtleties of a language, such as accents. Note that at this point, the data is still being processed in a purely “mathematical” way: no words have been transcribed yet.
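The grouping idea behind diarization can be sketched as follows. This is a hypothetical illustration, not the models Ubiqus uses: it assumes each speech segment has already been converted into a fixed-length "voice" vector (a speaker embedding) by some model that is not shown, and it then groups segments greedily, assigning each one to the most similar known speaker by cosine similarity or opening a new speaker when nothing is similar enough. The function names and the 0.8 threshold are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def diarize(
    segments: List[Tuple[float, float]],
    embeddings: List[Sequence[float]],
    threshold: float = 0.8,
) -> List[Tuple[float, float, str]]:
    """Label each (start, end) segment with a speaker: 'who speaks when?'."""
    centroids: List[List[float]] = []  # running mean embedding of each speaker
    counts: List[int] = []             # number of segments assigned to each speaker
    labelled: List[Tuple[float, float, str]] = []
    for (start, end), emb in zip(segments, embeddings):
        # Find the most similar speaker seen so far.
        best, best_sim = None, -1.0
        for k, centroid in enumerate(centroids):
            sim = cosine_similarity(emb, centroid)
            if sim > best_sim:
                best, best_sim = k, sim
        if best is None or best_sim < threshold:
            # No close enough match: treat this voice as a new speaker.
            centroids.append(list(emb))
            counts.append(1)
            best = len(centroids) - 1
        else:
            # Update that speaker's mean embedding incrementally.
            counts[best] += 1
            n = counts[best]
            centroids[best] = [c + (e - c) / n for c, e in zip(centroids[best], emb)]
        labelled.append((start, end, f"SPEAKER_{best}"))
    return labelled
```

Real diarization systems generally use more robust clustering over the whole recording rather than this single greedy pass, but the output has the same shape: a time span and a speaker label for each segment.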

3 | Decoding

This is when the actual transcription starts. A list of possible phonemes (the individual sounds that make up words) is established for each audio segment. At this stage, no full sentences have been generated, only one long list of possibilities, each with a score.
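One common way to build such a scored list of possibilities is a beam search, sketched below in simplified form. It assumes, hypothetically, that a model has already given a probability for each candidate phoneme at each position in the segment; the search extends every hypothesis with every candidate and keeps only the few best-scoring sequences, scored as summed log-probabilities. The function name, input format and beam width are illustrative assumptions, and real decoders also use pronunciation dictionaries and acoustic models that are omitted here.

```python
import math
from typing import Dict, List, Tuple


def n_best_phonemes(
    posteriors: List[Dict[str, float]], beam_width: int = 3
) -> List[Tuple[List[str], float]]:
    """Return up to `beam_width` phoneme sequences, best first, with log-probability scores.

    `posteriors[t]` maps each candidate phoneme at position t to its probability.
    """
    beams: List[Tuple[List[str], float]] = [([], 0.0)]  # (sequence, log score)
    for dist in posteriors:
        # Extend every surviving hypothesis with every candidate phoneme.
        candidates = [
            (seq + [ph], score + math.log(p))
            for seq, score in beams
            for ph, p in dist.items()
            if p > 0
        ]
        # Keep only the highest-scoring hypotheses (the "beam").
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams
```

For example, with the made-up input `[{"k": 0.7, "g": 0.3}, {"a": 0.6, "e": 0.4}, {"t": 0.9, "d": 0.1}]`, the top entry returned is the sequence `["k", "a", "t"]`, followed by lower-scoring alternatives that are kept for the next step.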

4 | Rescoring

The computer then chooses, from all the phonemes and words learned during the initial phase, those most likely to form the most accurate sentence (a little like the way a GPS identifies the best route). This sentence is the one written into the document.
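The idea of choosing the most likely sentence can be sketched by combining two scores for each candidate: how well it matches the audio (an acoustic score) and how plausible it is as language (a language-model score). The sketch below is a hypothetical illustration, not Ubiqus's rescoring method: the tiny bigram table, its probabilities, the penalty for unseen word pairs and the `lm_weight` parameter are all invented for the example. Real systems typically rescore much larger lists or lattices with far larger language models.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical toy bigram language model: log P(word | previous word).
BIGRAM_LOGPROB: Dict[Tuple[str, str], float] = {
    ("<s>", "recognise"): math.log(0.2),
    ("recognise", "speech"): math.log(0.5),
    ("<s>", "wreck"): math.log(0.01),
    ("wreck", "a"): math.log(0.1),
    ("a", "nice"): math.log(0.1),
    ("nice", "beach"): math.log(0.05),
}
UNSEEN_LOGPROB = math.log(1e-4)  # crude penalty for word pairs not in the table


def lm_score(words: List[str]) -> float:
    """Sum of bigram log-probabilities, starting from a sentence-start token."""
    tokens = ["<s>"] + words
    return sum(BIGRAM_LOGPROB.get(pair, UNSEEN_LOGPROB) for pair in zip(tokens, tokens[1:]))


def rescore(
    hypotheses: List[Tuple[List[str], float]], lm_weight: float = 1.0
) -> Tuple[List[str], float]:
    """Pick the (words, acoustic_score) hypothesis with the best combined score."""
    return max(hypotheses, key=lambda h: h[1] + lm_weight * lm_score(h[0]))
```

With the candidates `(["wreck", "a", "nice", "beach"], -4.5)` and `(["recognise", "speech"], -5.0)`, the first sounds slightly closer to the audio, so it would win on acoustics alone (`lm_weight=0.0`); once the language model is taken into account, `rescore` picks "recognise speech", much as a GPS weighs several factors to pick the best overall route.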

This process is applied to every segment of the recording. The final result is a complete transcription.

At the end of this automated process, the document is reviewed by our teams, in the same way as any other Ubiqus document: in addition to checking the content as a whole, the proofreader also makes sure that each passage has been attributed to the correct speaker.

To learn more about our automatic speech recognition software, contact us.

Combining technology and human know-how at Ubiqus

Are you used to the quality of Ubiqus documents and tempted to try automatic transcription? Give it a go! The standard quality level of an automatic transcription remains as high as that of a “traditional” transcription. In any case, once the automatic transcription has been carried out, a human proofreader reviews it, just as they would a traditional transcription!