Speech to text

Requirements

  • Set the video container with the player's setVideoContainer method.
  • A Speech Recognition API implementation that supports the start(MediaStreamTrack audioTrack) method.

Note: This module only works if the required APIs are present and functional in the browser.
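
A minimal sketch of checking for the Speech Recognition API before enabling the module. The helper name, the `scope` parameter, and the `webkit` prefix fallback are assumptions for illustration. The check only confirms that a constructor is exposed; it cannot confirm that start() accepts a MediaStreamTrack.

```javascript
// Hypothetical helper: returns the SpeechRecognition constructor, or null.
// `scope` defaults to the global object so the check can run against a stub.
function findSpeechRecognition(scope = globalThis) {
  const ctor = scope.SpeechRecognition || scope.webkitSpeechRecognition;
  return typeof ctor === 'function' ? ctor : null;
}

// Presence alone does not prove that start(audioTrack) is supported,
// so treat a non-null result as "worth trying", not as a guarantee.
const speechRecognitionAvailable = findSpeechRecognition() !== null;
```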

Optional requirements

  • Translator API (only needed for translation).

How does it work?

Using the Web Audio API, the audio track is passed to the Speech Recognition module, which returns what was said in the audio at that moment.
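
The flow above could be sketched roughly as follows. This is an illustration under assumptions, not the player's actual implementation: the function name, the `onText` callback, and the `scope` parameter are hypothetical, and the audio graph shown is just one way to obtain a MediaStreamTrack from a media element.

```javascript
// Hypothetical sketch: feed a media element's audio into speech recognition.
// `scope` defaults to the global object; pass a stub to exercise it elsewhere.
function startRecognition(video, onText, scope = globalThis) {
  const Recognition = scope.SpeechRecognition || scope.webkitSpeechRecognition;
  const audioContext = new scope.AudioContext();

  // Route the element's audio through the Web Audio graph.
  const source = audioContext.createMediaElementSource(video);
  const streamDestination = audioContext.createMediaStreamDestination();
  source.connect(streamDestination);
  // Also connect to the speakers, otherwise playback becomes silent.
  source.connect(audioContext.destination);

  const audioTrack = streamDestination.stream.getAudioTracks()[0];

  const recognition = new Recognition();
  recognition.continuous = true;
  recognition.interimResults = true;
  recognition.onresult = (event) => {
    // Report the best transcript of the most recent result.
    const latest = event.results[event.results.length - 1];
    onText(latest[0].transcript);
  };
  // The start(audioTrack) overload is the one listed under Requirements.
  recognition.start(audioTrack);
  return recognition;
}
```

In a page this might be called as `startRecognition(videoElement, (text) => console.log(text))`.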

What happens next depends on the configuration:

  • With no translation configured, the text is displayed as is.
  • With a translation configured, the text is sent to the Translator module, which returns the translation.

By default, activating this module turns on Speech to Text only. To get translations as well, specify the target languages in the configuration (languagesToTranslate).
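
The two paths can be pictured with a small sketch. The function name is hypothetical, and `translate` is a stand-in for the Translator module, assumed here to return a Promise that resolves to the translated string.

```javascript
// Hypothetical sketch of the two paths: show the text as is, or translate it.
// `translate(text, language)` stands in for the Translator module and must
// return a Promise that resolves to the translated string.
async function textToDisplay(text, targetLanguage, translate) {
  if (!targetLanguage) {
    // No translation chosen: display the recognized text unchanged.
    return text;
  }
  return translate(text, targetLanguage);
}
```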

The text is rendered in a container with the class 'shaka-speech-to-text-container', which is created inside the video container.

The text is truncated by default, and the number of characters can be configured with streaming.speechToText.maxTextLength.
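
As an illustration only, one way to truncate is to keep the most recent characters. Whether the player keeps the start or the end of the text is not stated here, so the behavior below is an assumption, and the helper name is hypothetical.

```javascript
// Hypothetical helper: keep at most `maxTextLength` characters, favouring
// the most recent speech (the end of the string). This is an assumption,
// not a description of the player's actual truncation.
function truncateText(text, maxTextLength) {
  if (maxTextLength <= 0) {
    return '';
  }
  if (text.length <= maxTextLength) {
    return text;
  }
  return text.slice(text.length - maxTextLength);
}
```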

Configuration

  • enabled: Enable this module.
  • maxTextLength: Number of characters before truncation.
  • processLocally: If set to true, speech recognition MUST be performed locally on the user’s device. If set to false, the user agent can choose between local and remote processing. Note: remote processing is done by the browser, and we have no control over which third parties are involved.
  • languagesToTranslate: List of languages to translate into.
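
A possible configuration object, as a sketch. The values are placeholders chosen for illustration, and placing all four fields under streaming.speechToText is inferred from the maxTextLength path mentioned earlier; `player` is assumed to be an existing player instance.

```javascript
// Hypothetical values; only the field names and the streaming.speechToText
// path come from this page.
const speechToTextConfig = {
  streaming: {
    speechToText: {
      enabled: true,
      maxTextLength: 100,
      processLocally: true,
      languagesToTranslate: ['es', 'fr'],
    },
  },
};

// With a player instance (assumed to be named `player`):
// player.configure(speechToTextConfig);
```

Leaving languagesToTranslate empty would correspond to the default case, where only the untranslated text is produced.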

How to differentiate these tracks

All text tracks created by this module have originalLanguage equal to speech-to-text.

The track without any translation has language equal to ''.

A translated track has language set to the language it is translated into.
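
These rules could be applied to the player's text track list along the following lines. The helper name and the shape of its return value are hypothetical; only the originalLanguage and language conventions come from this page.

```javascript
// Marker value taken from this page; the helper itself is hypothetical.
const SPEECH_TO_TEXT = 'speech-to-text';

function splitSpeechToTextTracks(textTracks) {
  const generated = textTracks.filter(
      (track) => track.originalLanguage === SPEECH_TO_TEXT);
  return {
    // The untranslated track has an empty language.
    original: generated.find((track) => track.language === '') || null,
    // Translated tracks carry the language they were translated into.
    translated: generated.filter((track) => track.language !== ''),
  };
}

// Usage with a player instance (assumed to be named `player`):
// const {original, translated} =
//     splitSpeechToTextTracks(player.getTextTracks());
```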

Why don't I see the text track that the translations should have?

The browser must support the Translator API. If it does not, the translated tracks are not created, because this part of the module cannot be used.

Why don't I see the translation?

The translation module must support both the input and output languages. If it doesn't, then nothing will be displayed.
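
To check a language pair up front, something like the sketch below may help. It assumes the browser exposes the Translator API as a global `Translator` object with an `availability()` method that resolves to a status string such as 'unavailable'; the helper name and the `scope` parameter are hypothetical.

```javascript
// Hypothetical helper: resolves to true when the Translator API is exposed
// and reports that the language pair is not 'unavailable'.
async function canTranslate(sourceLanguage, targetLanguage, scope = globalThis) {
  if (!('Translator' in scope)) {
    return false; // No Translator API, so no translation is possible.
  }
  const availability = await scope.Translator.availability({
    sourceLanguage,
    targetLanguage,
  });
  // 'downloadable' and 'downloading' mean the pair is supported, but the
  // language pack may not be ready yet.
  return availability !== 'unavailable';
}
```

For example, `canTranslate('en', 'es').then((ok) => console.log(ok))` logs whether an English to Spanish translation could be attempted.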