SAPI 5 and WSR engine back-end
Dragonfly can use the built-in speech recognition included with Microsoft Windows Vista and above: Windows Speech Recognition (WSR). If WSR is available on the machine, no extra installation is needed; Dragonfly finds and communicates with WSR over standard COM communication channels.
If you would like to use Dragonfly command-modules with WSR, you must run a loader program which will load and manage the command-modules. A simple loader is available in the dragonfly/examples/dfly-loader-wsr.py file. When run, it will scan the directory it's in for files beginning with _ and ending with .py, then try to load them as command-modules.
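The scanning step described above can be sketched in plain Python. The `find_command_modules` helper below is a hypothetical name used only for illustration, not part of the loader itself:

```python
import glob
import os


def find_command_modules(directory):
    """Return the paths of files that look like command-modules:
    names beginning with an underscore and ending with ".py"."""
    pattern = os.path.join(directory, "_*.py")
    return sorted(glob.glob(pattern))


# A real loader would then import each matching file and hand its
# grammar objects over to the engine.
```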
A more full-featured loader is available in the dragonfly/examples/wsr_module_loader_plus.py file. It includes a basic sleep/wake grammar to control recognition (simply say "start listening" or "halt listening"), along with a rudimentary user interface via sound effects and console text (easily modified in the file). It otherwise operates like the above loader.
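The sleep/wake behaviour amounts to a two-state toggle around recognition. As a rough illustration (the class and method names here are hypothetical, not the loader's actual API):

```python
class SleepWakeState:
    """Minimal sketch of a sleep/wake toggle: ordinary commands are
    only processed while the state is awake."""

    def __init__(self):
        self.listening = True

    def handle(self, utterance):
        if utterance == "halt listening":
            self.listening = False
            return "asleep"
        if utterance == "start listening":
            self.listening = True
            return "awake"
        # Ordinary commands are ignored while asleep.
        return "processed" if self.listening else "ignored"
```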
You can download Dragonfly's module loaders and other example files in dragonfly/examples from the source code repository.
Dragonfly interfaces with this speech recognition engine using Microsoft's Speech API version 5. This is why it is referred to in many places as "SAPI" or "SAPI 5" instead of WSR.
Engine Configuration
This engine can be configured by passing (optional) keyword arguments to the get_engine() function, which passes them to the engine constructor (documented below). For example:

engine = get_engine("sapi5inproc",
                    retain_dir="C:/sapi5_recordings",
                    )
The engine can also be configured via the command-line interface:
# Initialize the SAPI 5 engine back-end with custom arguments, then load
# command modules and recognize speech.
python -m dragonfly load _*.py --engine sapi5inproc --engine-options \
retain_dir="C:/sapi5_recordings"
Engine API
- class Sapi5SharedEngine(retain_dir=None)[source]
Speech recognition engine back-end for SAPI 5 shared recognizer.
- Parameters:
retain_dir (str|None) –
Retains recognized audio and/or metadata in the given directory, saving audio to a retain_[timestamp].wav file and metadata to retain.tsv. Disabled by default (None).
- activate_grammar(grammar)[source]
Activate the given grammar.
- activate_rule(rule, grammar)[source]
Activate the given rule.
- connect()[source]
Connect to back-end SR engine.
- deactivate_grammar(grammar)[source]
Deactivate the given grammar.
- deactivate_rule(rule, grammar)[source]
Deactivate the given rule.
- disconnect()[source]
Disconnect from back-end SR engine.
- mimic(words)[source]
Mimic a recognition of the given words.
Note
This method has a few quirks to be aware of:
- Mimic can fail to recognize a command if the relevant grammar is not yet active.
- Mimic does not work reliably with the shared recognizer unless there are one or more exclusive grammars active.
- Mimic can crash the process in some circumstances, e.g. when mimicking non-ASCII characters.
- set_exclusiveness(grammar, exclusive)[source]
Set the exclusiveness of a grammar.
- speak(text)[source]
Speak the given text using text-to-speech.
- class Sapi5InProcEngine(retain_dir=None)[source]
Speech recognition engine back-end for SAPI 5 in process recognizer.
- Parameters:
retain_dir (str|None) –
Retains recognized audio and/or metadata in the given directory, saving audio to a retain_[timestamp].wav file and metadata to retain.tsv. Disabled by default (None).
- connect(audio_source=0)[source]
Connect to the speech recognition backend.
The audio source to use for speech recognition can be specified using the audio_source argument. If it is not given, it defaults to the first audio source found.
- get_audio_sources()[source]
Get the available audio sources.
This method returns a list of audio sources, each represented by a 3-element tuple: the index, the description, and the COM handle for the audio source.
- select_audio_source(audio_source)[source]
Configure the speech recognition engine to use the given audio source.
- The audio source may be specified as follows:
- As an int specifying the index of the audio source to use
- As a str containing the description of the audio source to use, or a substring thereof
The get_audio_sources() method can be used to retrieve the available sources together with their indices and descriptions.
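The int-or-substring lookup described above can be modelled in a few lines. `resolve_audio_source` is a hypothetical helper shown only to illustrate the matching rules, with sources shaped like the 3-element tuples returned by get_audio_sources():

```python
def resolve_audio_source(spec, sources):
    """Resolve an audio source given as an int index or a str
    description (or a substring of it).  Each source is a 3-element
    tuple: (index, description, handle).  Returns the matching
    index, or raises ValueError if nothing matches."""
    if isinstance(spec, int):
        for index, description, handle in sources:
            if index == spec:
                return index
        raise ValueError("no audio source with index %r" % spec)
    for index, description, handle in sources:
        if spec in description:
            return index
    raise ValueError("no audio source matching %r" % spec)
```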