SAPI 5 and WSR engine back-end
Dragonfly can use the built-in speech recognition included with Microsoft Windows Vista and above: Windows Speech Recognition (WSR). If WSR is available on the machine, no extra installation is needed; Dragonfly finds and communicates with WSR over standard COM communication channels.
If you would like to use Dragonfly command-modules with WSR, you must run a loader program which will load and manage the command-modules. A simple loader is available in the dragonfly/examples/dfly-loader-wsr.py file. When run, it will scan the directory it's in for files beginning with _ and ending with .py, then try to load them as command-modules.
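The scanning step described above can be sketched in plain Python. The `find_command_modules` helper below is a hypothetical name used only for illustration, not part of the loader itself:

```python
import glob
import os


def find_command_modules(directory):
    """Return the paths of files that look like command-modules:
    names beginning with an underscore and ending with ".py"."""
    pattern = os.path.join(directory, "_*.py")
    return sorted(glob.glob(pattern))


# A real loader would then import each matching file and hand its
# grammar objects over to the engine.
```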
A more full-featured loader is available in the dragonfly/examples/wsr_module_loader_plus.py file. It includes a basic sleep/wake grammar to control recognition (simply say "start listening" or "halt listening"), along with a rudimentary user interface via sound effects and console text (easily modified in the file). It otherwise operates like the above loader.
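The sleep/wake behaviour amounts to a two-state toggle around recognition. As a rough illustration (the class and method names here are hypothetical, not the loader's actual API):

```python
class SleepWakeState:
    """Minimal sketch of a sleep/wake toggle: ordinary commands are
    only processed while the state is awake."""

    def __init__(self):
        self.listening = True

    def handle(self, utterance):
        if utterance == "halt listening":
            self.listening = False
            return "asleep"
        if utterance == "start listening":
            self.listening = True
            return "awake"
        # Ordinary commands are ignored while asleep.
        return "processed" if self.listening else "ignored"
```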
You can download Dragonfly's module loaders and other example files in dragonfly/examples from the source code repository.
Dragonfly interfaces with this speech recognition engine using Microsoft's Speech API version 5. This is why it is referred to in many places as "SAPI" or "SAPI 5" instead of WSR.
Engine Configuration
This engine can be configured by passing (optional) keyword arguments to the get_engine() function, which passes them to the engine constructor (documented below). For example:

engine = get_engine("sapi5inproc",
                    retain_dir="C:/sapi5_recordings",
                    )
The engine can also be configured via the command-line interface:
# Initialize the SAPI 5 engine back-end with custom arguments, then load
# command modules and recognize speech.
python -m dragonfly load _*.py --engine sapi5inproc --engine-options \
retain_dir="C:/sapi5_recordings"
Engine API
- class Sapi5SharedEngine(retain_dir=None)[source]
Speech recognition engine back-end for SAPI 5 shared recognizer.
- Parameters:
retain_dir (str|None) –
Retains recognized audio and/or metadata in the given directory, saving audio to a retain_[timestamp].wav file and metadata to retain.tsv. Disabled by default (None).
- activate_grammar(grammar)[source]
Activate the given grammar.
- activate_rule(rule, grammar)[source]
Activate the given rule.
- connect()[source]
Connect to back-end SR engine.
- deactivate_grammar(grammar)[source]
Deactivate the given grammar.
- deactivate_rule(rule, grammar)[source]
Deactivate the given rule.
- disconnect()[source]
Disconnect from back-end SR engine.
- mimic(words)[source]
Mimic a recognition of the given words.
Note
This method has a few quirks to be aware of:
- Mimic can fail to recognize a command if the relevant grammar is not yet active.
- Mimic does not work reliably with the shared recognizer unless there are one or more exclusive grammars active.
- Mimic can crash the process in some circumstances, e.g. when mimicking non-ASCII characters.
- set_exclusiveness(grammar, exclusive)[source]
Set the exclusiveness of a grammar.
- speak(text)[source]
Speak the given text using text-to-speech.
- class Sapi5InProcEngine(retain_dir=None)[source]
Speech recognition engine back-end for SAPI 5 in process recognizer.
- Parameters:
retain_dir (str|None) –
Retains recognized audio and/or metadata in the given directory, saving audio to a retain_[timestamp].wav file and metadata to retain.tsv. Disabled by default (None).
- connect(audio_source=0)[source]
Connect to the speech recognition backend.
The audio source to use for speech recognition can be specified using the audio_source argument. If it is not given, it defaults to the first audio source found.
- get_audio_sources()[source]
Get the available audio sources.
This method returns a list of audio sources, each represented by a 3-element tuple: the index, the description, and the COM handle for the audio source.
- select_audio_source(audio_source)[source]
Configure the speech recognition engine to use the given audio source.
- The audio source may be specified as follows:
- As an int specifying the index of the audio source to use
- As a str containing the description of the audio source to use, or a substring thereof
The get_audio_sources() method can be used to retrieve the available sources together with their indices and descriptions.
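The int-or-substring lookup described above can be modelled in a few lines. `resolve_audio_source` is a hypothetical helper shown only to illustrate the matching rules, with sources shaped like the 3-element tuples returned by get_audio_sources():

```python
def resolve_audio_source(spec, sources):
    """Resolve an audio source given as an int index or a str
    description (or a substring of it).  Each source is a 3-element
    tuple: (index, description, handle).  Returns the matching
    index, or raises ValueError if nothing matches."""
    if isinstance(spec, int):
        for index, description, handle in sources:
            if index == spec:
                return index
        raise ValueError("no audio source with index %r" % spec)
    for index, description, handle in sources:
        if spec in description:
            return index
    raise ValueError("no audio source matching %r" % spec)
```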