Engines sub-package

Dragonfly supports multiple speech recognition engines as its backend. The engines sub-package implements the interface code for each supported engine.

There are separate pages on the Kaldi and CMU Pocket Sphinx engines.

EngineBase class

The dragonfly.engines.engine_base.EngineBase class forms the base class for the engine-specific back-end classes. It defines the required stubs and performs some of the logic necessary for Dragonfly to interact with a speech recognition engine.

class EngineBase[source]

Base class for engine-specific back-ends.

connect()[source]

Connect to back-end SR engine.

connection()[source]

Context manager for a connection to the back-end SR engine.

create_timer(callback, interval, repeating=True)[source]

Create and return a timer using the specified callback and repeat interval.
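As a hedged sketch, timer callbacks take no required arguments; the engine object and interval below are illustrative assumptions:

```python
# Sketch: a zero-argument callback suitable for create_timer().
# The engine object is an assumption here; obtain one via get_engine().

ticks = []

def on_tick():
    # Called every `interval` seconds while the timer is running.
    ticks.append(len(ticks) + 1)

# timer = engine.create_timer(on_tick, interval=0.5, repeating=True)
# ... later ...
# timer.stop()
```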

disconnect()[source]

Disconnect from back-end SR engine.

do_recognition(begin_callback=None, recognition_callback=None, failure_callback=None, end_callback=None, post_recognition_callback=None, *args, **kwargs)[source]

Recognize speech in a loop until interrupted or disconnect() is called.

Recognition callback functions can optionally be registered.

Extra positional and keyword arguments are passed to _do_recognition().

Parameters:
  • begin_callback (callable | None) – optional function to be called when speech starts.
  • recognition_callback (callable | None) – optional function to be called on recognition success.
  • failure_callback (callable | None) – optional function to be called on recognition failure.
  • end_callback (callable | None) – optional function to be called when speech ends, either successfully (after calling the recognition callback) or in failure (after calling the failure callback).
  • post_recognition_callback (callable | None) – optional function to be called after all rule processing has completed.
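A minimal sketch of such callbacks follows; the argument passed to the recognition callback (a sequence of words) is an assumption made for illustration, and the engine call is shown commented out:

```python
# Illustrative callbacks for do_recognition(). The `words` argument to
# the recognition callback is an assumption for this sketch.

def on_begin():
    print("speech started")

def on_recognition(words):
    text = " ".join(words)
    print("recognized:", text)
    return text

def on_failure():
    print("recognition failed")

# engine.do_recognition(begin_callback=on_begin,
#                       recognition_callback=on_recognition,
#                       failure_callback=on_failure)
```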
grammars

Grammars loaded into this engine.

language

Current user language of the SR engine. (Read-only)

Return type:str
mimic(words)[source]

Mimic a recognition of the given words.

name

The human-readable name of this engine.

process_grammars_context(window=None)[source]

Enable/disable grammars & rules based on their current contexts.

This must be done preemptively for some SR engine back-ends, such as WSR, that don’t apply context changes upon or after utterance start has been detected. The WSR engine should call this automatically whenever the foreground application (or its title) changes. The user may want to call this method manually to update grammar state when using custom contexts.

The window parameter is optional window information, which can be passed in as an optimization if it has already been gathered.

quoted_words_support

Whether this engine can compile and recognize quoted words.

Return type:bool
recognise_forever(begin_callback=None, recognition_callback=None, failure_callback=None, end_callback=None, post_recognition_callback=None, *args, **kwargs)

Alias of do_recognition() left in for backwards-compatibility.

recognize_forever(begin_callback=None, recognition_callback=None, failure_callback=None, end_callback=None, post_recognition_callback=None, *args, **kwargs)

Alias of do_recognition() left in for backwards-compatibility.

set_exclusive(grammar, exclusive)[source]

Alias of set_exclusiveness().

set_exclusiveness(grammar, exclusive)[source]

Set the exclusiveness of a grammar.

speak(text)[source]

Speak the given text using text-to-speech.

Engine backends

Main SR engine back-end interface

get_current_engine()[source]

Get the currently initialized SR engine object.

If an SR engine has not been initialized yet, None will be returned instead.

Return type:EngineBase | None
Returns:engine object or None

Usage example:

# Print the name of the current engine if one has been
# initialized.
from dragonfly import get_current_engine
engine = get_current_engine()
if engine:
    print("Engine name: %r" % engine.name)
else:
    print("No engine has been initialized.")
get_engine(name=None, **kwargs)[source]

Get the engine implementation.

This function will initialize an engine object using the get_engine() and is_engine_available() functions of the engine packages and return an instance of the first available engine. If an engine has already been initialized, it will be returned instead.

Parameters:
  • name (str) – optional human-readable name of the engine to return.
  • **kwargs – optional keyword arguments passed through to the engine for engine-specific configuration.
Return type:EngineBase
Returns:engine object
Raises:EngineError

register_engine_init(engine)[source]

Register initialization of an engine.

This function sets the default engine to the first engine initialized.

SR back-end package for WSR and SAPI 5

The WSR / SAPI 5 back-end has two engine classes:

  • sapi5inproc - engine class for the SAPI 5 in process recognizer. This is the default implementation and has no GUI (yet). get_engine() will return an instance of this class if the name parameter is None (default) or "sapi5inproc". It is recommended that you run this from the command line.
  • sapi5shared - engine class for SAPI 5 shared recognizer. This implementation uses the Windows Speech Recognition GUI. This implementation’s behaviour can be inconsistent and a little buggy at times, which is why it is no longer the default. To use it anyway pass "sapi5" or "sapi5shared" to get_engine().
get_engine(name=None, **kwargs)[source]

Retrieve the Sapi5 back-end engine object.

Parameters:
  • name (str) – optional human-readable name of the engine to return.
  • **kwargs – optional keyword arguments passed through to the engine for engine-specific configuration.
Keyword Arguments:
 
  • retain_dir (str) – directory to save audio data:
    A .wav file for each utterance, and a retain.tsv file with each row listing (wav filename, wav length in seconds, grammar name, rule name, recognized text) as tab-separated values.
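For illustration only, a row in the tab-separated layout described above could be written with the standard csv module; the filenames and values here are made up:

```python
# Illustration (not dragonfly code): writing one retain.tsv row in the
# tab-separated layout described above. All values are made up.
import csv
import io

row = ["utterance_001.wav", "1.42", "ExampleGrammar", "ExampleRule",
       "hello world"]
buf = io.StringIO()
csv.writer(buf, delimiter="\t", lineterminator="\n").writerow(row)
print(buf.getvalue(), end="")
```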
is_engine_available(name, **kwargs)[source]

Check whether SAPI is available.

Parameters:
  • name (str) – optional human-readable name of the engine to check.
  • **kwargs – optional keyword arguments passed through to the engine for engine-specific configuration.
class Sapi5SharedEngine(retain_dir=None)[source]

Speech recognition engine back-end for SAPI 5 shared recognizer.

connect()[source]

Connect to back-end SR engine.

disconnect()[source]

Disconnect from back-end SR engine.

class Sapi5InProcEngine(retain_dir=None)[source]

Speech recognition engine back-end for SAPI 5 in process recognizer.

connect(audio_source=0)[source]

Connect to the speech recognition backend.

The audio source to use for speech recognition can be specified using the audio_source argument. If it is not given, it defaults to the first audio source found.

get_audio_sources()[source]

Get the available audio sources.

This method returns a list of audio sources, each represented by a 3-element tuple: the index, the description, and the COM handle for the audio source.

select_audio_source(audio_source)[source]

Configure the speech recognition engine to use the given audio source.

The audio source may be specified as follows:
  • As an int specifying the index of the audio source to use
  • As a str containing the description of the audio source to use, or a substring thereof

This class’ method get_audio_sources() can be used to retrieve the available sources together with their indices and descriptions.
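The matching rules above can be illustrated with a small standalone helper; this mirrors the documented behaviour but is not dragonfly's actual implementation:

```python
# Standalone illustration of the documented matching rules for
# select_audio_source(); not dragonfly's actual implementation.

def match_audio_source(sources, spec):
    """Return the index of the source matching an int index or a
    description substring, given (index, description, handle) tuples."""
    for index, description, handle in sources:
        if spec == index:
            return index
        if isinstance(spec, str) and spec in description:
            return index
    raise ValueError("no audio source matching %r" % (spec,))

sources = [(0, "Microphone (USB Audio Device)", None),
           (1, "Line In (Realtek High Definition Audio)", None)]
match_audio_source(sources, "USB")  # -> 0
match_audio_source(sources, 1)      # -> 1
```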

SR back-end package for Kaldi

get_engine(**kwargs)[source]

Retrieve the Kaldi back-end engine object.

is_engine_available(**kwargs)[source]

Check whether Kaldi is available.

SR back-end package for CMU Pocket Sphinx

The main interface to this engine is provided by methods and properties of the engine class. Please see the CMU Pocket Sphinx engine page for more details.

get_engine()[source]

Retrieve the Sphinx back-end engine object.

is_engine_available()[source]

Check whether the Sphinx engine is available.

SR back-end package for the text input engine

The text input engine is a convenient, always available implementation designed to be used via the engine.mimic() method.

To initialise the text input engine, do the following:

get_engine("text")

Note that dragonfly.engines.get_engine() called without "text" will never initialise the text input engine. This is because real speech recognition backends should be returned from the function by default.

All dragonfly elements and rule classes should be supported. Use all uppercase words to mimic input for Dictation elements, e.g. “find SOME TEXT” to match the dragonfly spec “find <text>”. executable, title, and handle keyword arguments may optionally be passed to engine.mimic() to simulate a particular foreground window.

Dragonfly’s command-line interface can be used to test command modules with the text input engine. See the CLI page for more details.

get_engine()[source]

Retrieve the back-end engine object.

is_engine_available()[source]

Check whether the engine is available.

Dictation container classes

Dictation container base class

This class is used to store the recognized results of dictation elements within voice commands. It offers access to both the raw spoken-form words and the formatted written-form text.

The object can be expected to behave like a string, responding as you would expect to string methods like replace(). The formatted text can be retrieved using format() or simply by calling str(...) on a dictation container object. By default, formatting returns the words joined with spaces, but custom formatting can be applied by calling string methods on the Dictation object. A tuple of the raw spoken words can be retrieved using the words property.

String Formatting Examples

The following examples demonstrate how dictation input can be formatted by calling string methods on Dictation elements.

Python example:

mapping = {
    # Define commands for writing Python methods, functions and classes.
    "method [<under>] <snaketext>":
        Text("def %(under)s%(snaketext)s(self):") + Key("left:2"),
    "function <snaketext>":
        Text("def %(snaketext)s():") + Key("left:2"),
    "classy [<classtext>]":
        Text("class %(classtext)s:") + Key("left"),

    # Define a command for accessing object members.
    "selfie [<under>] [<snaketext>]":
        Text("self.%(under)s%(snaketext)s"),
}

extras = [
    # Define a Dictation element that produces snake case text,
    # e.g. hello_world.
    Dictation("snaketext", default="").lower().replace(" ", "_"),

    # Define a Dictation element that produces text matching Python's
    # class casing, e.g. DictationContainer.
    Dictation("classtext", default="").title().replace(" ", ""),

    # Allow adding underscores before cased text.
    Choice("under", {"under": "_"}, default=""),
]

rule = MappingRule(name="PythonExample", mapping=mapping, extras=extras)

Markdown example:

mapping = {
    # Define a command for typing Markdown headings 1 to 7 with optional
    # capitalized text.
    "heading [<num>] [<capitalised_text>]":
        Text("#")*Repeat("num") + Text(" %(capitalised_text)s"),
}

extras = [
    Dictation("capitalised_text", default="").capitalize(),
    IntegerRef("num", 1, 7, 1),
]

rule = MappingRule(name="MdExample", mapping=mapping, extras=extras)

Camel-case example using the Dictation.camel() method:

mapping = {
    # Define a command for typing camel-case text, e.g. helloWorld.
    "camel <camel_text>": Text(" %(camel_text)s"),
}

extras = [
    Dictation("camel_text", default="").camel(),
]

rule = MappingRule(name="CamelExample", mapping=mapping, extras=extras)

Example using the Dictation.apply() method for random casing:

from random import random

def random_text(text):
    # Randomize the case of each character.
    result = ""
    for c in text:
        r = random()
        if r < 0.5:
            result += c.lower()
        else:
            result += c.upper()
    return result

mapping = {
    "random <random_text>": Text("%(random_text)s"),
}

extras = [
    Dictation("random_text", default="").apply(random_text),
]

rule = MappingRule(name="RandomExample", mapping=mapping, extras=extras)

Class reference

class DictationContainerBase(words, methods=None)[source]

Container class for dictated words as recognized by the Dictation element.

This base class implements the general functionality of dictation container classes. Each supported engine should have a derived dictation container class which performs the actual engine-specific formatting of dictated text.

A dictation container is created by passing it a sequence of words as recognized by the backend SR engine. Each word must be a Unicode string.

Parameters:
  • words (sequence-of-unicode) – A sequence of Unicode strings.
  • methods (list-of-triples) – Tuples describing string methods to call on the output.
apply_methods(joined_words)[source]

Apply any string methods called on the Dictation object to a given string.

Called during format().

format()[source]

Format and return this dictation as a Unicode object.

words

Sequence of the words forming this dictation.

Engine timer classes

Multiplexing interface to a timer

class Timer(function, interval, manager, repeating=True)[source]

Timer class for calling a function every N seconds.

Constructor arguments:

  • function (callable) – the function to call every N seconds. Must have no required arguments.
  • interval (float) – number of seconds between calls to the function. Note that this is on a best-effort basis only.
  • manager (TimerManagerBase) – engine timer manager instance.
  • repeating (bool) – whether to call the function every N seconds or just once (default: True).

Instances of this class are normally initialised from engine.create_timer().

call()[source]

Call the timer’s function.

This method is normally called by the timer manager.

start()[source]

Start calling the timer’s function on an interval.

This method is called on initialisation.

stop()[source]

Stop calling the timer’s function on an interval.

class TimerManagerBase(interval, engine)[source]

Base timer manager class.

_activate_main_callback(callback, msec)[source]

Virtual method to implement to start calling main_callback() on an interval.

_deactivate_main_callback()[source]

Virtual method to implement to stop calling main_callback() on an interval.

add_timer(timer)[source]

Add a timer and activate the main callback if required.

disable()[source]

Method to disable execution of the main timer callback.

This method is used for testing timer-related functionality without race conditions.

enable()[source]

Method to re-enable the main timer callback.

The main timer callback is enabled by default. This method is only useful if disable() is called.

main_callback()[source]

Method to call each timer’s function when required.

remove_timer(timer)[source]

Remove a timer and deactivate the main callback if required.

class ThreadedTimerManager(interval, engine)[source]

Timer manager class using a daemon thread.

This class is used by the “text” engine. It is only suitable for engine back-ends that have no recognition loop in which to execute timer functions.

Warning

The timer interface is not thread-safe. Use the enable() and disable() methods if you need finer control over timer function execution.

class DelegateTimerManager(interval, engine)[source]

Timer manager class that calls main_callback() through an engine-specific callback function.

Engines using this class should implement the methods in DelegateTimerManagerInterface.

This class is used by the SAPI 5 engine.

class DelegateTimerManagerInterface[source]

DelegateTimerManager interface.

set_timer_callback(callback, sec)[source]

Method to set the timer manager’s callback.

Parameters:
  • callback (callable | None) – function to call every N seconds
  • sec (float | int) – number of seconds between calls to the callback function

Multiplexing interface for the CMU Pocket Sphinx engine

class SphinxTimerManager(interval, engine)[source]

Timer manager for the CMU Pocket Sphinx engine.

This class allows running timer functions if the engine is currently processing audio via one of three engine processing methods:

  • process_buffer()
  • process_wave_file()
  • recognise_forever()

Timer functions will run whether or not recognition is paused (i.e. in sleep mode).

Note: long-running timers will block dragonfly from processing what was said, so be careful with how you use them! Audio frames will not normally be dropped because of timers, long-running or otherwise.

Normal threads can be used instead of timers if desirable. This is because the main recognition loop is done in Python rather than in C/C++ code, so there are no unusual multi-threading limitations.
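A pure standard-library sketch of the thread alternative mentioned above:

```python
# Sketch: a plain daemon thread as an alternative to engine timers.
# Pure standard library; no dragonfly objects are involved.
import threading
import time

stop = threading.Event()
ticks = []

def worker():
    # Wake up every 50 ms until asked to stop.
    while not stop.wait(0.05):
        ticks.append(time.time())

thread = threading.Thread(target=worker, daemon=True)
thread.start()
time.sleep(0.25)   # the recognition loop would run here instead
stop.set()
thread.join()
```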