Language sub-package
This section documents Dragonfly’s support for spoken languages.
TODO Move engine-specific sections into the relevant engine pages.
TODO Document the languages.
TODO Split this file and the documentation for dragonfly.language.
Languages with speech recognition engine support
Each speech recognition engine supported by Dragonfly is set to a single
spoken language. This language can be checked via the engine.language
property, which returns an ISO 639-1 code (e.g. “en”):
from dragonfly import get_engine
engine = get_engine()
# Print the engine language.
print("Engine language: {}".format(engine.language))
Each speech recognition engine supported by Dragonfly supports its own set of languages. These are listed below with citations.
It is worth noting that Dragonfly’s use of ISO 639-1 language codes means
that no distinction is made between variants of languages. For example,
U.S. English and U.K. English will both yield "en" and be treated as
the same language, even though there are some differences.
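To illustrate why regional variants collapse to one code, here is a minimal sketch that reduces a locale-style tag to its two-letter primary language. The helper name primary_language and the input tags are hypothetical; this is not how Dragonfly itself derives engine.language.

```python
def primary_language(tag):
    """Return the lowercase primary subtag of a locale tag such as 'en-US'."""
    # Accept both 'en-US' and 'en_GB' style separators (an assumption
    # about the input format, not something Dragonfly requires).
    return tag.replace("_", "-").split("-")[0].lower()


us = primary_language("en-US")
uk = primary_language("en_GB")

# Both variants reduce to the same code, so code that branches on the
# language alone cannot tell them apart.
print(us, uk, us == uk)
```

If a grammar needs region-specific vocabulary, that distinction has to come from somewhere other than the language code.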
Languages supported by CMU Pocket Sphinx
The CMU Pocket Sphinx engine documentation page has a section on spoken language support. CMU Pocket Sphinx models and dictionaries are available from SourceForge for the following languages [3]:
English (U.S.)
English (Indian)
Catalan
Chinese (Mandarin)
Dutch
French
German
Greek
Hindi
Italian
Kazakh
Portuguese
Russian
Spanish
English (U.S.) is the default language used by the CMU Pocket Sphinx engine.
Languages supported by Kaldi
The following languages are supported by the Kaldi engine back-end:
English (U.S.)
It is possible for Kaldi to support other languages in the future. This requires finding decent models for other languages and making minor modifications to enable their use by the Kaldi Active Grammar library.
You can request to have your language supported by opening a new issue or by contacting David Zurow (@daanzu) directly.
Languages with built-in grammar support
Dragonfly’s Integer, IntegerRef and Digits classes have support for
multiple spoken languages. Each supported language has a sub-package under
dragonfly.language. The current engine language will be used to load the
language-specific content classes in these sub-packages.
This functionality is optional. Languages other than those listed below can still be used if the speech recognition engine supports them.
The following languages are supported:
Arabic - “ar”
Dutch - “nl”
English - “en”
German - “de”
Indonesian - “id”
Malaysian - “ms”
English has additional time, date and character related classes.
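A script can check up front whether the current language has built-in grammar support. The sketch below hard-codes the codes from the list above; the dictionary and the function name are hypothetical and are not part of Dragonfly's API.

```python
# Codes copied from the list above. This table is illustrative only and
# is not read from the dragonfly.language package.
BUILTIN_GRAMMAR_LANGUAGES = {
    "ar": "Arabic",
    "nl": "Dutch",
    "en": "English",
    "de": "German",
    "id": "Indonesian",
    "ms": "Malaysian",
}


def has_builtin_grammar_support(code):
    """Return True if the ISO 639-1 code appears in the list above."""
    return code.lower() in BUILTIN_GRAMMAR_LANGUAGES


for code in ("en", "fr"):
    print(code, has_builtin_grammar_support(code))
```

In practice the argument would be the value of engine.language. A False result only means that Integer, IntegerRef and Digits have no content for that language; other grammar elements can still be used.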
Language classes reference
ShortIntegerRef
ShortIntegerRef is a modified version of IntegerRef which allows for greater flexibility in the way that numbers may be pronounced, allowing for words like “hundred” to be dropped. This may be particularly useful when navigating files by line or page number.
Some examples of allowed pronunciations:
| Pronunciation | Result |
|---|---|
| one | 1 |
| ten | 10 |
| twenty three | 23 |
| two three | 23 |
| seventy | 70 |
| seven zero | 70 |
| hundred | 100 |
| one oh three | 103 |
| hundred three | 103 |
| one twenty seven | 127 |
| one two seven | 127 |
| one hundred twenty seven | 127 |
| seven hundred | 700 |
| thousand | 1000 |
| seventeen hundred | 1700 |
| seventeen hundred fifty three | 1753 |
| seventeen fifty three | 1753 |
| one seven five three | 1753 |
| seventeen five three | 1753 |
| four thousand | 4000 |
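The pattern behind these examples can be sketched in plain Python. The function below is a simplified illustration that reproduces the pronunciations listed above. It is not Dragonfly's implementation, it makes no claim about inputs outside those examples, and all names in it are hypothetical.

```python
UNITS = {"zero": 0, "oh": 0, "one": 1, "two": 2, "three": 3, "four": 4,
         "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9}
TEENS = {"ten": 10, "eleven": 11, "twelve": 12, "thirteen": 13,
         "fourteen": 14, "fifteen": 15, "sixteen": 16, "seventeen": 17,
         "eighteen": 18, "nineteen": 19}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
        "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90}
SMALL = {**UNITS, **TEENS, **TENS}


def short_integer(phrase):
    """Convert a spoken number phrase to an int (illustrative sketch)."""
    words = phrase.lower().split()
    if "hundred" in words or "thousand" in words:
        # Long form: ordinary arithmetic, with an implied leading "one"
        # so that "hundred three" gives 103.
        total, current = 0, 0
        for word in words:
            if word == "hundred":
                current = max(current, 1) * 100
            elif word == "thousand":
                total += max(current, 1) * 1000
                current = 0
            else:
                current += SMALL[word]
        return total + current
    # Short form: join one- or two-digit chunks as decimal strings,
    # so "seventeen fifty three" becomes "17" + "53".
    digits = ""
    i = 0
    while i < len(words):
        word = words[i]
        if word in TENS:
            value = TENS[word]
            nxt = words[i + 1] if i + 1 < len(words) else None
            # "twenty three" is a single two-digit chunk.
            if UNITS.get(nxt, 0) > 0:
                value += UNITS[nxt]
                i += 1
            digits += str(value)
        else:
            digits += str(SMALL[word])
        i += 1
    return int(digits)


for phrase in ("one oh three", "seventeen fifty three", "hundred three"):
    print(phrase, "->", short_integer(phrase))
```

The two branches mirror the two styles of example: phrases containing “hundred” or “thousand” are summed, while all other phrases are read as a sequence of digit groups.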
The class works in the same way as IntegerRef, by adding the following as an extra:
ShortIntegerRef("name", 0, 1000),
References