Asterisk External Application Protocol: Speech to Text Engine

The Asterisk External Application Protocol (AEAP) has been released and is available in Asterisk 18.12.0+ and 19.4.0+. If you haven’t already done so, be sure to check out the introduction article to AEAP. For the most current information about the protocol, please see the wiki.

The AEAP framework API is meant to be extensible and easy to use. Those wishing to create their own Asterisk module using AEAP, or to extend the framework, should check out res_aeap.h and res_aeap_message.h. The associated configuration is located in aeap.conf. You can also see a working implementation in res_speech_aeap.c.

Asterisk External Speech to Text Engine

Using AEAP, Asterisk now also supports external speech to text applications written in the programming language of your choice. It does so through the speech to text engine module found in res_speech_aeap.c. Note that the protocol option configured in aeap.conf must be set to “speech_to_text”.

The module uses the protocol as is, but when sending requests it adds a set of fixed and custom “params” that depend on the message type:

“setup”

Upon client connection, Asterisk sends a setup request. There are no fixed parameters for this message type, only custom ones. Custom “params” can be configured in aeap.conf for a specified client type.

Currently, Asterisk expects only a single codec in the response. If the remote application replies with more than one, only the first is selected. If the name of the codec in the response does not match the one in the request, “setup” fails.
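
As a rough sketch of the codec rule above, an external application might answer a setup request along these lines. The message field names used here ("request", "response", "id", "codecs", "name") are assumptions about the AEAP message layout that this article does not spell out, so check them against the wiki. Both function names are hypothetical, and setup_would_succeed() only models the behavior described above; it is not Asterisk's actual code.

```python
def build_setup_response(request, supported):
    """Answer a setup request with exactly one codec (hypothetical helper).

    `request` is the decoded setup request; `supported` is the set of
    codec names this application can handle.
    """
    offered = [codec.get("name") for codec in request.get("codecs", [])]
    # Pick the first offered codec we support, and reply with only that one.
    chosen = next((name for name in offered if name in supported), None)
    if chosen is None:
        raise ValueError("none of the offered codecs is supported")
    return {
        "response": "setup",          # assumed field name
        "id": request.get("id"),      # assumed: echo the request id
        "codecs": [{"name": chosen}],
    }


def setup_would_succeed(request, response):
    """Model of the rule described above: only the first codec in the
    response is considered, and its name must match the request."""
    offered = {codec.get("name") for codec in request.get("codecs", [])}
    codecs = response.get("codecs") or []
    if not codecs:
        return False
    return codecs[0].get("name") in offered
```

Replying with a single codec avoids relying on the "first one wins" behavior, which the text describes as the current state rather than a guarantee.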

“set”

Asterisk may send a set request any time after a successful setup. Custom “params” can be any name/value pair passed using the speech to text dialplan attribute function. In addition, the following fixed “params” may be sent:

  • “dtmf” – The DTMF string value received.
  • “results_type” – The desired type of results, “normal” or “nbest”.
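
To illustrate, here is a minimal sketch of how an external application might process the “params” of a set request. The function name, the shape of the state dictionary, and the decision to reject unknown “results_type” values are all assumptions made for this example; only the two fixed param names and the two allowed result types come from the list above.

```python
VALID_RESULTS_TYPES = {"normal", "nbest"}


def apply_set_params(state, params):
    """Store the params of a set request in `state` (hypothetical helper)."""
    for name, value in params.items():
        if name == "dtmf":
            # Fixed param: the DTMF string value received.
            state["dtmf"] = str(value)
        elif name == "results_type":
            # Fixed param: must be one of the two documented values.
            if value not in VALID_RESULTS_TYPES:
                raise ValueError(f"unsupported results_type: {value!r}")
            state["results_type"] = value
        else:
            # Anything else is a custom name/value pair set from the dialplan.
            state.setdefault("custom", {})[name] = value
    return state
```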

“get”

Asterisk may send a get request any time after a successful setup. Custom “params” can be any name passed using the speech to text dialplan attribute function. In addition, the following fixed “params” may be sent:

  • “results” – A request for the speech results.

Returned values in the get response are typed based on the given param. For speech to text, all requested param values are expected to be strings except for “results”, which is expected to be in the following JSON array/object format:

    [
        {
            "text": "<the speech to text data>",
            "score": <integer score value>,
            "grammar": "<the grammar>",
            "nbest_num": <integer best number>
        },
        ...
    ]

One to N results can be returned, and every field is optional. However, it would be a bit silly to return a result without “text”.
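
The “results” format above could be produced and checked with something like the following sketch. The helper names are hypothetical. The four field names and their types come from the format shown above, while rejecting an empty list and unknown fields is a stricter choice made for this example, not something the text requires.

```python
import json

RESULT_FIELDS = ("text", "score", "grammar", "nbest_num")


def make_result(text=None, score=None, grammar=None, nbest_num=None):
    """Build one result object, leaving out any field that was not given."""
    candidate = {"text": text, "score": score,
                 "grammar": grammar, "nbest_num": nbest_num}
    return {key: value for key, value in candidate.items() if value is not None}


def validate_results(results):
    """Check that `results` is a non-empty list of well-typed result objects."""
    if not isinstance(results, list) or not results:
        raise ValueError("results must be a list with at least one entry")
    for result in results:
        unknown = set(result) - set(RESULT_FIELDS)
        if unknown:
            raise ValueError(f"unknown result fields: {sorted(unknown)}")
        for key in ("text", "grammar"):
            if key in result and not isinstance(result[key], str):
                raise ValueError(f"{key} must be a string")
        for key in ("score", "nbest_num"):
            # bool is a subclass of int in Python, so exclude it explicitly.
            value = result.get(key)
            if key in result and (isinstance(value, bool) or not isinstance(value, int)):
                raise ValueError(f"{key} must be an integer")
    return results


results = validate_results([make_result("hello world", score=90)])
encoded = json.dumps(results)
```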

So far we’ve only talked about Asterisk sending requests and handling their responses. Asterisk also handles incoming requests of the following type:

“set”

This is the only request type the Asterisk speech to text external engine is programmed to accept. Likewise, the only recognized param is “results”. The “results” should be in the same JSON array/object format as mentioned above.
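
As a sketch of the application-to-Asterisk direction, a set request carrying “results” might be assembled as shown below. The envelope field names ("request", "id", "params") and the use of a UUID for the message id are assumptions about the AEAP message layout, not details taken from this article, and build_results_request() is a hypothetical helper. How the encoded message is actually transported is left out.

```python
import json
import uuid


def build_results_request(results):
    """Encode a set request that pushes speech results to Asterisk.

    `results` is a list of result objects in the format described earlier.
    """
    message = {
        "request": "set",               # the only request type Asterisk accepts
        "id": str(uuid.uuid4()),        # assumed: a unique id per request
        "params": {"results": results}, # "results" is the only recognized param
    }
    return json.dumps(message)


payload = build_results_request([{"text": "call sales", "score": 87}])
```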

Example

An example external JavaScript application that makes use of the protocol has also been created, and can be found here. While a bit basic, the application is fully functional and should be easy to expand upon. See the project’s README.md for installation and usage instructions.
