Freeswitch Text-To-Speech Caching with Cepstral and Lua

November 13, 2011

Recently I have been working on a project using software called Freeswitch, which is an excellent open source SIP server.

The project required the use of a text-to-speech (TTS) engine called Cepstral.

However, Cepstral's product suffers from concurrency problems when handling many simultaneous phone calls. Additionally, there is a delay of about one second before the TTS audio starts to play, which can be off-putting for callers.

To overcome these issues, I implemented a caching mechanism using Freeswitch's built-in integration with the Lua scripting language.

Our system tends to 'say' the same things over and over again, so caching the TTS output to a wav file allows Freeswitch to simply play back the sound file rather than regenerate the same audio each time.

The TTS Cache Script

The script below should be installed into the scripts directory in Freeswitch, commonly /opt/freeswitch/scripts/tts_cache.lua.

-- This script generates a wav file of the sentence passed in to it.
-- It uses the Cepstral swift command to perform text-to-speech conversion.
-- If the wav file already exists for this sentence, then it is not
-- generated.
api         = freeswitch.API();
msg         = argv[1];
msgMd5      = api:execute( "md5", msg );
filename    = '/var/lib/tts_cache/' .. msgMd5 .. '.wav';
cmd         = '/opt/swift/bin/swift';

-- Set a channel variable so that we know which file to play back.
session:setVariable( 'tts_file', filename );

-- Check whether the file already exists.
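file, errMsg = io.open( filename, "r" )
if file then
    -- Cache hit: release the handle; the existing wav file is played back as-is.
    file:close();
else
    -- Cache miss: generate the wav file with the Cepstral swift command.
    api:execute( 'system', cmd .. ' -o "' .. filename .. '" "' .. msg .. '"' );
end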
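
To check from a shell which cache file a given phrase maps to, the naming scheme can be reproduced outside Freeswitch. This is only a sketch: tts_cache_file is a hypothetical helper name, it assumes md5sum (GNU coreutils) is installed, and it assumes Freeswitch's md5 command returns the lowercase hex digest of the exact phrase text.

```shell
# Cache directory used by the Lua script; can be overridden from the environment.
TTS_CACHE_DIR="${TTS_CACHE_DIR:-/var/lib/tts_cache}"

# Hypothetical helper: print the cache path the Lua script would use for a phrase.
tts_cache_file() {
    # printf '%s' avoids the trailing newline that echo would add to the hashed input.
    hash=$(printf '%s' "$1" | md5sum | cut -d ' ' -f 1)
    printf '%s/%s.wav\n' "$TTS_CACHE_DIR" "$hash"
}
```

For example, `[ -f "$(tts_cache_file "Hello World")" ] && echo cached` would report whether that phrase has already been generated.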

Using The TTS Cache Script

To use the script, first create a directory to save the cache files in and ensure Freeswitch can write to it:

mkdir /var/lib/tts_cache
chown freeswitch /var/lib/tts_cache
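
If the wav files never appear, one thing worth ruling out is a permissions problem on the cache directory. A rough sketch of a check, to be run as the user Freeswitch runs as (assumed here to be freeswitch, as in the chown above); check_cache_dir is a hypothetical helper name:

```shell
# Hypothetical helper: succeed only if the directory exists and the
# current user can create files in it.
check_cache_dir() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "missing directory: $dir" >&2
        return 1
    fi
    # Try to create (and then remove) a scratch file to prove write access.
    probe="$dir/.write_test.$$"
    if touch "$probe" 2>/dev/null; then
        rm -f "$probe"
    else
        echo "not writable: $dir" >&2
        return 1
    fi
}
```

Running `check_cache_dir /var/lib/tts_cache && echo ok` as that user should print ok only when the directory is usable.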

Next, create a phrase macro so that you can use the script within a dial plan or IVR. This commonly goes in /opt/freeswitch/conf/lang/en/ivr/tts_cache.xml:

<include>
  <!-- Provides a phrase to speak custom text -->
  <macro name="tts_cache">
    <input pattern="(.*)">
      <match>
        <action function="execute" data="lua(tts_cache.lua '$1')"/>
        <action function="play-file" data="${tts_file}" />
      </match>
    </input>
  </macro>
</include>

Finally, in your dial plan you can use the script like so:

<action application="phrase" data="tts_cache,Hello World" />

The first time the phrase "Hello World" is requested, it is passed to the Cepstral swift command, which generates a wav file. Whenever the same phrase is requested again, Freeswitch simply plays back the cached wav file, which is much quicker.
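
The same check-then-generate logic can also be run from a shell, for example to generate the wav files for known phrases ahead of time so that even the first caller hears cached audio. This is a sketch under assumptions, not part of the setup described here: tts_prewarm is a hypothetical helper, md5sum is assumed to be available, Freeswitch's md5 command is assumed to return the lowercase hex digest of the exact phrase, and the phrases are read one per line from standard input.

```shell
# Paths taken from the Lua script; both can be overridden from the environment.
TTS_CACHE_DIR="${TTS_CACHE_DIR:-/var/lib/tts_cache}"
SWIFT="${SWIFT:-/opt/swift/bin/swift}"

# Hypothetical helper: read phrases from stdin, one per line, and generate
# a wav file for each phrase that is not already in the cache.
tts_prewarm() {
    while IFS= read -r phrase; do
        # Skip blank lines.
        [ -n "$phrase" ] || continue
        # Same naming scheme as the Lua script: md5 of the phrase plus ".wav".
        hash=$(printf '%s' "$phrase" | md5sum | cut -d ' ' -f 1)
        file="$TTS_CACHE_DIR/$hash.wav"
        # Only call swift on a cache miss; stdin is detached so swift cannot
        # consume the remaining phrases.
        [ -f "$file" ] || "$SWIFT" -o "$file" "$phrase" < /dev/null
    done
}
```

It could be invoked as `tts_prewarm < phrases.txt`, where phrases.txt is a hypothetical file listing one phrase per line. Running it as the Freeswitch user keeps the generated files readable by the server.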