Get AWS Polly to talk to you from Python

By Al December 22, 2017

"Polly" is an ingeniously named AWS service that converts text to a sound file. It is part of the AWS AI portfolio. Another example, which we will see in another post, is "Rekognition".

A long time ago, if you wanted to do these kinds of tasks, you had to build your own models with lots of samples, train them, improve them and so on. It is impossible to cater for every AI use case in this fashion, but the ones related to human senses such as speech, hearing and vision are already offered by most cloud providers today. These services are conveniently accessed through an API.

Although the service has the name of a parrot, it provides multiple voices with different accents. Each voice has a person's name, e.g. "Joanna". Surprisingly, it even includes children's voices in some languages. It would be awesome if they also offered a good strong Irish accent ... it would make a good addition to a joke-telling app running in a smart beer fridge :)
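If you are curious which voices exist for a given language, Polly can list them for you. The following is only a minimal sketch: the helper name "list_voice_ids" is made up for this post, it assumes you pass in a client created with boto3.client('polly'), and it ignores pagination of the results.

```python
def list_voice_ids(polly_client, language_code="en-US"):
    """Return the sorted voice IDs that Polly reports for one language code.

    polly_client is assumed to be a client created with boto3.client('polly').
    The helper name and the default language code are illustrative only.
    """
    response = polly_client.describe_voices(LanguageCode=language_code)
    # Each entry in "Voices" is a dict; "Id" is the value used as VoiceId
    return sorted(voice["Id"] for voice in response["Voices"])

# Example use (needs AWS credentials, so it is left commented out):
#   import boto3
#   print(list_voice_ids(boto3.client('polly'), "en-GB"))
```

The IDs it returns are the values you would pass as VoiceId when synthesizing speech.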

In order to access the Polly service with Python you will need the following:

an AWS account
AWS access key and secret key
Boto3 module

If you have an AWS account, chances are you have been using S3 or EC2 and you already have a key pair. Boto is the AWS SDK for Python. It is used to access all AWS services. The latest version is Boto3. AWS's official statement is that this version is now stable and should be the go-to version. I couldn't confirm this, but I get the feeling that AI services are only available with Boto3. You can install it with "pip install boto3".

As you can see in the repo, the Python code is pretty simple:

Create a service binding (Polly in this instance). This step is the same for all services
Synthesize the speech
Save the audio stream to a file so that we can play it

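import boto3

# Text to convert to speech (example value; use any string you like)
talk_to_me = 'Hello, this is Polly talking to you from Python'

# Create a client for the Polly service; credentials and region come from your AWS configuration
polly = boto3.client('polly')

# Ask Polly to synthesize the text as MP3 with the "Joanna" voice
response = polly.synthesize_speech(
    OutputFormat='mp3',
    Text=talk_to_me,
    TextType='text',
    VoiceId='Joanna')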
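The snippet above covers the first two steps. For the third one, the response contains an "AudioStream" entry that can be read like a file. Here is a rough sketch of how saving it could look; the function name "save_audio_stream" and the file name in the comment are my own placeholders and not necessarily what the repo uses.

```python
from contextlib import closing

def save_audio_stream(response, mp3_path):
    """Write the MP3 bytes from a synthesize_speech response to mp3_path.

    Returns the number of bytes written. The function name is illustrative.
    """
    # "AudioStream" is a file-like object; closing() releases it when done
    with closing(response["AudioStream"]) as stream:
        audio_bytes = stream.read()
    # Binary mode is required because the payload is MP3 data, not text
    with open(mp3_path, "wb") as mp3_file:
        mp3_file.write(audio_bytes)
    return len(audio_bytes)

# Example use with the response from synthesize_speech (placeholder file name):
#   save_audio_stream(response, "polly.mp3")
```

Reading the whole stream at once is fine for short sentences; for long texts you may prefer to copy it in chunks.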

Find the full code in this GitHub repo:

https://github.com/cermegno/Python-AWS-Polly

The hardest part was playing the MP3 file in Windows. The natural choice in Python is "pygame", but it apparently doesn't have good support for MP3 on Windows and I didn't manage to hear anything. So I ended up invoking VLC with the "subprocess" module. I also managed to find a handy "--play-and-exit" flag to close it right away, but failed to find a flag to run it minimized. Also important for Windows users: don't forget to specify the full path to both the MP3 file and the VLC executable using the "\\" syntax. Take a look at the file in GitHub to see what I mean.
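To give an idea of the VLC call, here is a minimal sketch. Both paths are hypothetical examples (your VLC install location and MP3 location will differ), the function names are my own, and the exact code in the repo may look different.

```python
import subprocess

# Hypothetical Windows paths; note the doubled backslashes inside normal strings
VLC_EXE = "C:\\Program Files\\VideoLAN\\VLC\\vlc.exe"
MP3_FILE = "C:\\temp\\polly.mp3"

def build_vlc_command(vlc_path, mp3_path):
    """Build the argument list that plays one file and then closes VLC."""
    return [vlc_path, "--play-and-exit", mp3_path]

def play_with_vlc(vlc_path, mp3_path):
    """Run VLC and wait until playback finishes and the player exits."""
    # Passing a list avoids quoting problems with spaces in "Program Files"
    subprocess.run(build_vlc_command(vlc_path, mp3_path), check=True)

# Example use (requires VLC to be installed at the path above):
#   play_with_vlc(VLC_EXE, MP3_FILE)
```

A raw string such as r"C:\temp\polly.mp3" is equivalent to the doubled-backslash form and may be easier to read.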