Get AWS Polly to talk to you from Python

"Polly" is an ingeniously named AWS service that converts text into a sound file. It is part of the AWS AI portfolio. Another example, which we will see in another post, is "Rekognition".

A long time ago, if you wanted to do these kinds of tasks, you had to build your own models with lots of samples, train them, improve them and so on. It is impossible to cater for every AI use case in this fashion, but the ones related to human senses such as speech, hearing and vision are already offered by most cloud providers today. These services are conveniently accessed through an API.

Although the service has the name of a parrot, it provides multiple voices with different accents. Each voice has a person's name, e.g. "Joanna". Surprisingly, it even includes some children's voices in some languages. It would be awesome if they also offered a good strong Irish accent ... it would make a good addition to a joke-telling app running in a smart beer fridge :)
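
Once Boto3 is set up (see the prerequisites further down), the voices for a given language can be listed from Python. The snippet below is only a minimal sketch: the helper name "voices_for_language" is made up for illustration, it takes an already created Polly client as its first argument, and it ignores pagination of the results.

```python
def voices_for_language(polly, language_code):
    """Return sorted (voice id, gender) pairs that Polly reports for a language.

    `polly` is assumed to be a client created with boto3.client('polly').
    Hypothetical helper, not part of the original repo.
    """
    # describe_voices returns a dict with a "Voices" list; each entry
    # carries fields such as "Id", "Gender" and "LanguageCode".
    response = polly.describe_voices(LanguageCode=language_code)
    return sorted((voice["Id"], voice["Gender"]) for voice in response["Voices"])
```

With a real client this could be called as voices_for_language(boto3.client('polly'), 'en-GB') to see which British English voices are on offer.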

In order to access the Polly service with Python you will need the following:
  • an AWS account
  • AWS access key and secret key
  • Boto3 module

If you have an AWS account, chances are you have been using S3 or EC2 and you already have an access key and secret key. Boto is the AWS SDK for Python. It is used to access all AWS services. The latest version is Boto3. AWS's official statement is that this version is now stable and should be the go-to version. I couldn't confirm this, but I get the feeling that the AI services are only available with Boto3. You can install it with "pip install boto3".

As you can see in the repo, the Python code is pretty simple:
  • Create a service binding (Polly in this instance). This step is the same for all services
  • Synthesize the speech
  • Save the audio stream to a file so that we can play it

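import boto3

# Create the service binding; credentials are picked up from your AWS configuration
polly = boto3.client('polly')

# Example text only; the script in the repo defines its own
talk_to_me = 'Hello, this is Polly talking'

# Ask Polly for an MP3 rendering of the text using the "Joanna" voice
response = polly.synthesize_speech(
    OutputFormat='mp3',
    Text=talk_to_me,
    TextType='text',
    VoiceId='Joanna')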
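
The third step, saving the audio stream to a file, is not shown in the excerpt above. A rough sketch of how it could look follows. The function name "synthesize_to_file" and the output file name are my own assumptions, not necessarily what the repo uses. It relies on the response containing an "AudioStream" entry that can be read like a file.

```python
from contextlib import closing


def synthesize_to_file(polly, text, path, voice_id="Joanna"):
    """Ask Polly for MP3 audio and write it to `path`; return the byte count.

    `polly` is assumed to be a client created with boto3.client('polly').
    Hypothetical helper for illustration.
    """
    response = polly.synthesize_speech(
        OutputFormat="mp3",
        Text=text,
        TextType="text",
        VoiceId=voice_id,
    )
    # "AudioStream" is a file-like streaming body: read it once, then close it.
    with closing(response["AudioStream"]) as stream:
        audio = stream.read()
    # MP3 data is binary, so the file must be opened in "wb" mode.
    with open(path, "wb") as mp3_file:
        mp3_file.write(audio)
    return len(audio)
```

With a real client this would be called as synthesize_to_file(boto3.client('polly'), 'Hello', 'polly.mp3'). Taking the client as a parameter keeps the helper easy to try out with a stand-in object before spending any AWS requests.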

Find the full code in this GitHub repo:


The hardest part was playing the MP3 file on Windows. The natural choice in Python is "pygame", but apparently it doesn't have good support for MP3 on Windows and I didn't manage to hear anything. So I ended up invoking VLC with the "subprocess" module. I also found a handy "--play-and-exit" flag to close it right away, but failed to find a flag to run it minimized. Importantly, Windows users should specify the full path to both the MP3 file and the VLC executable using the "\\" syntax. Take a look at the file on GitHub to see what I mean.
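
For reference, the VLC call could be sketched roughly like this. The install path below is an assumption (a typical default location on Windows) and the helper names are made up, so adjust both to your machine. The command is passed as a list so that the space in "Program Files" needs no extra quoting.

```python
import subprocess

# Assumed install location of VLC; change it to match your system.
VLC_EXE = "C:\\Program Files\\VideoLAN\\VLC\\vlc.exe"


def build_vlc_command(mp3_path, vlc_exe=VLC_EXE):
    """Build the argument list that plays one file and then closes VLC."""
    return [vlc_exe, "--play-and-exit", mp3_path]


def play_mp3(mp3_path, vlc_exe=VLC_EXE):
    """Run VLC and wait until playback has finished and VLC has exited."""
    completed = subprocess.run(build_vlc_command(mp3_path, vlc_exe), check=True)
    return completed.returncode
```

A call would then look like play_mp3("C:\\Temp\\polly.mp3"), with the full path to the MP3 file written using the "\\" syntax.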


Overall, Polly doesn't have a huge number of options. You can see more details about other features such as "lexicon" on the following page:


Enjoy

Comments

  1. There is a vlc module for python that allows you to access vlc programmatically.

