• Divakar V

Azure Cognitive Services using Python & REST API | Text-to-Speech

Updated: Apr 7, 2021

This post shows how to use Microsoft's neural Text-to-Speech API to generate audio from text input. Let's get started!


The objective is to write a Python script that generates an audio clip for each sentence in a text file, using the Azure text-to-speech service through its REST API.


The first step is to create a free Azure account, if you don't already have one, and start the free text-to-speech service.

Link: https://azure.microsoft.com/en-us/services/cognitive-services/text-to-speech/

Once inside the portal, you can easily set up a resource group and add the text-to-speech cognitive service to it (I'm confident that you can manage it).


Your portal should look like this (image below). Now, go to the "Keys and Endpoint" tab under "Resource Management" on the left. From here, we need two pieces of information for our Python code:

  1. Key - Key 1 or Key 2; either one works.

  2. Location - centralindia in my case




With the above information from the portal, you can set the subscription_key and location variables. You can also change the input text file and the output folder to suit your setup.
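As a minimal sketch of this setup step: the variable names subscription_key and location come from the post, while the file name, folder name, the read_sentences helper, and the one-sentence-per-line layout are assumptions made for illustration.

```python
from pathlib import Path

# Placeholder values: replace with the key and location shown in your portal.
subscription_key = "YOUR_SUBSCRIPTION_KEY"
location = "centralindia"

# Hypothetical input file and output folder, named here only for illustration.
input_file = Path("input.txt")
output_dir = Path("audio_out")


def read_sentences(path):
    """Return the non-empty lines of a text file.

    Assumes the file holds one sentence per line.
    """
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]
```

Calling read_sentences(input_file) then gives the list of sentences to send to the API one at a time.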



The following function generates an access_token using the unique subscription key of your account. Note that each access token is valid for 10 minutes.
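A token request of this kind might look like the sketch below. It uses only the Python standard library and assumes the regional issueToken endpoint and the Ocp-Apim-Subscription-Key header described in the REST documentation (reference 1); the function names token_url and get_access_token are illustrative and not necessarily the ones used in the original script.

```python
import urllib.request


def token_url(location):
    """Build the token endpoint URL for a given Azure region."""
    return f"https://{location}.api.cognitive.microsoft.com/sts/v1.0/issueToken"


def get_access_token(subscription_key, location):
    """Exchange a subscription key for a short-lived access token."""
    request = urllib.request.Request(
        token_url(location),
        data=b"",  # empty body; the key travels in the header
        headers={"Ocp-Apim-Subscription-Key": subscription_key},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        # The response body is the token itself, as plain text.
        return response.read().decode("utf-8")
```

Because a token expires after 10 minutes, a long-running script would need to request a fresh one periodically rather than reuse the first token for every sentence.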


The API can also generate audio in various formats; the available audio output formats are listed in the REST API documentation (reference 1). You can specify the desired output format in the request header (line 30 of the full code).

You can also experiment with different languages and accents by modifying the body (data) of the request (line 37 of the full code).
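To make the header and body concrete, here is a hedged sketch of a synthesis request using only the standard library. The endpoint, the X-Microsoft-OutputFormat header, and the SSML body follow the REST documentation (reference 1); the function names, the default voice en-US-AriaNeural, the riff-24khz-16bit-mono-pcm format, and the User-Agent string are assumptions picked for illustration.

```python
import urllib.request
from xml.sax.saxutils import escape


def build_ssml(text, lang="en-US", voice="en-US-AriaNeural"):
    """Wrap text in SSML; change lang and voice to try other accents."""
    return (
        f"<speak version='1.0' xml:lang='{lang}'>"
        f"<voice xml:lang='{lang}' name='{voice}'>{escape(text)}</voice>"
        "</speak>"
    )


def synthesize(text, access_token, location,
               output_format="riff-24khz-16bit-mono-pcm"):
    """Send one sentence to the service and return the raw audio bytes."""
    url = f"https://{location}.tts.speech.microsoft.com/cognitiveservices/v1"
    headers = {
        "Authorization": f"Bearer {access_token}",
        "Content-Type": "application/ssml+xml",
        "X-Microsoft-OutputFormat": output_format,  # selects the audio format
        "User-Agent": "tts-sketch",  # arbitrary application name
    }
    request = urllib.request.Request(
        url, data=build_ssml(text).encode("utf-8"),
        headers=headers, method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()
```

The returned bytes can be written straight to a file opened in binary mode; with a RIFF output format the natural file extension is .wav.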


Entire Code:




References:

  1. https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/rest-text-to-speech

  2. GitHub Repository - https://github.com/vdivakar/azureTTS