Giving AIoT a Voice using Azure AI

In my previous article we learnt together how to use the Azure Face API to enable a Raspberry Pi IoT device to detect faces. In this second howto article we are going to build atop what we have done so far and give the Raspberry Pi a voice using the Azure Speech API.

To follow the steps detailed here, you will need to set up a Raspberry Pi following the instructions from the previous article.

When your IoT device is ready, you will need to create a Speech API resource in your Azure portal. For convenience, I'm going to use bash commands. To open your bash console, open the cloud shell from the upper-right corner of the portal.

When the shell pane shows up, select the bash option. The first time you use the Azure cloud shell you might need an Azure storage account to host your shell files; Azure will create this automatically for you.

We will start by creating a resource group to host the components required by this howto and the future ones. Here I'm creating a group called AIoT located in the northeurope region; you may pick a different location.

az group create --name AIoT --location northeurope

You should get a JSON success message like the following:

fady@Azure:~$ az group create --name AIoT --location northeurope
{
  "id": "/subscriptions/GUID/resourceGroups/AIoT",
  "location": "northeurope",
  "managedBy": null,
  "name": "AIoT",
  "properties": {
    "provisioningState": "Succeeded"
  },
  "tags": null,
  "type": "Microsoft.Resources/resourceGroups"
}

Now we proceed to create the new Speech API resource, named AIoTSpeech, under the resource group we just created, as follows. The F0 sku means you are using the free tier, of which you can have only one per subscription.

az cognitiveservices account create -n AIoTSpeech -g AIoT --kind SpeechServices --sku F0 -l northeurope --yes

You should get another JSON success message like the one below:

fady@Azure:~$ az cognitiveservices account create -n AIoTSpeech -g AIoT --kind SpeechServices --sku F0 -l northeurope --yes
{
  "customSubDomainName": null,
  "endpoint": "",
  "etag": "\"GUID\"",
  "id": "/subscriptions/GUID/resourceGroups/AIoT/providers/Microsoft.CognitiveServices/accounts/AIoTSpeech",
  "internalId": "GUID",
  "kind": "SpeechServices",
  "location": "northeurope",
  "name": "AIoTSpeech",
  "networkAcls": null,
  "provisioningState": "Succeeded",
  "resourceGroup": "AIoT",
  "sku": {
    "name": "F0",
    "tier": null
  },
  "tags": null,
  "type": "Microsoft.CognitiveServices/accounts"
}

Once that is done, you will need to grab the API keys so you can use them to call the service. You can do so with the command below, passing the resource group name and resource name as parameters.

az cognitiveservices account keys list -g AIoT -n AIoTSpeech

You will get a JSON response containing key1 and key2; either of them will work.

Now that your Azure Speech service is ready, let's prepare our Raspberry Pi by installing the Node.js npm packages required to call the API, save the generated voice wav file, and play it through the Raspberry Pi sound system.

npm install microsoft-cognitiveservices-speech-sdk play-sound

When that is done, create a new file named speech.js and paste the Node.js code snippet below into it.

// pull in the required packages.
var sdk = require("microsoft-cognitiveservices-speech-sdk");
var player = require("play-sound")(opts = {});

var subscriptionKey = "key"; // place your subscription key here
var serviceRegion = "northeurope"; // place your azure location here
var filename = "hello.wav"; // the file the synthesized speech is written to

// the speech config holds the credentials and the synthesis language.
var speechConfig = sdk.SpeechConfig.fromSubscription(subscriptionKey, serviceRegion);

// setting the synthesis language to English.
speechConfig.speechSynthesisLanguage = "en-US";

// the audio config writes the synthesized audio straight to our wav file.
var audioConfig = sdk.AudioConfig.fromAudioFileOutput(filename);

// create the speech synthesizer.
var synthesizer = new sdk.SpeechSynthesizer(speechConfig, audioConfig);

// we are done with the setup
var text = "Hello World!";
console.log("Now sending text '" + text + "' to: " + filename);

// start the synthesizer and wait for a result.
synthesizer.speakTextAsync(text,
  function (result) {
    synthesizer.close();
    synthesizer = undefined;

    // when the wav file is ready, play it
    player.play(filename, function (err) {
      if (err) throw err;
    });
  },
  function (err) {
    console.trace("err - " + err);
    synthesizer.close();
    synthesizer = undefined;
  });
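The snippet above speaks plain text. The SDK also exposes a speakSsmlAsync method that accepts SSML, which lets you pick a specific voice or adjust the speaking rate. A minimal sketch of building such a payload (the voice name is an example; check the Speech service documentation for the voices available in your region):

```javascript
// Build a minimal SSML document suitable for SpeechSynthesizer.speakSsmlAsync().
// The voice name passed in is an assumption/example, not a fixed requirement.
function buildSsml(text, voiceName) {
  // Escape XML special characters so arbitrary text stays valid SSML.
  var escaped = text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
  return '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" ' +
         'xml:lang="en-US">' +
         '<voice name="' + voiceName + '">' + escaped + '</voice>' +
         '</speak>';
}

console.log(buildSsml("Hello World!", "en-US-JennyNeural"));
```

You would then call synthesizer.speakSsmlAsync(buildSsml(text, voiceName), ...) in place of speakTextAsync.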

Copy the file to your Raspberry Pi the same way as explained in the previous article, then connect a speaker or headphones to either the Pi's stereo jack or one of its USB ports. Then, from the shell, run the command below.

node speech.js

You should now hear the message "Hello World!". You have just created an AIoT device that has a voice and can see faces. I hope you enjoyed this howto article. Stay tuned for the next one, where we will continue to learn together about AIoT by building atop what we have achieved so far, making our Raspberry Pi IoT device smarter and more interactive using Azure cognitive services.
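As a small follow-on exercise, you could make the spoken text configurable instead of hardcoded. One way, assuming you keep the speech.js structure above, is to read it from the command-line arguments:

```javascript
// Read the text to speak from the command-line arguments,
// falling back to the default when none are given.
// Usage: node speech.js "Any text you like"
var text = process.argv.slice(2).join(" ") || "Hello World!";
console.log(text);
```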
