Tag Archives: speech recognition

How to get my speech recognized with Google ASR and python

What you need to do this at first is to get yourselg a Google API key,

  • you need to register with Google speech APIs, i.e. get a Google cloud platform account
  • you need to share payment details, but (at the time of writing, i think) the first 60 minutes of processed speech per month are free.

I export my API key each time I want to use this like so:

export GOOGLE_APPLICATION_CREDENTIALS="/home/felix/data/research/Google/api_key.json"

This tutorial assumes you did that and you started a Jupyter notebook . If you don't know what this is, here's a tutorial on how to set one up (first part)

Bevor you can import the Google speech api make shure it's installed:

!pip  install google-cloud 

Then you would import the Google Cloud client library

from google.cloud import speech
import io

Instantiate a client

client = speech.SpeechClient()

And load yourself a recorded speech file, should be wav format 16kHz sample rate

speech_file = '/home/felix/tmp/google_speech_api_test.wav'
with io.open(speech_file, "rb") as audio_file:
    content = audio_file.read()

get yourself an audio object

audio = speech.RecognitionAudio(content = content)

Configure the ASR

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="de-DE",
)

Detects speech in the audio file

response = client.recognize(config=config, audio=audio)

and show what you got (with my trial only the first alternative was filled):

for result in response.results:
    for index, alternative in enumerate(result.alternatives):
        print("Transcript {}: {}".format(index, alternative.transcript))