
How to get my speech recognized with Google ASR and Python

The first thing you need is a Google API key:

  • You need to register for the Google Speech APIs, i.e. get a Google Cloud Platform account.
  • You need to provide payment details, but (at the time of writing, I think) the first 60 minutes of processed speech per month are free.

I export my API key each time I want to use this like so:

export GOOGLE_APPLICATION_CREDENTIALS="/home/felix/data/research/Google/api_key.json"
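
Since the export has to be repeated in every new shell, it is easy to forget. As a minimal sketch (the helper name `check_credentials` is my own and not part of any Google library), you could check from Python that the variable is set and points to an existing file before creating a client:

```python
import os


def check_credentials(env=os.environ):
    """Return the key file path, or raise a readable error if it is unusable."""
    # `env` is a parameter only so the check can be tried out with a plain dict.
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    if not os.path.isfile(path):
        raise RuntimeError("key file not found: {}".format(path))
    return path
```

Calling `check_credentials()` at the top of the notebook only confirms that the file exists; it does not validate the key itself.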

This tutorial assumes you did that and started a Jupyter notebook. If you don't know what that is, here's a tutorial on how to set one up (first part)

Before you can import the Google Speech API, make sure it's installed:

!pip install google-cloud
!pip install --upgrade google-cloud-speech

Then import the Google Cloud client library:

from google.cloud import speech
import io

Instantiate a client

client = speech.SpeechClient()

Then load a recorded speech file; it should be in WAV format with a 16 kHz sample rate:

speech_file = '/home/felix/tmp/google_speech_api_test.wav'
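
If you are not sure whether an existing file matches that format, here is a small sketch that reads the WAV header with Python's standard `wave` module. The function names `describe_wav` and `is_16khz_mono_pcm16` are my own, and the check assumes you want mono, 16-bit PCM audio, which is what the recording code in this post produces:

```python
import wave


def describe_wav(path):
    """Read the WAV header and return (sample_rate, channels, bits_per_sample)."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate(), wav.getnchannels(), 8 * wav.getsampwidth()


def is_16khz_mono_pcm16(path):
    """True if the file is 16 kHz, one channel, 16 bits per sample."""
    return describe_wav(path) == (16000, 1, 16)
```

You would call it as `is_16khz_mono_pcm16(speech_file)`. Note that the `wave` module is meant for uncompressed PCM files, so other WAV variants may raise an error instead of returning a result.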

If you run into problems recording one, here is the code that worked for me:

import sounddevice as sd
import numpy as np
from scipy.io import wavfile
sr = 16000  # Sample rate in Hz
seconds = 3  # Duration of recording
data = sd.rec(int(seconds * sr), samplerate=sr, channels=1)
sd.wait()  # Wait until recording is finished
# Scale `data` to the full 16-bit range and convert to 16-bit integers:
y = (np.iinfo(np.int16).max * (data/np.abs(data).max())).astype(np.int16)
wavfile.write(speech_file, sr, y)

Then get yourself an audio object:

with io.open(speech_file, "rb") as audio_file:
    content = audio_file.read()
audio = speech.RecognitionAudio(content=content)

Configure the ASR

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="de-DE",
)

Detect speech in the audio file:

response = client.recognize(config=config, audio=audio)

And show what you got (in my trial only the first alternative was filled):

for result in response.results:
    for index, alternative in enumerate(result.alternatives):
        print("Transcript {}: {}".format(index, alternative.transcript))
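
If you only care about the top hypothesis, a small helper can collect it per result. This is just a sketch: `best_transcripts` is a name I made up, and it assumes the response has the shape used above (a `results` list whose items carry `alternatives` with `transcript` and `confidence` fields). As far as I know, the API returns at most one alternative unless you set `max_alternatives` in the config, which would explain why only the first one was filled in my trial.

```python
def best_transcripts(response):
    """Return (transcript, confidence) for the first alternative of each result."""
    pairs = []
    for result in response.results:
        if result.alternatives:  # skip results without any hypothesis
            top = result.alternatives[0]
            pairs.append((top.transcript, top.confidence))
    return pairs
```

To get the whole utterance as one string you could then write `" ".join(text for text, _ in best_transcripts(response))`.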