The first thing you need is a Google API key:
- Register for the Google speech APIs, i.e. get a Google Cloud Platform account.
- You need to provide payment details, but (at the time of writing, I think) the first 60 minutes of processed speech per month are free.
I export the path to my API key file each time I want to use it, like so:
export GOOGLE_APPLICATION_CREDENTIALS="/home/felix/data/research/Google/api_key.json"
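If the export was forgotten or the path is wrong, the client only complains later, when it is created. As a rough sketch, you could check the variable from within Python first. The helper name `credentials_path` is made up for this example and is not part of any Google library.

```python
import os


def credentials_path(env=os.environ):
    """Return the key file path from GOOGLE_APPLICATION_CREDENTIALS, or raise.

    `credentials_path` is a hypothetical helper for this tutorial only.
    """
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    if not os.path.isfile(path):
        raise FileNotFoundError("No key file at {}".format(path))
    return path
```

Calling `credentials_path()` in a notebook cell either returns the path you exported or raises an error that says what is missing.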
This tutorial assumes you did that and started a Jupyter notebook. If you don't know what this is, here's a tutorial on how to set one up (first part)
Before you can import the Google speech API, make sure it's installed:
!pip install google-cloud
!pip install --upgrade google-cloud-speech
Then import the Google Cloud client library
from google.cloud import speech
import io
Instantiate a client
client = speech.SpeechClient()
Then point to a recorded speech file; it should be in WAV format with a 16 kHz sample rate
speech_file = '/home/felix/tmp/google_speech_api_test.wav'
If you run into problems recording one, here is the code that worked for me:
import sounddevice as sd
import numpy as np
from scipy.io.wavfile import write
sr = 16000 # Sample rate
seconds = 3 # Duration of recording
data = sd.rec(int(seconds * sr), samplerate=sr, channels=1)
sd.wait() # Wait until recording is finished
# Convert `data` to 16 bit integers:
y = (np.iinfo(np.int16).max * (data/np.abs(data).max())).astype(np.int16)
write(speech_file, sr, y)
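Before sending a file to the API, it can help to confirm that it really has the format this tutorial assumes: 16 kHz, 16-bit PCM, and one channel as in the recording code above. The sketch below reads the WAV header with Python's standard `wave` module. The function names `describe_wav` and `matches_config` are invented for this example.

```python
import wave


def describe_wav(path):
    """Read the WAV header.

    Returns (channels, sample_width_bytes, sample_rate, seconds).
    """
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        width = wav.getsampwidth()   # bytes per sample, 2 means 16 bit
        rate = wav.getframerate()    # samples per second
        seconds = wav.getnframes() / float(rate)
    return channels, width, rate, seconds


def matches_config(path, expected_rate=16000):
    """True if the file is mono, 16-bit PCM at the expected sample rate.

    Mono is an assumption taken from the recording code (channels=1).
    """
    channels, width, rate, _ = describe_wav(path)
    return channels == 1 and width == 2 and rate == expected_rate
```

For example, `matches_config(speech_file)` should be true for a file written by the recording code above. If it is false, `describe_wav(speech_file)` shows which property differs.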
Then get yourself an audio object
with io.open(speech_file, "rb") as audio_file:
    content = audio_file.read()
audio = speech.RecognitionAudio(content=content)
Configure the ASR
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="de-DE",
)
Detect speech in the audio file
response = client.recognize(config=config, audio=audio)
And show what you got (in my trial, only the first alternative was filled):
for result in response.results:
    for index, alternative in enumerate(result.alternatives):
        print("Transcript {}: {}".format(index, alternative.transcript))
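Since usually only the first alternative is filled, and the API lists the most probable hypothesis first, you may just want one string for the whole recording. A minimal sketch, assuming `response` has the shape used above (`results`, each with `alternatives` carrying a `transcript`). `best_transcript` is a name invented for this example.

```python
def best_transcript(response):
    """Join the first alternative of every result into one string.

    `best_transcript` is a hypothetical helper, not part of the client library.
    """
    parts = []
    for result in response.results:
        if result.alternatives:  # skip results without any hypothesis
            parts.append(result.alternatives[0].transcript.strip())
    return " ".join(parts)
```

With the response from above, `print(best_transcript(response))` prints the recognized text as one line.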