OpenAI published new speech recognition models that are very easy to use and work in many languages trained on 680,000 hours of multilingual and multitask supervised data collected from the web.
In my case all I had to do to recognize some German test:
# create a virtual environment
virtualenv venv
# activate it
. venv/bin/activate
# install whisper
pip install git+https://github.com/openai/whisper.git
# run the test
whisper test.wav --language German
And my file got recognized correctly, though it took a very long time: for the tiny model speed = x32, i.e. 32 times the time of the speech file duration, was announced