Here is how to synthesize speech (text-to-speech, TTS) with a German single-speaker (female) Tacotron2 model and ESPnet2.
You need the following Python packages:
pip install torch espnet_model_zoo phonemizer soundfile
Then you can run the following Python code:
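import soundfile
from espnet2.bin.tts_inference import Text2Speech

# Pretrained German Tacotron2 model, fetched from Zenodo
model = 'https://zenodo.org/record/5150957/files/tts_train_tacotron2_raw_hokuspokus_phn_espeak_ng_german_train.loss.ave.zip?download=1'
text2speech = Text2Speech.from_pretrained(model)
# The call returns a dict; "wav" holds the waveform as a torch tensor
speech = text2speech("Wow, das war ja einfach!")["wav"]
# Write 16-bit PCM at the model's sampling rate (text2speech.fs)
soundfile.write("out.wav", speech.numpy(), text2speech.fs, "PCM_16")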
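
For longer texts it may help to synthesize one sentence at a time, since attention-based models such as Tacotron2 can become unstable on very long inputs. The following is only a minimal sketch under that assumption: split_sentences and synthesize_sentences are hypothetical helper names, not part of ESPnet, and the regular expression is a naive splitter that will also break after abbreviations such as "z.B.".

```python
import re


def split_sentences(text):
    """Naively split text after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def synthesize_sentences(text2speech, text):
    """Call text2speech once per sentence and return the list of waveforms.

    text2speech is assumed to behave like the Text2Speech object above:
    calling it with a string returns a dict with a "wav" entry.
    """
    return [text2speech(sentence)["wav"] for sentence in split_sentences(text)]
```

With the text2speech object from above, the returned waveforms could then be joined, for example with torch.cat(wavs), before being passed to soundfile.write as before.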