database | speechsurfer

Sometimes you might want to combine databases that are similar but don't cover exactly the same phenomena.

Take stress and emotion, for example: you might not have enough data labeled with stress, but many emotion databases label anger and happiness. One approach is to treat angry samples as stressed and happy or neutral samples as non-stressed.

Taking the usual emodb as an example, and the well-known Susas as a database of stressed voices, the configuration looks like this:

[DATA]
databases = ['emodb', 'susas']

emodb = ./data/emodb/emodb
# indicate where the target values are
emodb.target_tables = ["emotion"]
# rename emotion to stress
emodb.colnames = {"emotion": "stress"}
# only use angry, neutral and happy samples
emodb.filter = [["stress", ["anger", "neutral", "happiness"]]]
# map them to stress
emodb.mapping = {"anger": "stress",  "neutral": "no stress", "happiness": "no stress"}
# and put all of it into the training set
emodb.split_strategy = train

susas = data/susas/
# map ternary stress labels to binary
susas.mapping = {'0,1':'no stress', '2':'stress'}
susas.split_strategy = speaker_split

target = stress
labels = ["stress", "no stress"]

So Susas will be split into train and test sets, but the training set will be strengthened by the whole of emodb. This setup makes most sense when a third database is available for evaluation, because in most cases in-domain training works better than adding out-of-domain data (as we do here with emodb).
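To make the effect of the colnames, filter and mapping options more concrete, here is a small pandas sketch of the same label alignment. It is not how Nkululeko implements these options internally; the toy data frames, the file names and the helper expand_mapping are made up for illustration, and the reading of '0,1' as "labels 0 and 1" is an assumption based on the config comment above.

```python
import pandas as pd


def expand_mapping(mapping):
    """Split comma-separated keys such as '0,1' into one entry per label."""
    expanded = {}
    for keys, target in mapping.items():
        for key in keys.split(","):
            expanded[key.strip()] = target
    return expanded


# Toy stand-in for the emodb labels (file names are made up).
emodb = pd.DataFrame(
    {"emotion": ["anger", "sadness", "neutral", "happiness", "fear"]},
    index=["e1.wav", "e2.wav", "e3.wav", "e4.wav", "e5.wav"],
)
# colnames: rename emotion to stress
emodb = emodb.rename(columns={"emotion": "stress"})
# filter: keep only angry, neutral and happy samples
emodb = emodb[emodb["stress"].isin(["anger", "neutral", "happiness"])].copy()
# mapping: translate emotion labels to stress labels
emodb["stress"] = emodb["stress"].map(
    {"anger": "stress", "neutral": "no stress", "happiness": "no stress"}
)

# Toy stand-in for the ternary Susas labels (file names are made up).
susas = pd.DataFrame(
    {"stress": ["0", "1", "2", "2"]},
    index=["s1.wav", "s2.wav", "s3.wav", "s4.wav"],
)
susas["stress"] = susas["stress"].map(
    expand_mapping({"0,1": "no stress", "2": "stress"})
)

# Both databases now share one binary target column.
combined = pd.concat([emodb, susas])
print(combined["stress"].value_counts())
```

After this step both tables carry the same two labels, which is what the target and labels entries in the config rely on.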

This tutorial explains how to initialize an audformat Database object from a data collection that's stored in a pandas dataframe.
You can also find an official example using emodb in the audformat documentation.

First you need the necessary imports:

import os                       # file operations
import pandas as pd             # work with tables
pd.set_option('display.max_rows', 10)

import audformat.define as define  # some definitions
import audformat.utils as utils    # util functions
import audformat
import pickle

We load a sample pandas dataframe from a speech collection labeled with age and gender.

# open the pickle in a context manager so the file handle gets closed
with open('../files/sample_df.pkl', 'rb') as fp:
    df = pickle.load(fp)
df.head(1)
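If you don't have such a pickle at hand, a stand-in dataframe with the layout this tutorial assumes can be built directly. The sketch below is only a guess at that layout, inferred from how df is used further down: an index named 'file' holding absolute audio paths, plus the columns age, gender and speaker. All values and file names are invented.

```python
import pandas as pd

# Hypothetical absolute root, matching the example path used in this tutorial.
root = '/my/example/path/'

# Invented sample rows: one row per audio file.
df = pd.DataFrame(
    {
        'age': [34, 52],
        'gender': ['female', 'male'],
        'speaker': ['spk_01', 'spk_02'],
    },
    index=pd.Index(
        [root + 'audio/mywav_0001.wav', root + 'audio/mywav_0002.wav'],
        name='file',
    ),
)
print(df.head(1))
```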

We can then construct an audformat Database object from this data like this:

# remove the absolute path to the audio samples 
root = '/my/example/path/'
files = [file.replace(root, '') for file in df.index.get_level_values('file')]

# start with a general description
db = audformat.Database(
    name='age-gender-samples',
    source='intern',
    usage=audformat.define.Usage.RESEARCH,
    languages=[audformat.utils.map_language('de')],
    description=(
        'Short snippets annotated by '
        'speaker and speaker age and gender.'
    ),
)
# add audio format information
db.media['microphone'] = audformat.Media(
    type=audformat.define.MediaType.AUDIO,
    sampling_rate=16000,
    channels=1,
    format='wav',
)
# describe the age data
db.schemes['age'] = audformat.Scheme(
    dtype=audformat.define.DataType.INTEGER,
    minimum=0,
    maximum=100,
    description='Speaker age in years',
)
# describe the gender data
db.schemes['gender'] = audformat.Scheme(
    labels=[
        audformat.define.Gender.FEMALE,
        audformat.define.Gender.MALE,
    ],
    description='Speaker sex',
)
# describe the speaker id data
db.schemes['speaker'] = audformat.Scheme(
    dtype=audformat.define.DataType.STRING,
    description='Name of the speaker',
)
# initialize a data table with an index which corresponds to the file names
db['files'] = audformat.Table(
    audformat.filewise_index(files),
    media_id='microphone',
)
# now add columns to the table for each data item of interest (age, gender and speaker id)
db['files']['age'] = audformat.Column(scheme_id='age')
db['files']['age'].set(df['age'])
db['files']['gender'] = audformat.Column(scheme_id='gender')
db['files']['gender'].set(df['gender'])
db['files']['speaker'] = audformat.Column(scheme_id='speaker')
db['files']['speaker'].set(df['speaker'])

Finally, inspect the result:

db

name: age-gender-samples
description: Short snippets annotated by speaker and speaker age and gender.
source: intern
usage: research
languages: [deu]
media:
  microphone: {type: audio, format: wav, channels: 1, sampling_rate: 16000}
schemes:
  age: {description: Speaker age in years, dtype: int, minimum: 0, maximum: 100}
  gender:
    description: Speaker sex
    dtype: str
    labels: [female, male]
  speaker: {description: Name of the speaker, dtype: str}
tables:
  files:
    type: filewise
    media_id: microphone
    columns:
      age: {scheme_id: age}
      gender: {scheme_id: gender}
      speaker: {scheme_id: speaker}

As a quick test, you can get the unique values of all speakers:

    db.tables['files'].df.speaker.unique()

Important: note that the paths to the audio files need to be relative to where the database's YAML header file resides (db.yaml by default), and must not start with "./". So if you do

db.files[0]

this should result in something like

audio/mywav_0001.wav
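A small check along these lines can catch bad paths before the database is saved. This is only a sketch using the standard library; the helper names to_relative and is_valid_db_path are made up here and are not part of audformat.

```python
import posixpath


def to_relative(path, root):
    """Return path relative to root, using forward slashes."""
    return posixpath.relpath(path, start=root)


def is_valid_db_path(path):
    """True if path is relative and does not start with './'."""
    return not posixpath.isabs(path) and not path.startswith('./')


# Example with the hypothetical root used in this tutorial.
rel = to_relative('/my/example/path/audio/mywav_0001.wav', '/my/example/path/')
print(rel, is_valid_db_path(rel))
```

You could run is_valid_db_path over every entry in db.files and stop if any of them fails.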
