auditok.util

Class summary

DataSource Base class for objects passed to auditok.core.StreamTokenizer.tokenize().
StringDataSource(data) A class that represents a DataSource as a string buffer.
ADSFactory Factory class that makes it easy to create an ADSFactory.AudioDataSource object that implements DataSource and can therefore be passed to auditok.core.StreamTokenizer.tokenize().
ADSFactory.AudioDataSource(audio_source, …) Base class for AudioDataSource objects.
ADSFactory.ADSDecorator(ads) Base decorator class for AudioDataSource objects.
ADSFactory.OverlapADS(ads, hop_size) A class for AudioDataSource objects that can read and return overlapping audio frames.
ADSFactory.LimiterADS(ads, max_time) A class for AudioDataSource objects that can read a fixed amount of data.
ADSFactory.RecorderADS(ads) A class for AudioDataSource objects that can record all audio data they read, with a rewind facility.
DataValidator Base class for a validator object used by core.StreamTokenizer to check if read data is valid.
AudioEnergyValidator(sample_width[, …]) The most basic auditok audio frame validator.
class auditok.util.DataSource[source]

Base class for objects passed to auditok.core.StreamTokenizer.tokenize(). Subclasses should implement a DataSource.read() method.

read()[source]

Read a piece of data from this source. If no more data is available, return None.
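The read() contract described above can be illustrated with a minimal sketch. The DataSource class below is a local stand-in so that the snippet runs without auditok installed, and ListDataSource is a hypothetical name that is not part of the library.

```python
class DataSource:
    """Local stand-in for auditok.util.DataSource (assumption: only read() is required)."""

    def read(self):
        raise NotImplementedError


class ListDataSource(DataSource):
    """Hypothetical source that returns one list item per read() call."""

    def __init__(self, items):
        self._items = list(items)
        self._pos = 0

    def read(self):
        # Signal exhaustion with None, as the base-class contract describes.
        if self._pos >= len(self._items):
            return None
        item = self._items[self._pos]
        self._pos += 1
        return item


source = ListDataSource(["a", "B", "c"])
collected = []
while True:
    item = source.read()
    if item is None:
        break
    collected.append(item)
print(collected)
```

A consumer only needs to call read() until it receives None, which is why any object with this method can serve as a source.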

class auditok.util.DataValidator[source]

Base class for a validator object used by core.StreamTokenizer to check if read data is valid. Subclasses should implement the is_valid() method.

is_valid(data)[source]

Check whether data is valid.
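As a rough illustration of a validator subclass, the sketch below treats a character as valid when it is upper case. DataValidator is redefined locally as a stand-in so the snippet is self-contained, and UpperCaseValidator is a hypothetical example class.

```python
class DataValidator:
    """Local stand-in for auditok.util.DataValidator."""

    def is_valid(self, data):
        raise NotImplementedError


class UpperCaseValidator(DataValidator):
    """Hypothetical validator: a character is valid if it is upper case."""

    def is_valid(self, data):
        return data.isupper()


validator = UpperCaseValidator()
flags = [validator.is_valid(c) for c in "aaBBc"]
print(flags)
```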

class auditok.util.StringDataSource(data)[source]

A class that represents a DataSource as a string buffer. Each call to DataSource.read() returns one character and moves one step forward. If the end of the buffer is reached, read() returns None.

Parameters:
data :

a basestring object.

read()[source]

Read one character from buffer.

Returns: current character, or None if the end of the buffer is reached.
set_data(data)[source]

Set a new data buffer.

Parameters:
data : a basestring object

New data buffer.

class auditok.util.ADSFactory[source]

Factory class that makes it easy to create an ADSFactory.AudioDataSource object that implements DataSource and can therefore be passed to auditok.core.StreamTokenizer.tokenize().

Whether you read audio data from a file, the microphone or a memory buffer, this factory instantiates and returns the right ADSFactory.AudioDataSource object.

There are many other features you may want your ADSFactory.AudioDataSource object to have, such as: memorize all read audio data so that you can rewind and reuse it (especially useful when reading data from the microphone), read a fixed amount of data (also useful when reading from the microphone), read overlapping audio frames (often needed when doing a spectral analysis of data).

ADSFactory.ads() automatically creates and returns an object with the desired behavior according to the supplied keyword arguments.

class ADSDecorator(ads)[source]

Base decorator class for AudioDataSource objects.

class AudioDataSource(audio_source, block_size)[source]

Base class for AudioDataSource objects. It inherits from DataSource and encapsulates an AudioSource object.

read()[source]

Read a piece of data from this source. If no more data is available, return None.

class LimiterADS(ads, max_time)[source]

A class for AudioDataSource objects that can read a fixed amount of data. This can be useful when reading data from the microphone or from large audio files.

read()[source]

Read a piece of data from this source. If no more data is available, return None.
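The idea of limiting the amount of data read can be sketched with a small decorator around any object that has a read() method. This is not auditok's implementation: BlockSource and Limiter are hypothetical names, and truncating the last block to fit the limit is an assumption of this sketch.

```python
class BlockSource:
    """Hypothetical stand-in source: returns fixed-size blocks of a bytes buffer."""

    def __init__(self, data, block_size):
        self._data, self._block_size, self._pos = data, block_size, 0

    def read(self):
        block = self._data[self._pos:self._pos + self._block_size]
        self._pos += len(block)
        return block or None  # empty slice means the buffer is exhausted


class Limiter:
    """Hypothetical decorator: stop once max_bytes have been returned."""

    def __init__(self, source, max_bytes):
        self._source = source
        self._remaining = max_bytes

    def read(self):
        if self._remaining <= 0:
            return None
        block = self._source.read()
        if block is None:
            return None
        block = block[:self._remaining]  # assumption: truncate the last block
        self._remaining -= len(block)
        return block


# 16 samples/s, 1 byte per sample, 1 channel: 0.625 s corresponds to 10 bytes.
max_bytes = int(0.625 * 16 * 1 * 1)
limited = Limiter(BlockSource(b"abcdefghijklmnop", block_size=4), max_bytes)
blocks = list(iter(limited.read, None))  # call read() until it returns None
print(blocks)
```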

class OverlapADS(ads, hop_size)[source]

A class for AudioDataSource objects that can read and return overlapping audio frames.

read()[source]

Read a piece of data from this source. If no more data is available, return None.
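Overlapping reads amount to sliding a window of block_size items forward by hop_size items at a time. The generator below is a minimal sketch of that idea over an in-memory buffer. The function name overlapping_blocks is hypothetical, and stopping once a window reaches the end of the buffer is an assumption rather than documented auditok behavior.

```python
def overlapping_blocks(data, block_size, hop_size):
    """Yield windows of block_size items whose starts are hop_size items apart."""
    if not 0 < hop_size <= block_size:
        raise ValueError("hop_size must be in (0, block_size]")
    start = 0
    while start < len(data):
        yield data[start:start + block_size]
        if start + block_size >= len(data):
            break  # this window already reached the end of the buffer
        start += hop_size


blocks = list(overlapping_blocks("abcdefghijklmnop", block_size=4, hop_size=2))
print(blocks)
```

With block_size = 4 and hop_size = 2, consecutive windows share block_size - hop_size = 2 items.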

class RecorderADS(ads)[source]

A class for AudioDataSource objects that can record all audio data they read, with a rewind facility.

read()[source]

Read a piece of data from this source. If no more data is available, return None.
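Recording with rewind can be pictured as caching every block on the first pass and replaying the cache afterwards. The sketch below is illustrative only: OneShotSource and Recorder are hypothetical classes, and serving reads exclusively from the cache after rewind() is an assumption of this sketch.

```python
class OneShotSource:
    """Hypothetical stand-in for a live source whose data can be read only once."""

    def __init__(self, blocks):
        self._blocks = iter(blocks)

    def read(self):
        return next(self._blocks, None)


class Recorder:
    """Hypothetical decorator: cache every block read, replay it after rewind()."""

    def __init__(self, source):
        self._source = source
        self._cache = []
        self._replay_pos = None  # None while recording, an index while replaying

    def read(self):
        if self._replay_pos is None:
            block = self._source.read()
            if block is not None:
                self._cache.append(block)
            return block
        if self._replay_pos >= len(self._cache):
            return None
        block = self._cache[self._replay_pos]
        self._replay_pos += 1
        return block

    def rewind(self):
        # Assumption: after rewinding, data comes from the cache only.
        self._replay_pos = 0


recorder = Recorder(OneShotSource([b"ab", b"cd", b"ef"]))
first_pass = list(iter(recorder.read, None))
recorder.rewind()
second_pass = list(iter(recorder.read, None))
print(first_pass, second_pass)
```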

static ads(**kwargs)[source]

Create and return an ADSFactory.AudioDataSource. The type and behavior of the object is the result of the supplied parameters.

Parameters:
No parameters :
read audio data from the available built-in microphone with the default parameters. The returned ADSFactory.AudioDataSource encapsulates an io.PyAudioSource object; the next four parameters, if passed, are used instead of their default values.
sampling_rate, sr : (int)
number of samples per second. Default = 16000.
sample_width, sw : (int)
number of bytes per sample (must be in (1, 2, 4)). Default = 2
channels, ch : (int)
number of audio channels. Default = 1 (only this value is currently accepted)
frames_per_buffer, fpb : (int)
number of samples of PyAudio buffer. Default = 1024.
audio_source, asrc : an AudioSource object
read data from this audio source
filename, fn : (string)
build an io.AudioSource object using this file (currently only wave format is supported)
data_buffer, db : (string)
build an io.BufferAudioSource using data in data_buffer. If this keyword is used, sampling_rate, sample_width and channels are passed to io.BufferAudioSource constructor and used instead of default values.
max_time, mt : (float)
maximum time (in seconds) to read. Default behavior: read until there is no more data available.
record, rec : (bool)
save all read data in a cache. The returned object provides a rewind method. Default = False.
block_dur, bd : (float)
processing block duration in seconds. This represents the quantity of audio data to return each time the read() method is invoked. If block_dur is 0.025 (i.e. 25 ms) and the sampling rate is 8000 and the sample width is 2 bytes, read() returns a buffer of 0.025 * 8000 * 2 = 400 bytes at most. This parameter will be looked for (and used if available) before block_size. If neither parameter is given, block_dur will be set to 0.01 second (i.e. 10 ms)
hop_dur, hd : (float)
quantity of data to skip from the current processing window. If hop_dur is supplied, there will be an overlap of block_dur - hop_dur between two adjacent blocks. This parameter will be looked for (and used if available) before hop_size. If neither parameter is given, hop_dur will be set to block_dur, which means that there will be no overlap between two consecutively read blocks.
block_size, bs : (int)
number of samples to read each time the read method is called. Default: a block size that represents a window of 10ms, so for a sampling rate of 16000, the default block_size is 160 samples, for a rate of 44100, block_size = 441 samples, etc.
hop_size, hs : (int)
determines the number of overlapping samples between two adjacent read windows. For a hop_size of value N, the overlap is block_size - N. Default: hop_size = block_size, which means that there is no overlap.
Returns:

An AudioDataSource object that has the desired features.
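The arithmetic behind block_dur, block_size and hop_size in the parameter descriptions above can be checked with a few helper functions. These helpers are hypothetical and written only for illustration. Using round() to convert a duration to a sample count is an assumption of this sketch.

```python
def block_size_in_samples(block_dur, sampling_rate):
    """Number of samples covered by a block of block_dur seconds."""
    # round() guards against floating-point error; the rounding rule is an assumption.
    return int(round(block_dur * sampling_rate))


def block_size_in_bytes(block_dur, sampling_rate, sample_width, channels=1):
    """Maximum number of bytes returned by one read() for the given parameters."""
    return block_size_in_samples(block_dur, sampling_rate) * sample_width * channels


def overlap_in_samples(block_size, hop_size):
    """Samples shared by two adjacent blocks."""
    return block_size - hop_size


print(block_size_in_bytes(0.025, 8000, 2))  # 400, the figure quoted for block_dur
print(block_size_in_samples(0.01, 16000))   # 160, default block size at 16000 Hz
print(block_size_in_samples(0.01, 44100))   # 441, default block size at 44100 Hz
print(overlap_in_samples(4, 2))             # 2
```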

Examples:
  1. Create an AudioDataSource that reads data from the microphone (requires PyAudio) with default audio parameters:
from auditok import ADSFactory
ads = ADSFactory.ads()
ads.get_sampling_rate()  # 16000
ads.get_sample_width()  # 2
ads.get_channels()  # 1
  1. Create an AudioDataSource that reads data from the microphone with a sampling rate of 48 kHz:
from auditok import ADSFactory
ads = ADSFactory.ads(sr=48000)
ads.get_sampling_rate()  # 48000
  1. Create an AudioDataSource that reads data from a wave file:
import auditok
from auditok import ADSFactory
ads = ADSFactory.ads(fn=auditok.dataset.was_der_mensch_saet_mono_44100_lead_trail_silence)
ads.get_sampling_rate()  # 44100
ads.get_sample_width()  # 2
ads.get_channels()  # 1
  1. Define the size of read blocks as 20 ms:
import auditok
from auditok import ADSFactory
# We know the sampling rate of the previous file is 44100 samples/second,
# so 10 ms are equivalent to 441 samples and 20 ms to 882.
block_size = 882
ads = ADSFactory.ads(bs=block_size, fn=auditok.dataset.was_der_mensch_saet_mono_44100_lead_trail_silence)
ads.open()
# read one block
data = ads.read()
ads.close()
len(data)  # 1764
assert len(data) == ads.get_sample_width() * block_size
  1. Define block size as a duration (use block_dur or bd):
import auditok
from auditok import ADSFactory
dur = 0.25  # seconds
ads = ADSFactory.ads(bd=dur, fn=auditok.dataset.was_der_mensch_saet_mono_44100_lead_trail_silence)
# We know the sampling rate of the previous file is 44100 samples/second;
# for a block duration of 250 ms, the block size should be 0.25 * 44100 = 11025.
ads.get_block_size()  # 11025
assert ads.get_block_size() == int(0.25 * 44100)
ads.open()
# read one block
data = ads.read()
ads.close()
len(data)  # 22050
assert len(data) == ads.get_sample_width() * ads.get_block_size()
  1. Read overlapping blocks (one of hop_size, hs, hop_dur or hd > 0):

For better readability, we use auditok.io.BufferAudioSource with a string buffer:

import auditok
from auditok import ADSFactory
# We supply a data buffer instead of a file (keyword 'data_buffer' or 'db').
# sr : sampling rate = 16 samples/sec
# sw : sample width = 1 byte
# ch : channels = 1
buffer = "abcdefghijklmnop"  # 16 bytes = 1 second of data
bd = 0.250  # block duration = 250 ms = 4 bytes
hd = 0.125  # hop duration = 125 ms = 2 bytes
ads = ADSFactory.ads(db=buffer, bd=bd, hd=hd, sr=16, sw=1, ch=1)
ads.open()
ads.read()  # 'abcd'
ads.read()  # 'cdef'
ads.read()  # 'efgh'
ads.read()  # 'ghij'
data = ads.read()
assert data == 'ijkl'
  1. Limit the amount of data read (use max_time or mt):
import auditok
from auditok import ADSFactory
# We know the audio file is longer than 2.25 seconds;
# we want to read at most 2.25 seconds of audio data.
ads = ADSFactory.ads(mt=2.25, fn=auditok.dataset.was_der_mensch_saet_mono_44100_lead_trail_silence)
ads.open()
data = []
while True:
    d = ads.read()
    if d is None:
        break
    data.append(d)
    
ads.close()
data = b''.join(data)
assert len(data) == int(ads.get_sampling_rate() * 2.25 * ads.get_sample_width() * ads.get_channels())
class auditok.util.AudioEnergyValidator(sample_width, energy_threshold=45)[source]

The most basic auditok audio frame validator. This validator computes the log energy of an input audio frame and returns True if the result is >= a given threshold, False otherwise.

Parameters:
sample_width : (int)
Number of bytes of one audio sample. This is used to convert data from basestring or Bytes to an array of floats.
energy_threshold : (float)
A threshold used to check whether an input data buffer is valid.
is_valid(data)[source]

Check if data is valid. Audio data will be converted into an array (of signed values) of which the log energy is computed. Log energy is computed as follows:

arr = AudioEnergyValidator._convert(signal, sample_width)
energy = float(numpy.dot(arr, arr)) / len(arr)
log_energy = 10. * numpy.log10(energy)
Parameters:
data : either a string or a Bytes buffer
data is converted into a numerical array using the sample_width given in the constructor.
Returns:

True if log_energy >= energy_threshold, False otherwise.
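The log-energy test above can be reproduced without NumPy for a quick sanity check. The following is a minimal standard-library sketch, not the library's code. It assumes little-endian signed PCM samples and a non-empty buffer, and it clamps the energy to a tiny positive value so that pure silence does not fail on log10(0). The function names are hypothetical.

```python
import math
import struct


def log_energy(data, sample_width=2):
    """10 * log10 of the mean squared amplitude of little-endian signed PCM."""
    fmt = {1: "b", 2: "h", 4: "i"}[sample_width]
    count = len(data) // sample_width
    samples = struct.unpack("<%d%s" % (count, fmt), data[:count * sample_width])
    energy = sum(s * s for s in samples) / count
    # Assumption: clamp to a tiny value so that silence does not raise on log10(0).
    return 10.0 * math.log10(max(energy, 1e-10))


def is_valid(data, sample_width=2, energy_threshold=45):
    """True if the log energy of data reaches the threshold."""
    return log_energy(data, sample_width) >= energy_threshold


loud = struct.pack("<4h", 1000, -1000, 1000, -1000)  # mean square = 1e6
quiet = struct.pack("<4h", 10, -10, 10, -10)         # mean square = 100
print(log_energy(loud), is_valid(loud))
print(log_energy(quiet), is_valid(quiet))
```

With the default threshold of 45, a frame with a mean squared amplitude of 1e6 (about 60 on the log scale) is accepted, while one with a mean squared amplitude of 100 (about 20) is rejected.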