auditok.util

Class summary

`DataSource`	Base class for objects passed to `auditok.core.StreamTokenizer.tokenize()`.
`StringDataSource`(data)	A class that represents a `DataSource` as a string buffer.
`ADSFactory`	Factory class that makes it easy to create an `ADSFactory.AudioDataSource` object that implements `DataSource` and can therefore be passed to `auditok.core.StreamTokenizer.tokenize()`.
`ADSFactory.AudioDataSource`(audio_source, …)	Base class for AudioDataSource objects.
`ADSFactory.ADSDecorator`(ads)	Base decorator class for AudioDataSource objects.
`ADSFactory.OverlapADS`(ads, hop_size)	A class for AudioDataSource objects that can read and return overlapping audio frames.
`ADSFactory.LimiterADS`(ads, max_time)	A class for AudioDataSource objects that can read a fixed amount of data.
`ADSFactory.RecorderADS`(ads)	A class for AudioDataSource objects that can record all audio data they read, with a rewind facility.
`DataValidator`	Base class for a validator object used by `core.StreamTokenizer` to check if read data is valid.
`AudioEnergyValidator`(sample_width[, …])	The most basic auditok audio frame validator.

class auditok.util.DataSource

Base class for objects passed to auditok.core.StreamTokenizer.tokenize(). Subclasses should implement a DataSource.read() method.

read(): Read a piece of data from this source. If no more data is available, return None.
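
The read() contract can be sketched with a small in-memory source. `ListDataSource` is a hypothetical name, not part of auditok; in real use it would subclass `auditok.util.DataSource`, which is left out here so the snippet runs without the library installed.

```python
class ListDataSource:
    """Hypothetical data source backed by a list (not part of auditok).

    In real use this would subclass auditok.util.DataSource; only the
    read() contract described above is reproduced here.
    """

    def __init__(self, items):
        self._items = list(items)
        self._pos = 0

    def read(self):
        # Return the next item, or None once no more data is available.
        if self._pos >= len(self._items):
            return None
        item = self._items[self._pos]
        self._pos += 1
        return item


source = ListDataSource(["a", "B", "c"])
items = []
while True:
    item = source.read()
    if item is None:  # end of data
        break
    items.append(item)
print(items)
```

The consumer loop relies only on read() returning None at the end of the data, which is the pattern a tokenizer would be expected to follow.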

class auditok.util.DataValidator

Base class for a validator object used by core.StreamTokenizer to check if read data is valid. Subclasses should implement the is_valid() method.

is_valid(data): Check whether data is valid.
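
As an illustration of the validator contract, the sketch below treats a piece of data as valid when it is an uppercase string. `UpperCaseValidator` is a hypothetical name, not part of auditok; in real use it would subclass `auditok.util.DataValidator`, which is omitted so the snippet has no dependencies.

```python
class UpperCaseValidator:
    """Hypothetical validator (not part of auditok).

    In real use this would subclass auditok.util.DataValidator; only the
    is_valid() contract is reproduced here.
    """

    def is_valid(self, data):
        # A piece of data is considered valid when it is uppercase.
        return data.isupper()


validator = UpperCaseValidator()
# Validate a stream of one-character "frames".
flags = [validator.is_valid(ch) for ch in "aaBBc"]
print(flags)
```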

class auditok.util.StringDataSource(data)

A class that represents a DataSource as a string buffer. Each call to DataSource.read() returns one character and moves one step forward. If the end of the buffer is reached, read() returns None.

Parameters:	data : a basestring object.

read()

Read one character from buffer.

Returns:	Current character or None if end of buffer is reached

set_data(data)

Set a new data buffer.

Parameters:	data : a basestring object. New data buffer.

class auditok.util.ADSFactory

Factory class that makes it easy to create an ADSFactory.AudioDataSource object that implements DataSource and can therefore be passed to auditok.core.StreamTokenizer.tokenize().

Whether you read audio data from a file, the microphone or a memory buffer, this factory instantiates and returns the right ADSFactory.AudioDataSource object.

You may want your ADSFactory.AudioDataSource object to have other features, such as: memorizing all read audio data so that you can rewind and reuse it (especially useful when reading data from the microphone), reading a fixed amount of data (also useful when reading from the microphone), or reading overlapping audio frames (often needed when doing a spectral analysis of data).

ADSFactory.ads() automatically creates and returns an object with the desired behavior according to the supplied keyword arguments.

class ADSDecorator(ads): Base decorator class for AudioDataSource objects.

class AudioDataSource(audio_source, block_size)

Base class for AudioDataSource objects. It inherits from DataSource and encapsulates an AudioSource object.

read(): Read a piece of data from this source. If no more data is available, return None.

class LimiterADS(ads, max_time)

A class for AudioDataSource objects that can read a fixed amount of data. This can be useful when reading data from the microphone or from large audio files.

read(): Read a piece of data from this source. If no more data is available, return None.

class OverlapADS(ads, hop_size)

A class for AudioDataSource objects that can read and return overlapping audio frames.

read(): Read a piece of data from this source. If no more data is available, return None.

class RecorderADS(ads)

A class for AudioDataSource objects that can record all audio data they read, with a rewind facility.

read(): Read a piece of data from this source. If no more data is available, return None.

static ads(**kwargs)

Create and return an ADSFactory.AudioDataSource. The type and behavior of the object are determined by the supplied parameters.

Parameters:

No parameters : read audio data from the available built-in microphone with the default parameters. The returned ADSFactory.AudioDataSource encapsulates an io.PyAudioSource object, so the next four parameters, if supplied, are used instead of their default values.
sampling_rate, sr : (int): number of samples per second. Default = 16000.
sample_width, sw : (int): number of bytes per sample (must be in (1, 2, 4)). Default = 2
channels, ch : (int): number of audio channels. Default = 1 (only this value is currently accepted)
frames_per_buffer, fpb : (int): number of samples in the PyAudio buffer. Default = 1024.
audio_source, asrc : an AudioSource object: read data from this audio source
filename, fn : (string): build an io.AudioSource object using this file (currently only wave format is supported)
data_buffer, db : (string): build an io.BufferAudioSource using data in data_buffer. If this keyword is used, sampling_rate, sample_width and channels are passed to io.BufferAudioSource constructor and used instead of default values.
max_time, mt : (float): maximum time (in seconds) to read. Default behavior: read until there is no more data available.
record, rec : (bool): save all read data in a cache. The returned object is then navigable and provides a rewind method. Default = False.
block_dur, bd : (float): processing block duration in seconds. This represents the quantity of audio data to return each time the read() method is invoked. If block_dur is 0.025 (i.e. 25 ms) and the sampling rate is 8000 and the sample width is 2 bytes, read() returns a buffer of 0.025 * 8000 * 2 = 400 bytes at most. This parameter will be looked for (and used if available) before block_size. If neither parameter is given, block_dur will be set to 0.01 second (i.e. 10 ms)
hop_dur, hd : (float): quantity of data to skip from the current processing window. If hop_dur is supplied, there will be an overlap of block_dur - hop_dur between two adjacent blocks. This parameter will be looked for (and used if available) before hop_size. If neither parameter is given, hop_dur will be set to block_dur, which means that there will be no overlap between two consecutively read blocks.
block_size, bs : (int): number of samples to read each time the read method is called. Default: a block size that represents a window of 10 ms, so for a sampling rate of 16000, the default block_size is 160 samples, for a rate of 44100, block_size = 441 samples, etc.
hop_size, hs : (int): determines the number of overlapping samples between two adjacent read windows. For a hop_size of value N, the overlap is block_size - N. Default: hop_size = block_size, which means that there is no overlap.
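
The relationship between the duration-based and sample-based parameters can be checked with a few lines of arithmetic. `block_layout` below is a hypothetical helper, not part of auditok, and its use of round() is an assumption: the library may truncate instead, which matters only when a duration does not map to a whole number of samples.

```python
def block_layout(block_dur, hop_dur, sampling_rate, sample_width, channels=1):
    """Illustrative helper (not part of auditok) for the arithmetic above.

    Rounding to the nearest sample is an assumption of this sketch.
    """
    block_size = int(round(block_dur * sampling_rate))  # samples per read()
    hop_size = int(round(hop_dur * sampling_rate))      # samples to advance
    return {
        "block_size": block_size,
        "hop_size": hop_size,
        "overlap": block_size - hop_size,
        "max_bytes_per_read": block_size * sample_width * channels,
    }


# 25 ms blocks at 8000 Hz with 2-byte samples and no overlap.
layout = block_layout(0.025, 0.025, sampling_rate=8000, sample_width=2)
print(layout)

# 250 ms blocks with a 125 ms hop at 16 Hz with 1-byte samples.
overlapped = block_layout(0.250, 0.125, sampling_rate=16, sample_width=1)
print(overlapped)
```

The first call reproduces the 400-byte figure given for block_dur; the second gives a block of 4 samples, a hop of 2 samples and therefore an overlap of 2 samples.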

Returns:

An AudioDataSource object that has the desired features.

Examples:

Create an AudioDataSource that reads data from the microphone (requires PyAudio) with default audio parameters:

from auditok import ADSFactory
ads = ADSFactory.ads()
ads.get_sampling_rate()
16000
ads.get_sample_width()
2
ads.get_channels()
1

Create an AudioDataSource that reads data from the microphone with a sampling rate of 48 kHz:

from auditok import ADSFactory
ads = ADSFactory.ads(sr=48000)
ads.get_sampling_rate()
48000

Create an AudioDataSource that reads data from a wave file:

import auditok
from auditok import ADSFactory
ads = ADSFactory.ads(fn=auditok.dataset.was_der_mensch_saet_mono_44100_lead_trail_silence)
ads.get_sampling_rate()
44100
ads.get_sample_width()
2
ads.get_channels()
1

Define the size of read blocks as 20 ms:

import auditok
from auditok import ADSFactory
# The sampling rate of the previous file is 44100 samples/second,
# so 10 ms is equivalent to 441 samples and 20 ms to 882.
block_size = 882
ads = ADSFactory.ads(bs=block_size, fn=auditok.dataset.was_der_mensch_saet_mono_44100_lead_trail_silence)
ads.open()
# read one block
data = ads.read()
ads.close()
len(data)
1764
assert len(data) ==  ads.get_sample_width() * block_size

Define block size as a duration (use block_dur or bd):

import auditok
from auditok import ADSFactory
dur = 0.25 # seconds
ads = ADSFactory.ads(bd = dur, fn=auditok.dataset.was_der_mensch_saet_mono_44100_lead_trail_silence)
# The sampling rate of the previous file is 44100 samples/second,
# so for a block duration of 250 ms the block size should be 0.25 * 44100 = 11025.
ads.get_block_size()
11025
assert ads.get_block_size() == int(dur * 44100)
ads.open()
# read one block
data = ads.read()
ads.close()
len(data)
22050
assert len(data) ==  ads.get_sample_width() * ads.get_block_size()

Read overlapping blocks (supply one of hop_size, hs, hop_dur or hd with a value > 0):

For readability, this example uses auditok.io.BufferAudioSource with a string buffer:

import auditok
from auditok import ADSFactory
# We supply a data buffer instead of a file (keyword 'data_buffer' or 'db').
# sr : sampling rate = 16 samples/sec
# sw : sample width = 1 byte
# ch : channels = 1
buffer = "abcdefghijklmnop" # 16 bytes = 1 second of data
bd = 0.250 # block duration = 250 ms = 4 bytes
hd = 0.125 # hop duration = 125 ms = 2 bytes
ads = ADSFactory.ads(db = buffer, bd = bd, hd = hd, sr = 16, sw = 1, ch = 1)
ads.open()
ads.read()
'abcd'
ads.read()
'cdef'
ads.read()
'efgh'
ads.read()
'ghij'
data = ads.read()
assert data == 'ijkl'
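
The sequence of reads in the example above follows a sliding-window pattern: each block starts hop_size samples after the previous one. The generator below is a dependency-free sketch of that pattern, not auditok's implementation. `overlapping_blocks` is a hypothetical name, and dropping a trailing window shorter than block_size is an assumption of the sketch.

```python
def overlapping_blocks(data, block_size, hop_size):
    """Yield windows of block_size items, advancing hop_size each time.

    Illustrative only (not auditok's implementation). Assumption: a
    trailing window shorter than block_size is dropped.
    """
    if block_size <= 0 or hop_size <= 0:
        raise ValueError("block_size and hop_size must be positive")
    start = 0
    while start + block_size <= len(data):
        yield data[start:start + block_size]
        start += hop_size


# Same buffer as above: block of 4 samples, hop of 2 samples.
blocks = list(overlapping_blocks("abcdefghijklmnop", block_size=4, hop_size=2))
print(blocks[:5])  # ['abcd', 'cdef', 'efgh', 'ghij', 'ijkl']
```

Consecutive windows share block_size - hop_size = 2 characters, which matches the overlap rule given for hop_size.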

Limit the amount of read data (use max_time or mt):

import auditok
from auditok import ADSFactory
# We know the audio file is longer than 2.25 seconds.
# We want to read up to 2.25 seconds of audio data.
ads = ADSFactory.ads(mt = 2.25, fn=auditok.dataset.was_der_mensch_saet_mono_44100_lead_trail_silence)
ads.open()
data = []
while True:
    d = ads.read()
    if d is None:
        break
    data.append(d)
    
ads.close()
data = b''.join(data)
assert len(data) == int(ads.get_sampling_rate() * 2.25 * ads.get_sample_width() * ads.get_channels())
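
The byte count asserted above is the time limit converted to samples and then to bytes. The wrapper below sketches how such a cap could be enforced around any object with a read() method. `BlockSource` and `TimeLimitedSource` are hypothetical names; this is not how `ADSFactory.LimiterADS` is implemented, and truncating the last block at the limit is an assumption.

```python
class BlockSource:
    """Hypothetical source that serves a bytes buffer in fixed-size blocks."""

    def __init__(self, data, block_bytes):
        self._data = data
        self._block_bytes = block_bytes
        self._pos = 0

    def read(self):
        block = self._data[self._pos:self._pos + self._block_bytes]
        self._pos += len(block)
        return block or None  # None once the buffer is exhausted


class TimeLimitedSource:
    """Hypothetical wrapper that stops after max_time seconds of audio."""

    def __init__(self, source, max_time, sampling_rate, sample_width, channels=1):
        self._source = source
        # Whole samples first, so the byte limit never splits a sample.
        max_samples = int(max_time * sampling_rate)
        self._remaining = max_samples * sample_width * channels

    def read(self):
        if self._remaining <= 0:
            return None
        block = self._source.read()
        if block is None:
            return None
        block = block[:self._remaining]  # truncate the final block if needed
        self._remaining -= len(block)
        return block


# 4 seconds of silence at 100 samples/second, 2 bytes per sample.
raw = BlockSource(b"\x00" * 800, block_bytes=20)
limited = TimeLimitedSource(raw, max_time=2.25, sampling_rate=100, sample_width=2)
total = 0
while True:
    block = limited.read()
    if block is None:
        break
    total += len(block)
print(total)  # 450 bytes = 2.25 s * 100 samples/s * 2 bytes
```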

class auditok.util.AudioEnergyValidator(sample_width, energy_threshold=45)

The most basic auditok audio frame validator. This validator computes the log energy of an input audio frame and returns True if the result is >= a given threshold, False otherwise.

Parameters:

sample_width : (int): Number of bytes of one audio sample. This is used to convert data from basestring or Bytes to an array of floats.
energy_threshold : (float): A threshold used to check whether an input data buffer is valid.

is_valid(data)

Check if data is valid. Audio data will be converted into an array (of signed values) of which the log energy is computed. Log energy is computed as follows:

arr = AudioEnergyValidator._convert(signal, sample_width)
energy = float(numpy.dot(arr, arr)) / len(arr)
log_energy = 10. * numpy.log10(energy)

Parameters:

data : either a string or a Bytes buffer: data is converted into a numerical array using the sample_width given in the constructor.

Returns:

True if log_energy >= energy_threshold, False otherwise.
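
To see how the threshold behaves, the log-energy formula can be reproduced without numpy. The function below is a pure-Python stand-in, not the library's code. It assumes little-endian signed PCM samples, and returning -inf for an all-zero frame is a choice made for this sketch.

```python
import math
import struct


def log_energy(data, sample_width=2):
    """Log energy of signed PCM bytes, following the formula above.

    Pure-Python stand-in for the numpy-based code. Assumptions:
    little-endian signed samples; -inf is returned for all-zero input.
    """
    fmt = {1: "b", 2: "h", 4: "i"}[sample_width]
    n = len(data) // sample_width
    if n == 0:
        raise ValueError("data must contain at least one sample")
    samples = struct.unpack("<%d%s" % (n, fmt), data[:n * sample_width])
    energy = sum(s * s for s in samples) / float(n)  # mean squared amplitude
    if energy == 0:
        return float("-inf")
    return 10.0 * math.log10(energy)


threshold = 45  # default energy_threshold from the class signature
loud = struct.pack("<4h", 1000, -1000, 1000, -1000)
quiet = struct.pack("<4h", 10, -10, 10, -10)
print(log_energy(loud) >= threshold)   # True: about 60 dB
print(log_energy(quiet) >= threshold)  # False: about 20 dB
```

With 16-bit samples, an amplitude of 1000 gives a mean square of 10^6, i.e. about 60, while an amplitude of 10 gives about 20, so only the first frame passes the default threshold of 45.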