Util

AudioEnergyValidator(energy_threshold, …) A validator based on audio signal energy.
AudioReader(input[, block_dur, hop_dur, …]) Class to read fixed-size chunks of audio data from a source.
Recorder(input[, block_dur, hop_dur, max_read]) Class to read fixed-size chunks of audio data from a source and keep data in a cache.
make_duration_formatter(fmt) Make and return a function used to format durations in seconds.
make_channel_selector(sample_width, channels) Create and return a callable used for audio channel selection.
auditok.util.make_duration_formatter(fmt)

Make and return a function used to format durations in seconds. Accepted format directives are:

  • %S : absolute number of seconds with 3 decimals. This directive should be used alone.
  • %i : milliseconds
  • %s : seconds
  • %m : minutes
  • %h : hours

These last 4 directives should all be specified. They can be placed anywhere in the input string.

Parameters: fmt (str) – duration format.
Returns: formatter – a function that takes a duration in seconds (float) and returns a string that corresponds to that duration.
Return type: callable
Raises: TimeFormatError – if the format contains an unknown directive.

Examples

Using %S:

>>> from auditok.util import make_duration_formatter
>>> formatter = make_duration_formatter("%S")
>>> formatter(123.589)
'123.589'
>>> formatter(123)
'123.000'

Using the other directives:

>>> formatter = make_duration_formatter("%h:%m:%s.%i")
>>> formatter(3600+120+3.25)
'01:02:03.250'

>>> formatter = make_duration_formatter("%h hrs, %m min, %s sec and %i ms")
>>> formatter(3600+120+3.25)
'01 hrs, 02 min, 03 sec and 250 ms'

>>> # omitting one of the 4 directives might result in a wrong duration
>>> formatter = make_duration_formatter("%m min, %s sec and %i ms")
>>> formatter(3600+120+3.25)
'02 min, 03 sec and 250 ms'
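To make the arithmetic behind these directives concrete, here is a minimal sketch that splits a duration into hours, minutes, seconds and milliseconds and substitutes them into the format string. It is an illustration under stated assumptions, not auditok's implementation: the name format_duration is hypothetical, values are zero-padded to two digits (three for milliseconds) to match the examples above, and unknown directives are not rejected (the real formatter raises TimeFormatError).

```python
def format_duration(seconds, fmt):
    """Hypothetical sketch of the %S and %h/%m/%s/%i directives (not auditok's code)."""
    if fmt == "%S":
        # Absolute number of seconds with 3 decimals; meant to be used alone.
        return "{:.3f}".format(seconds)
    millis = round(seconds * 1000)
    hours, rest = divmod(millis, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    # Each directive is replaced independently; a missing one is simply dropped,
    # which is why omitting %h loses the hour in the last example above.
    return (fmt.replace("%h", f"{hours:02d}")
               .replace("%m", f"{minutes:02d}")
               .replace("%s", f"{secs:02d}")
               .replace("%i", f"{ms:03d}"))
```

For instance, format_duration(3600+120+3.25, "%h:%m:%s.%i") gives '01:02:03.250', the same string as the documented formatter.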
auditok.util.make_channel_selector(sample_width, channels, selected=None)

Create and return a callable used for audio channel selection. The returned selector can be used as selector(audio_data) and returns data that contains the selected channel only.

Importantly, if selected is None or equals “any”, selector(audio_data) will separate and return a list of available channels: [data_channel_1, data_channel_2, …].

Note also that the returned selector expects input data in bytes format but does not necessarily return a bytes object. In fact, in order to extract the desired channel (or compute the average channel if selected = “avg”), it first converts input data into an array.array (or numpy.ndarray) object. After the channel of interest is selected or computed, it is returned as such, without any reconversion to bytes. This behavior is intentional, for efficiency, because returned objects can be used directly as buffers of bytes. In any case, returned objects can be converted back to bytes using bytes(obj).

The exception to this is the special case where channels = 1, in which input data is returned without any processing.

Parameters:
  • sample_width (int) – number of bytes used to encode one audio sample, should be 1, 2 or 4.
  • channels (int) – number of channels of raw audio data that the returned selector should expect.
  • selected (int or str, default: None) – audio channel to select and return when calling selector(raw_data). It should be an int >= -channels and < channels. If one of “mix”, “avg” or “average” is passed, the selector will return the average channel of audio data. If None or “any”, the selector returns a list of all available channels at each call.
Returns:

selector – a callable that can be used as selector(audio_data) and returns data that contains the channel of interest.

Return type:

callable

Raises:

ValueError – if sample_width is not one of 1, 2 or 4, or if selected has an unexpected value.
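The selection logic described above boils down to de-interleaving raw samples. The following stand-alone sketch uses only the standard library's array module. select_channel is a hypothetical name; it assumes native-endian signed integer samples with typecodes "b", "h" and "i" for widths 1, 2 and 4 (the last assumes a 4-byte C int, true on mainstream platforms), and it computes the average channel with floor division, which may round differently from auditok.

```python
from array import array

# Assumed mapping from sample width in bytes to a signed array typecode.
# "i" is a 4-byte C int on mainstream platforms.
_TYPECODES = {1: "b", 2: "h", 4: "i"}


def select_channel(raw, sample_width, channels, selected=None):
    """Hypothetical sketch of the channel-selection behavior described above."""
    if sample_width not in _TYPECODES:
        raise ValueError("sample_width must be 1, 2 or 4")
    if channels == 1:
        return raw  # mono input is returned without any processing
    samples = array(_TYPECODES[sample_width], raw)
    # Samples are interleaved: frame k is samples[k * channels:(k + 1) * channels],
    # so a slice with step `channels` starting at c yields channel c.
    per_channel = [samples[c::channels] for c in range(channels)]
    if selected is None or selected == "any":
        return per_channel
    if selected in ("mix", "avg", "average"):
        # Average channel, computed frame by frame with floor division (assumption).
        return array(samples.typecode,
                     (sum(frame) // channels for frame in zip(*per_channel)))
    if isinstance(selected, int) and -channels <= selected < channels:
        return per_channel[selected]
    raise ValueError("unexpected value for selected: {!r}".format(selected))
```

As with the documented selector, the returned array objects support the buffer protocol, so bytes(result) turns a single channel back into raw bytes.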

class auditok.util.DataSource

Base class for objects passed to StreamTokenizer.tokenize(). Subclasses should implement a DataSource.read() method.

read()

Read a block (i.e., window) of data from this source. If no more data is available, return None.

class auditok.util.DataValidator

Base class for a validator object used by core.StreamTokenizer to check if read data is valid. Subclasses should implement the is_valid() method.

is_valid(data)

Check whether data is valid.
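As a sketch of what implementing read() and is_valid() means in practice, the classes below subclass the two base classes. ListDataSource and ThresholdValidator are hypothetical names; the fallback to object when auditok is not installed only exists so the snippet runs stand-alone, and the loop at the end imitates, in a much simplified way, how a consumer such as StreamTokenizer keeps calling read() until it gets None.

```python
# Fall back to plain object when auditok is not installed, so the sketch still runs.
try:
    from auditok.util import DataSource, DataValidator
except ImportError:
    DataSource = DataValidator = object


class ListDataSource(DataSource):
    """Hypothetical source that returns one list item per read() call."""

    def __init__(self, items):
        self._items = list(items)
        self._pos = 0

    def read(self):
        if self._pos >= len(self._items):
            return None  # signals that no more data is available
        item = self._items[self._pos]
        self._pos += 1
        return item


class ThresholdValidator(DataValidator):
    """Hypothetical validator: a numeric frame is valid if it reaches a threshold."""

    def __init__(self, threshold):
        self.threshold = threshold

    def is_valid(self, data):
        return data >= self.threshold


# Simplified consumer loop: read until None, validating each frame.
source = ListDataSource([0, 5, 7, 1, 9])
validator = ThresholdValidator(5)
flags = []
while (frame := source.read()) is not None:
    flags.append(validator.is_valid(frame))
```

After the loop, flags holds one boolean per frame; comparing against None (rather than testing truthiness) keeps a frame equal to 0 from ending the loop early.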

class auditok.util.StringDataSource(data)

Class that represents a DataSource as a string buffer. Each call to DataSource.read() returns one character and moves one step forward. If the end of the buffer is reached, read() returns None.

Parameters: data (str) – a string object used as data.
read()

Read one character from buffer.

Returns: char – current character or None if end of buffer is reached.
Return type: str
set_data(data)

Set a new data buffer.

Parameters: data (str) – new data buffer.
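The contract described for StringDataSource (one character per read() call, None at the end, set_data() to switch to a new buffer) can be mirrored in a few lines. CharSource and uppercase_runs are hypothetical names; resetting the position in set_data() is an assumption, and the grouping loop is a toy stand-in for tokenization, not the algorithm StreamTokenizer uses (it has no minimum length, maximum length or silence tolerance).

```python
class CharSource:
    """Hypothetical mirror of the StringDataSource behavior described above."""

    def __init__(self, data):
        self.set_data(data)

    def set_data(self, data):
        self._data = data
        self._index = 0  # assumption: reading restarts at the start of the new buffer

    def read(self):
        if self._index >= len(self._data):
            return None  # end of buffer reached
        char = self._data[self._index]
        self._index += 1
        return char


def uppercase_runs(source):
    """Toy consumer: collect (text, start, end) for each run of uppercase characters."""
    runs, current, start, pos = [], [], None, 0
    while (char := source.read()) is not None:
        if char.isupper():
            if not current:
                start = pos  # a new run begins here
            current.append(char)
        elif current:
            runs.append(("".join(current), start, pos - 1))
            current = []
        pos += 1
    if current:  # flush a run that reaches the end of the buffer
        runs.append(("".join(current), start, pos - 1))
    return runs
```

For example, uppercase_runs(CharSource("aaaABCDEFbbGHIJKccc")) returns [("ABCDEF", 3, 8), ("GHIJK", 11, 15)], with inclusive start and end positions.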
class auditok.util.ADSFactory

Deprecated since version 2.0.0: ADSFactory will be removed in auditok 2.0.1; use instances of AudioReader instead.

Factory class that makes it easy to create an AudioDataSource object that implements DataSource and can therefore be passed to auditok.core.StreamTokenizer.tokenize().

Whether you read audio data from a file, the microphone or a memory buffer, this factory instantiates and returns the right AudioDataSource object.

There are many other features you may want an AudioDataSource object to have, such as: memorizing all read audio data so that you can rewind and reuse it (especially useful when reading data from the microphone), reading a fixed amount of data (also useful when reading from the microphone), and reading overlapping audio frames (often needed when doing a spectral analysis of data).

ADSFactory.ads() automatically creates and returns an object with the desired behavior according to the supplied keyword arguments.

static ads(**kwargs)

Create and return an AudioDataSource. The type and behavior of the object depend on the supplied parameters. Called without any parameters, it returns an object that reads audio data from the available built-in microphone with the default parameters.

Parameters:
  • sampling_rate, sr – number of audio samples per second of input audio stream.
  • sample_width, sw – number of bytes per sample, must be one of 1, 2 or 4.
  • channels, ch – number of audio channels; only a value of 1 is currently accepted.
  • frames_per_buffer, fpb – number of samples of PyAudio buffer.
  • audio_source, asrc – AudioSource to read data from.
  • filename, fn – create an AudioSource object using this file.
  • data_buffer, db – build an io.BufferAudioSource using data in data_buffer. If this keyword is used, sampling_rate, sample_width and channels are passed to the io.BufferAudioSource constructor and used instead of default values.
  • max_time, mt – maximum time (in seconds) to read. Default behavior: read until there is no more data available.
  • record, rec – save all read data in cache. Provides a navigable object that has a rewind method.
  • block_dur, bd – processing block duration in seconds. This represents the quantity of audio data to return each time the read() method is invoked. If block_dur is 0.025 (i.e., 25 ms), the sampling rate is 8000 and the sample width is 2 bytes, read() returns a buffer of at most 0.025 * 8000 * 2 = 400 bytes. This parameter is looked for (and used if available) before block_size. If neither parameter is given, block_dur is set to 0.01 second (i.e., 10 ms).
  • hop_dur, hd – quantity of data to skip from the current processing window. If hop_dur is supplied, there will be an overlap of block_dur - hop_dur between two adjacent blocks. This parameter is looked for (and used if available) before hop_size. If neither parameter is given, hop_dur is set to block_dur, which means that there is no overlap between two consecutively read blocks.
  • block_size, bs – number of samples to read each time the read method is called. Default: a block size that represents a window of 10 ms, so for a sampling rate of 16000 the default block_size is 160 samples, for a rate of 44100 block_size = 441 samples, etc.
  • hop_size, hs – determines the number of overlapping samples between two adjacent read windows: for a hop_size of value N, the overlap is block_size - N. Default: hop_size = block_size, meaning that there is no overlap.
Returns:

audio_data_source – an AudioDataSource object built with input parameters.

Return type:

AudioDataSource
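The size arithmetic in the block_dur, hop_dur, block_size and hop_size descriptions can be checked with a small helper. block_geometry is a hypothetical function, not part of auditok, and rounding durations to the nearest whole sample is an assumption. Since ADSFactory is deprecated, the same durations would be given to AudioReader through its block_dur and hop_dur parameters.

```python
def block_geometry(sampling_rate, sample_width, channels, block_dur=0.01, hop_dur=None):
    """Return (block_samples, block_bytes, hop_samples, overlap_samples).

    Hypothetical helper; durations are rounded to whole samples (assumption).
    """
    block_samples = round(block_dur * sampling_rate)
    # hop_dur defaults to block_dur, i.e. no overlap between consecutive blocks
    hop_samples = block_samples if hop_dur is None else round(hop_dur * sampling_rate)
    block_bytes = block_samples * sample_width * channels
    overlap_samples = max(block_samples - hop_samples, 0)
    return block_samples, block_bytes, hop_samples, overlap_samples
```

For example, block_geometry(8000, 2, 1, 0.025) returns (200, 400, 200, 0), reproducing the 400-byte figure given for block_dur; passing hop_dur=0.01 as well gives a hop of 80 samples and an overlap of 120.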

auditok.util.AudioDataSource

alias of auditok.util.AudioReader

class auditok.util.AudioReader(input, block_dur=0.01, hop_dur=None, record=False, max_read=None, **kwargs)

Class to read fixed-size chunks of audio data from a source. A source can be a file on disk, standard input (with input = “-”) or a microphone. This is normally used by tokenization algorithms that expect source objects with a read function that returns a window of data of the same size at each call, except when the remaining data does not make up a full window.

Objects of this class can be set up to return audio windows with a given overlap and to record the whole stream for later access (useful when reading data from the microphone). They can also have a limit for the maximum amount of data to read.

Parameters:
  • input (str, bytes, AudioSource, AudioReader, AudioRegion or None) – input audio data. If the type of the passed argument is str, it should be a path to an existing audio file; “-” is interpreted as standard input. If the type is bytes, input is considered as a buffer of raw audio data. If None, audio is read from the microphone. Every object that is not an AudioReader will be transformed, when possible, into an AudioSource before processing. If input is a str that refers to a raw audio file, bytes, or None, audio parameters should be provided using kwargs (i.e., sampling_rate, sample_width and channels, or their aliases).
  • block_dur (float, default: 0.01) – length in seconds of audio windows to return at each read call.
  • hop_dur (float, default: None) – amount of data, in seconds, to skip from the previous window. If defined, it determines the temporal overlap between the previous and current windows (namely overlap = block_dur - hop_dur). The default, None, means that consecutive windows do not overlap.
  • record (bool, default: False) – whether to record read audio data for later access. If True, audio data can be retrieved by first calling rewind(), then using the data property. Note that once rewind() is called, no new data will be read from the source (subsequent read() calls will read data from the cache) and that there is no need to call rewind() again to access the data property.
  • max_read (float, default: None) – maximum amount of audio data to read, in seconds. The default, None, means that data will be read until the end of the stream is reached or, when reading from the microphone, until a Ctrl-C is sent.
When input is None, of type bytes, or a path to a raw audio file, some of the following kwargs are mandatory.
Other Parameters:
 
  • audio_format, fmt (str) – type of audio data (e.g., wav, ogg, flac, raw, etc.). This will only be used if input is a string path to an audio file. If not given, audio type will be guessed from file name extension or from file header.
  • sampling_rate, sr (int) – sampling rate of audio data. Required if input is a raw audio file, is a bytes object or None (i.e., read from microphone).
  • sample_width, sw (int) – number of bytes used to encode one audio sample, typically 1, 2 or 4. Required for raw data, see sampling_rate.
  • channels, ch (int) – number of channels of audio data. Required for raw data, see sampling_rate.
  • use_channel, uc ({None, “any”, “mix”, “avg”, “average”} or int) – which channel to use for split if input has multiple audio channels. Regardless of which channel is used for splitting, returned audio events contain data from all the channels of input. The following values are accepted:
    • None (alias “any”): accept audio activity from any channel, even if other channels are silent. This is the default behavior.
    • “mix” (alias “avg” or “average”): mix down all channels (i.e., compute average channel) and split the resulting channel.
    • int (>= 0 , < channels): use one channel, specified by its integer id, for split.
  • large_file (bool, default: False) – If True, AND if input is a path to a wav or a raw audio file (and only these two formats), then audio data is lazily loaded to memory (i.e., one analysis window at a time). Otherwise the whole file is loaded to memory before split. Set to True if the size of the file is larger than available memory.
read()

Read a block (i.e., window) of data from this source. If no more data is available, return None.
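The windowing behavior described for AudioReader (fixed-size windows, an optional hop that creates overlap, a cap set by max_read, and a possibly shorter final window) can be sketched over an in-memory buffer of raw samples. iter_windows is a hypothetical generator, not auditok's API. It assumes that durations map to whole samples by rounding, that hop_dur is positive, and that reading stops after the first window that reaches the end of the data; auditok's exact handling of the final overlapping windows may differ.

```python
def iter_windows(data, sampling_rate, sample_width, channels,
                 block_dur=0.01, hop_dur=None, max_read=None):
    """Yield successive windows of raw audio bytes (hypothetical sketch).

    Assumes hop_dur > 0 when given; durations are rounded to whole samples.
    """
    frame_bytes = sample_width * channels  # one sample for every channel
    block = round(block_dur * sampling_rate) * frame_bytes
    # Without hop_dur, the hop equals the block: consecutive windows do not overlap.
    hop = block if hop_dur is None else round(hop_dur * sampling_rate) * frame_bytes
    if max_read is not None:
        data = data[:round(max_read * sampling_rate) * frame_bytes]
    pos = 0
    while pos < len(data):
        yield data[pos:pos + block]  # the last window may be shorter than block
        if pos + block >= len(data):
            break  # end of data reached
        pos += hop
```

With a toy rate of 10 samples per second and 1-byte mono samples, 23 bytes read with block_dur=0.5 yield windows of 5, 5, 5, 5 and 3 bytes; adding hop_dur=0.2 makes consecutive windows share 3 samples.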

class auditok.util.Recorder(input, block_dur=0.01, hop_dur=None, max_read=None, **kwargs)

Class to read fixed-size chunks of audio data from a source and keep data in a cache. Using this class is equivalent to initializing AudioReader with record=True. For more information about the other parameters, see AudioReader.

Once the desired amount of data is read, you can call the rewind() method and then get the recorded data via the data attribute. You can also re-read cached data one window at a time by calling read().
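The record-then-rewind behavior can be sketched with a small caching wrapper. CachingReader is a hypothetical class that stands in for Recorder: it takes any iterable of non-overlapping bytes windows in place of a live source (with overlapping windows, simply joining them would duplicate samples), and raising RuntimeError when data is accessed before rewind() is an assumption of this sketch, not documented auditok behavior.

```python
class CachingReader:
    """Hypothetical sketch of the record/rewind behavior described above.

    `windows` is any iterable of non-overlapping bytes windows standing in
    for a live audio source.
    """

    def __init__(self, windows):
        self._source = iter(windows)
        self._cache = []
        self._rewound = False
        self._replay_pos = 0

    def read(self):
        if self._rewound:
            # After rewind(), windows come from the cache, never from the source.
            if self._replay_pos >= len(self._cache):
                return None
            window = self._cache[self._replay_pos]
            self._replay_pos += 1
            return window
        window = next(self._source, None)
        if window is not None:
            self._cache.append(window)  # record everything read so far
        return window

    def rewind(self):
        self._rewound = True
        self._replay_pos = 0

    @property
    def data(self):
        # Assumption: this sketch refuses to expose data before rewind().
        if not self._rewound:
            raise RuntimeError("call rewind() before accessing data")
        return b"".join(self._cache)
```

Reading two windows, calling rewind(), and then accessing data returns only those two windows: any window still pending in the source is never read, matching the note that no new data is read once rewind() is called.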

class auditok.util.AudioEnergyValidator(energy_threshold, sample_width, channels, use_channel=None)

A validator based on audio signal energy. For an input window of N audio samples (see AudioEnergyValidator.is_valid()), the energy is computed as:

\[energy = 20 \log_{10}\left(\sqrt{\frac{1}{N}\sum_{i=1}^{N}{a_i}^2}\right)\]

where a_i is the i-th audio sample.

Parameters:
  • energy_threshold (float) – minimum energy that audio window should have to be valid.
  • sample_width (int) – size in bytes of one audio sample.
  • channels (int) – number of channels of audio data.
  • use_channel ({None, "any", "mix", "avg", "average"} or int) –

    channel to use for energy computation. The following values are accepted:

    • None (alias “any”) : compute energy for each of the channels and return the maximum value.
    • “mix” (alias “avg” or “average”) : compute the average channel, then compute its energy.
    • int (>= 0 , < channels) : compute the energy of the specified channel and ignore the other ones.
Returns:

energy – energy of the audio window.

Return type:

float

is_valid(data)
Parameters: data (bytes-like) – array of raw audio data
Returns: True if the energy of audio data is >= threshold, False otherwise.
Return type: bool
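The energy formula and the use_channel options can be illustrated with the standard library alone. window_energy and EnergyThresholdValidator are hypothetical names, and the sketch mirrors the documented behavior rather than auditok's own code. It assumes 16-bit signed samples (so it takes no sample_width argument), reads \log as the base-10 logarithm (the usual decibel convention), and clamps the RMS to 1e-10 before taking the logarithm so that silent windows do not raise an error.

```python
import math
from array import array


def window_energy(samples):
    """20 * log10 of the RMS of a non-empty sequence of numeric samples."""
    rms = math.sqrt(sum(a * a for a in samples) / len(samples))
    return 20 * math.log10(max(rms, 1e-10))  # floor avoids log10(0) on silence


class EnergyThresholdValidator:
    """Hypothetical stand-in for AudioEnergyValidator, 16-bit samples only."""

    def __init__(self, energy_threshold, channels, use_channel=None):
        self.energy_threshold = energy_threshold
        self.channels = channels
        self.use_channel = use_channel

    def energy(self, data):
        samples = array("h", data)  # assumes native-endian 16-bit samples
        chans = [samples[c::self.channels] for c in range(self.channels)]
        if self.use_channel is None or self.use_channel == "any":
            return max(window_energy(ch) for ch in chans)  # loudest channel wins
        if self.use_channel in ("mix", "avg", "average"):
            mixed = [sum(frame) / self.channels for frame in zip(*chans)]
            return window_energy(mixed)
        return window_energy(chans[self.use_channel])  # one specific channel

    def is_valid(self, data):
        return self.energy(data) >= self.energy_threshold
```

For a window whose samples all have magnitude 100, the RMS is 100 and the energy is 20 * log10(100) = 40, so such a window is valid for any threshold up to 40.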