Load audio data

Audio data is loaded with the load() function, which can read from an audio file, from the microphone, or from raw audio data.

From a file

If the first argument of load() is a string, it should be a path to an audio file.

import auditok
region = auditok.load("audio.ogg")

If the input file contains raw (headerless) audio data, passing audio_format="raw" and the other audio parameters (sampling_rate, sample_width and channels) is mandatory. In the following example we pass the audio parameters using their short names:

region = auditok.load("audio.dat",
                      audio_format="raw",
                      sr=44100, # alias for `sampling_rate`
                      sw=2,     # alias for `sample_width`
                      ch=1      # alias for `channels`
                      )

From a bytes object

If the first argument is of type bytes, it is interpreted as raw audio data:

sr = 16000
sw = 2
ch = 1
data = b"\0" * sr * sw * ch
region = auditok.load(data, sr=sr, sw=sw, ch=ch)
print(region)
# alternatively you can use
#region = auditok.AudioRegion(data, sr, sw, ch)

output:

AudioRegion(duration=1.000, sampling_rate=16000, sample_width=2, channels=1)
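
The duration reported above follows directly from the size of the raw data and the audio parameters. The sketch below reproduces that arithmetic in plain Python; raw_duration is a hypothetical helper written for this illustration, not part of auditok:

```python
def raw_duration(num_bytes, sampling_rate, sample_width, channels):
    """Return the duration in seconds of headerless PCM data.

    Hypothetical helper for illustration only (not an auditok function).
    """
    # One second of audio takes sampling_rate frames, and each frame
    # holds sample_width bytes for every channel.
    bytes_per_second = sampling_rate * sample_width * channels
    return num_bytes / bytes_per_second


sr, sw, ch = 16000, 2, 1
data = b"\0" * sr * sw * ch  # same buffer as in the example above
print(raw_duration(len(data), sr, sw, ch))  # 1.0
```

The same calculation tells you how many bytes to expect for a given duration, for example 5 * 16000 * 2 * 1 = 160000 bytes for five seconds of 16 kHz, 16-bit mono audio.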

From the microphone

If the first argument is None, load() will try to read data from the microphone. The audio parameters, as well as the max_read parameter, are mandatory:

import auditok
sr = 16000
sw = 2
ch = 1
five_sec_audio = auditok.load(None, sr=sr, sw=sw, ch=ch, max_read=5)
print(five_sec_audio)

output:

AudioRegion(duration=5.000, sampling_rate=16000, sample_width=2, channels=1)

Skip part of audio data

If the skip parameter is > 0, load() will skip that many seconds of leading audio data:

import auditok
region = auditok.load("audio.ogg", skip=2) # skip the first 2 seconds

This argument must be 0 when reading data from the microphone.

Limit the amount of read audio

If the max_read parameter is > 0, load() will read at most that many seconds of audio data:

import auditok
region = auditok.load("audio.ogg", max_read=5)
assert region.duration <= 5

This argument is mandatory when reading data from the microphone.

Basic split example

In the following example we use the split() function to tokenize an audio file, requiring that valid audio events be at least 0.2 seconds long, at most 4 seconds long, and contain at most 0.3 seconds of continuous silence. Limiting the size of detected events to 4 seconds means that an event of, say, 9.5 seconds will be returned as two 4-second events plus a third 1.5-second event. Moreover, a valid event may contain several silences as long as none of them exceeds 0.3 seconds.
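
The 9.5-second example above can be checked with a few lines of arithmetic. This is only a sketch of the max_dur truncation rule as described in the text; truncate_event is a hypothetical name, and the sketch ignores min_dur and silence handling, so it is not the detection algorithm used by split():

```python
def truncate_event(duration, max_dur):
    """Cut one long event into consecutive pieces of at most max_dur seconds.

    Hypothetical helper illustrating the max_dur rule only.
    """
    pieces = []
    remaining = duration
    # Emit full-length pieces while more than max_dur seconds remain.
    while remaining > max_dur:
        pieces.append(max_dur)
        remaining -= max_dur
    # Whatever is left (if anything) becomes the final, shorter piece.
    if remaining > 0:
        pieces.append(remaining)
    return pieces


print(truncate_event(9.5, 4))  # [4, 4, 1.5]
```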

split() returns a generator of AudioRegion objects. An AudioRegion can be played, saved, repeated (i.e., multiplied by an integer) and concatenated with another region (see examples below). Note that AudioRegion objects returned by split() have start and end information stored in their metadata, which can be accessed as object.meta.start and object.meta.end.

import auditok

# split returns a generator of AudioRegion objects
audio_regions = auditok.split(
    "audio.wav",
    min_dur=0.2,     # minimum duration of a valid audio event in seconds
    max_dur=4,       # maximum duration of an event
    max_silence=0.3, # maximum duration of tolerated continuous silence within an event
    energy_threshold=55 # threshold of detection
)

for i, r in enumerate(audio_regions):

    # Regions returned by `split` have 'start' and 'end' metadata fields
    print("Region {i}: {r.meta.start:.3f}s -- {r.meta.end:.3f}s".format(i=i, r=r))

    # play detection
    # r.play(progress_bar=True)

    # region's metadata can also be used with the `save` method
    # (no need to explicitly specify region's object and `format` arguments)
    filename = r.save("region_{meta.start:.3f}-{meta.end:.3f}.wav")
    print("region saved as: {}".format(filename))

output example:

Region 0: 0.700s -- 1.400s
region saved as: region_0.700-1.400.wav
Region 1: 3.800s -- 4.500s
region saved as: region_3.800-4.500.wav
Region 2: 8.750s -- 9.950s
region saved as: region_8.750-9.950.wav
Region 3: 11.700s -- 12.400s
region saved as: region_11.700-12.400.wav
Region 4: 15.050s -- 15.850s
region saved as: region_15.050-15.850.wav
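
Once the start and end times are available, simple statistics such as event durations and the silent gaps between events can be derived from them. The sketch below works on plain (start, end) tuples copied from the output example above rather than on AudioRegion objects; the helper names are chosen for this illustration only:

```python
# (start, end) pairs in seconds, copied from the output example above
events = [
    (0.700, 1.400),
    (3.800, 4.500),
    (8.750, 9.950),
    (11.700, 12.400),
    (15.050, 15.850),
]


def event_durations(events):
    """Duration of each event, rounded to the millisecond."""
    return [round(end - start, 3) for start, end in events]


def gaps_between(events):
    """Silence between the end of one event and the start of the next."""
    return [
        round(next_start - prev_end, 3)
        for (_, prev_end), (next_start, _) in zip(events, events[1:])
    ]


print(event_durations(events))
print(gaps_between(events))
```

With real detections, the tuples could be built from each region's meta.start and meta.end fields.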

Split and plot

Visualize the audio signal and the detections:

import auditok
region = auditok.load("audio.wav") # returns an AudioRegion object
regions = region.split_and_plot(...) # or just region.splitp()

output figure:

_images/example_1.png

Read and split data from the microphone

If the first argument of split() is None, audio data is read from the microphone (requires pyaudio):

import auditok

sr = 16000
sw = 2
ch = 1
eth = 55 # alias for energy_threshold, default value is 50

try:
    for region in auditok.split(input=None, sr=sr, sw=sw, ch=ch, eth=eth):
        print(region)
        region.play(progress_bar=True) # progress bar requires `tqdm`
except KeyboardInterrupt:
    pass

split() will continue reading audio data until you press Ctrl-C. If you want to read a specific amount of audio data, pass the desired number of seconds with the max_read argument.

Access recorded data after split

Using a Recorder object you can get hold of acquired audio data:

import auditok

sr = 16000
sw = 2
ch = 1
eth = 55 # alias for energy_threshold, default value is 50

rec = auditok.Recorder(input=None, sr=sr, sw=sw, ch=ch)

try:
    for region in auditok.split(rec, sr=sr, sw=sw, ch=ch, eth=eth):
        print(region)
        region.play(progress_bar=True) # progress bar requires `tqdm`
except KeyboardInterrupt:
    pass

rec.rewind()
full_audio = auditok.load(rec.data, sr=sr, sw=sw, ch=ch)
# alternatively you can use
full_audio = auditok.AudioRegion(rec.data, sr, sw, ch)

Recorder also accepts a max_read argument.

Working with AudioRegions

The following are a couple of interesting operations you can do with AudioRegion objects.

Basic region information

import auditok
region = auditok.load("audio.wav")
len(region) # number of audio samples in the region, one channel considered
region.duration # duration in seconds
region.sampling_rate # alias `sr`
region.sample_width # alias `sw`
region.channels # alias `ch`

Concatenate regions

import auditok
region_1 = auditok.load("audio_1.wav")
region_2 = auditok.load("audio_2.wav")
region_3 = region_1 + region_2

This is particularly useful if you want to join regions returned by split():

import auditok
regions = auditok.load("audio.wav").split()
gapless_region = sum(regions)

Repeat a region

Multiply by a positive integer:

import auditok
region = auditok.load("audio.wav")
region_x3 = region * 3

Split one region into N regions of equal size

Divide by a positive integer (this has nothing to do with silence-based tokenization):

import auditok
region = auditok.load("audio.wav")
regions = region / 5
assert sum(regions) == region

Note that if no perfect division is possible, the last region might be a bit shorter than the previous N-1 regions.
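
To see why the parts cannot always be the same size, here is a minimal sketch of an even split applied to a plain list of samples. It is an illustration under stated assumptions, not auditok's implementation: split_evenly is a hypothetical helper, and the way the library distributes leftover samples may differ from the choice made here (extra samples go to the first parts, so trailing parts are the shorter ones):

```python
def split_evenly(samples, n):
    """Split a sequence into n consecutive parts whose sizes differ by at most 1.

    Hypothetical helper; leftover samples are given to the first parts.
    """
    if not isinstance(n, int) or n <= 0:
        raise ValueError("n must be a positive integer")
    base, extra = divmod(len(samples), n)
    parts = []
    start = 0
    for i in range(n):
        # The first `extra` parts receive one additional sample each.
        size = base + (1 if i < extra else 0)
        parts.append(samples[start:start + size])
        start += size
    return parts


samples = list(range(10))
parts = split_evenly(samples, 3)
print([len(p) for p in parts])  # [4, 3, 3]
```

Joining the parts back together in order yields the original sequence, which mirrors the assertion in the example above.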

Slice a region by samples, seconds or milliseconds

Slicing an AudioRegion can be interesting in many situations. You can for example remove a fixed-size portion of audio data from the beginning or from the end of a region or crop a region by an arbitrary amount as a data augmentation strategy.

The most accurate way to slice an AudioRegion is to use indices that refer directly to raw audio samples. In the following example, assuming that the sampling rate of the audio data is 16000, you can extract a 5-second region from the main region, starting at the 20th second, as follows:

import auditok
region = auditok.load("audio.wav")
start = 20 * 16000
stop = 25 * 16000
five_second_region = region[start:stop]

In practice, this lets you start and stop at any audio sample within the region. Just as with a list, you can omit start, stop, or both. You can also use negative indices:

import auditok
region = auditok.load("audio.wav")
start = -3 * region.sr # `sr` is an alias of `sampling_rate`
three_last_seconds = region[start:]

While slicing by raw samples is flexible, slicing with temporal indices is more intuitive. You can do so by accessing the millis or seconds views of an AudioRegion (or their shortcut aliases: ms for millis, and sec or s for seconds).

With the millis view:

import auditok
region = auditok.load("audio.wav")
five_second_region = region.millis[5000:10000]

or with the seconds view:

import auditok
region = auditok.load("audio.wav")
five_second_region = region.seconds[5:10]

Indices of the seconds view can also be floats:

import auditok
region = auditok.load("audio.wav")
five_second_region = region.seconds[2.5:7.5]
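
Time-based slicing boils down to converting times into sample indices. The sketch below shows that conversion on a plain list, assuming a time in seconds maps to the index time * sampling_rate; the helper names and the rounding choice are assumptions made for this illustration and may not match what the seconds view does internally:

```python
def seconds_to_index(t, sampling_rate):
    """Convert a time in seconds to a sample index (rounded to nearest)."""
    return int(round(t * sampling_rate))


def slice_by_seconds(samples, sampling_rate, start=None, stop=None):
    """Slice a sequence of samples using times in seconds.

    Hypothetical helper; None means "from the beginning" / "to the end",
    just like an omitted bound in a regular slice.
    """
    i = None if start is None else seconds_to_index(start, sampling_rate)
    j = None if stop is None else seconds_to_index(stop, sampling_rate)
    return samples[i:j]


sr = 16000
samples = list(range(10 * sr))  # stand-in for 10 seconds of mono audio
chunk = slice_by_seconds(samples, sr, 2.5, 7.5)
print(len(chunk) / sr)  # 5.0
```

Negative times work the same way as negative sample indices, e.g. slice_by_seconds(samples, sr, start=-3) keeps the last three seconds.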

Get arrays of audio samples

If numpy is not installed, the samples attribute is a list of arrays of audio samples (standard array.array objects), one per channel. If numpy is installed, samples is a 2-D numpy.ndarray where the first dimension is the channel and the second is the sample.

import auditok
region = auditok.load("audio.wav")
samples = region.samples
assert len(samples) == region.channels

If numpy is installed you can use:

import auditok
import numpy as np
region = auditok.load("audio.wav")
samples = np.asarray(region)
assert len(samples.shape) == 2
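
To make the "one array per channel" layout concrete, the following sketch separates interleaved raw audio bytes into per-channel array.array objects using only the standard library. It assumes 16-bit samples in native byte order with channels interleaved frame by frame; split_channels is a hypothetical helper and not the code auditok uses to build samples:

```python
from array import array


def split_channels(data, channels):
    """Return one array of 16-bit samples per channel from interleaved bytes.

    Hypothetical helper; assumes 16-bit, native byte order, interleaved PCM.
    """
    samples = array("h", data)  # "h" = signed 16-bit integers
    # Channel c owns every `channels`-th sample, starting at offset c.
    return [samples[c::channels] for c in range(channels)]


# Three stereo frames: (left, right), (left, right), (left, right)
interleaved = array("h", [1, -1, 2, -2, 3, -3])
left, right = split_channels(interleaved.tobytes(), channels=2)
print(list(left), list(right))  # [1, 2, 3] [-1, -2, -3]
```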