Load audio data
Audio data is loaded using the load() function, which can read from
audio files, capture from the microphone, or accept raw audio data
(as a bytes object).
From a file
If the first argument of load() is a string or a Path, it should
refer to an existing audio file.
import auditok
region = auditok.load("audio.ogg")
If the input file contains raw (headerless) audio data, specifying audio
parameters (sampling_rate, sample_width, and channels) is required.
Additionally, if the file name does not end with ‘raw’, you should explicitly
pass audio_format="raw" to the function.
In the example below, we provide audio parameters using their abbreviated names:
region = auditok.load(
    "audio.dat",
    audio_format="raw",
    sr=44100,  # alias for `sampling_rate`
    sw=2,  # alias for `sample_width`
    ch=1,  # alias for `channels`
)
Alternatively, you can use AudioRegion to load audio data:
from auditok import AudioRegion
region = AudioRegion.load(
    "audio.dat",
    audio_format="raw",
    sr=44100,  # alias for `sampling_rate`
    sw=2,  # alias for `sample_width`
    ch=1,  # alias for `channels`
)
From a bytes object
If the first argument is of type bytes, it is interpreted as raw audio data:
sr = 16000
sw = 2
ch = 1
data = b"\0" * sr * sw * ch
region = auditok.load(data, sr=sr, sw=sw, ch=ch)
print(region)
# alternatively you can use
region = auditok.AudioRegion(data, sr, sw, ch)
output:
AudioRegion(duration=1.000, sampling_rate=16000, sample_width=2, channels=1)
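The one-second duration reported above follows from the size of the buffer: raw audio holds sampling_rate x sample_width x channels bytes per second. A minimal sketch of that arithmetic in plain Python, with no auditok call involved (the helper name raw_duration is hypothetical):

```python
def raw_duration(data: bytes, sr: int, sw: int, ch: int) -> float:
    """Duration in seconds of headerless PCM data (hypothetical helper)."""
    bytes_per_second = sr * sw * ch
    return len(data) / bytes_per_second


sr, sw, ch = 16000, 2, 1
data = b"\0" * sr * sw * ch  # same buffer as in the example above
duration = raw_duration(data, sr, sw, ch)
print(f"{duration:.3f}")  # 1.000
```

The same calculation is a quick way to check that the audio parameters passed alongside a bytes object are plausible for the amount of data at hand.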
From the microphone
If the first argument is None, load() will attempt to read data from the
microphone. In this case, audio parameters, along with the max_read parameter,
are required.
import auditok
sr = 16000
sw = 2
ch = 1
five_sec_audio = auditok.load(None, sr=sr, sw=sw, ch=ch, max_read=5)
print(five_sec_audio)
output:
AudioRegion(duration=5.000, sampling_rate=16000, sample_width=2, channels=1)
Skip part of audio data
If the skip parameter is greater than 0, load() skips that amount of
leading audio data, measured in seconds:
import auditok
region = auditok.load("audio.ogg", skip=2) # skip the first 2 seconds
This argument must be 0 when reading data from the microphone.
Limit the amount of read audio
If the max_read parameter is greater than 0, load() reads at most that
many seconds of audio data:
import auditok
region = auditok.load("audio.ogg", max_read=5)
assert region.duration <= 5
This argument is required when reading data from the microphone.
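For raw data, skip and max_read can be pictured as a window over the byte buffer. The sketch below illustrates that idea in plain Python; byte_window is a hypothetical helper, not an auditok function, and the library's own rounding of fractional durations may differ:

```python
def byte_window(data, sr, sw, ch, skip=0.0, max_read=None):
    """Return the bytes left after skipping `skip` seconds and
    keeping at most `max_read` seconds (hypothetical helper)."""
    frame_size = sw * ch  # bytes per multi-channel sample
    start = int(skip * sr) * frame_size  # whole frames to drop
    if max_read is None:
        return data[start:]
    stop = start + int(max_read * sr) * frame_size
    return data[start:stop]


sr, sw, ch = 16000, 2, 1
ten_seconds = b"\0" * (10 * sr * sw * ch)
window = byte_window(ten_seconds, sr, sw, ch, skip=2, max_read=5)
window_duration = len(window) / (sr * sw * ch)
print(window_duration)  # 5.0
```

If fewer than max_read seconds remain after the skipped part, the window is simply shorter, which matches the "at most" wording above.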
Basic split example
In the following example, we’ll use the split() function to tokenize an
audio file. We’ll specify that valid audio events must be at least 0.2 seconds
long, no longer than 4 seconds, and contain no more than 0.3 seconds of continuous
silence. By setting a 4-second limit, an event lasting 9.5 seconds, for instance,
will be returned as two 4-second events plus a final 1.5-second event. Additionally,
a valid event may contain multiple silences, as long as none exceed 0.3 seconds.
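The effect of the maximum duration on a long event can be checked with a few lines of arithmetic. This is only a sketch of the cutting rule described above, not auditok's tokenizer: it ignores min_dur and max_silence, and cut_by_max_dur is a hypothetical name.

```python
def cut_by_max_dur(duration, max_dur):
    """Cut one long event into consecutive pieces of at most max_dur seconds."""
    pieces = []
    remaining = duration
    while remaining > max_dur:
        pieces.append(max_dur)
        remaining -= max_dur
    if remaining > 0:
        pieces.append(remaining)
    return pieces


pieces = cut_by_max_dur(9.5, 4)
print(pieces)  # [4, 4, 1.5]
```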
split() returns a generator of AudioRegion objects. Each
AudioRegion can be played, saved, repeated (multiplied by an integer),
and concatenated with another region (see examples below). Note that
AudioRegion objects returned by split() include start and end
attributes, which mark the beginning and end of the audio event relative to the
input audio stream.
import auditok
# `split` returns a generator of AudioRegion objects
audio_events = auditok.split(
    "audio.wav",
    min_dur=0.2,  # Minimum duration of a valid audio event in seconds
    max_dur=4,  # Maximum duration of an event
    max_silence=0.3,  # Maximum tolerated silence duration within an event
    energy_threshold=55,  # Detection threshold
)
for i, r in enumerate(audio_events):
    # AudioRegions returned by `split` have defined 'start' and 'end' attributes
    print(f"Event {i}: {r.start:.3f}s -- {r.end:.3f}s")
    # Play the audio event
    r.play(progress_bar=True)
    # Save the event with start and end times in the filename
    filename = r.save("event_{start:.3f}-{end:.3f}.wav")
    print(f"event saved as: {filename}")
Example output:
Event 0: 0.700s -- 1.400s
event saved as: event_0.700-1.400.wav
Event 1: 3.800s -- 4.500s
event saved as: event_3.800-4.500.wav
Event 2: 8.750s -- 9.950s
event saved as: event_8.750-9.950.wav
Event 3: 11.700s -- 12.400s
event saved as: event_11.700-12.400.wav
Event 4: 15.050s -- 15.850s
event saved as: event_15.050-15.850.wav
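To try the split example without a recording at hand, a test file with known silent and non-silent stretches can be synthesized using only the standard library. The sketch below is an assumption-laden helper (write_test_wav, the 440 Hz tone, the amplitude and the segment lengths are all arbitrary choices), and it makes no claim about which events a given energy threshold will report:

```python
import math
import os
import struct
import tempfile
import wave

# (duration in seconds, is_tone) pairs: silence, tone, silence, tone
DEFAULT_PATTERN = ((0.5, False), (1.0, True), (0.5, False), (1.0, True))


def write_test_wav(path, sr=16000, pattern=DEFAULT_PATTERN):
    """Write a mono 16-bit WAV alternating digital silence and a 440 Hz tone.

    Returns the number of samples written.
    """
    samples = []
    for duration, is_tone in pattern:
        for i in range(int(duration * sr)):
            if is_tone:
                samples.append(int(16000 * math.sin(2 * math.pi * 440 * i / sr)))
            else:
                samples.append(0)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(sr)
        wav.writeframes(struct.pack(f"<{len(samples)}h", *samples))
    return len(samples)


path = os.path.join(tempfile.mkdtemp(), "audio.wav")
n_samples = write_test_wav(path)
print(n_samples)  # 48000
```

The resulting path can then be passed to split() in place of "audio.wav".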
Split and plot
Visualize the audio signal and detections:
import auditok
region = auditok.load("audio.wav") # returns an AudioRegion object
regions = region.split_and_plot(...) # or just region.splitp()
output figure:
Split an audio stream and re-join (glue) audio events with silence
The following code detects audio events within an audio stream, then inserts 1 second of silence between them to create an audio stream with pauses:
# Create a 1-second silent audio region
# Audio parameters must match the original stream
from auditok import split, make_silence
silence = make_silence(
    duration=1,
    sampling_rate=16000,
    sample_width=2,
    channels=1,
)
events = split("audio.wav")
audio_with_pauses = silence.join(events)
Alternatively, use split_and_join_with_silence:
from auditok import split_and_join_with_silence
audio_with_pauses = split_and_join_with_silence(silence_duration=1, input="audio.wav")
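The join call above reads like str.join or bytes.join: the silent region acts as the separator and is placed between consecutive events, not before the first or after the last. The following sketch uses bytes.join on stand-in buffers to show the resulting length; it assumes AudioRegion.join follows the same convention, as the description above suggests, and does not call auditok:

```python
sr, sw, ch = 16000, 2, 1
bytes_per_second = sr * sw * ch

silence = b"\0" * bytes_per_second  # 1 second of digital silence
events = [
    b"\1" * bytes_per_second * 2,  # stand-ins for three detected events
    b"\1" * bytes_per_second,      # lasting 2, 1 and 3 seconds
    b"\1" * bytes_per_second * 3,
]

audio_with_pauses = silence.join(events)  # bytes.join, not AudioRegion.join
total_seconds = len(audio_with_pauses) / bytes_per_second
print(total_seconds)  # 8.0
```

Three events totalling 6 seconds plus two 1-second gaps give 8 seconds, which is also why the silent region must use the same audio parameters as the events.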
Read audio data from the microphone and perform real-time event detection
If the first argument of split() is None, audio data is read from the
microphone (requires pyaudio):
import auditok
sr = 16000
sw = 2
ch = 1
eth = 55 # alias for energy_threshold, default value is 50
try:
    for region in auditok.split(input=None, sr=sr, sw=sw, ch=ch, eth=eth):
        print(region)
        region.play(progress_bar=True)  # progress bar requires `tqdm`
except KeyboardInterrupt:
    pass
split() will continue reading audio data until you press Ctrl-C. To read
a specific amount of audio data, pass the desired number of seconds using the
max_read argument.
Access recorded data after split
Using a Recorder object, you can access the audio data read from a file
or from the microphone. With the following code, press Ctrl-C to stop recording:
import auditok
sr = 16000
sw = 2
ch = 1
eth = 55 # alias for energy_threshold, default value is 50
rec = auditok.Recorder(input=None, sr=sr, sw=sw, ch=ch)
events = []
try:
    for region in auditok.split(rec, sr=sr, sw=sw, ch=ch, eth=eth):
        print(region)
        region.play(progress_bar=True)
        events.append(region)
except KeyboardInterrupt:
    pass
rec.rewind()
full_audio = auditok.load(rec.data, sr=sr, sw=sw, ch=ch)
# alternatively you can use
full_audio = auditok.AudioRegion(rec.data, sr, sw, ch)
full_audio.play(progress_bar=True)
Recorder also accepts a max_read argument.
Working with AudioRegions
In the following sections, we will review several operations
that can be performed with AudioRegion objects.
Basic region information
import auditok
region = auditok.load("audio.wav")
len(region) # number of audio samples in the region, one channel considered
region.duration # duration in seconds
region.sampling_rate # alias `sr`
region.sample_width # alias `sw`
region.channels # alias `ch`
When an audio region is returned by the split() function, it includes defined
start and end attributes that refer to the beginning and end of the audio
event relative to the input audio stream.
Concatenate regions
import auditok
region_1 = auditok.load("audio_1.wav")
region_2 = auditok.load("audio_2.wav")
region_3 = region_1 + region_2
This is particularly useful when you want to join regions returned by the
split() function:
import auditok
regions = auditok.load("audio.wav").split()
gapless_region = sum(regions)
Repeat a region
Multiply by a positive integer:
import auditok
region = auditok.load("audio.wav")
region_x3 = region * 3
Split one region into N regions of equal size
Divide by a positive integer (this is unrelated to silence-based tokenization!):
import auditok
region = auditok.load("audio.wav")
regions = region / 5
assert sum(regions) == region
Note that if an exact split is not possible, the last region may be shorter than the preceding N-1 regions.
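The shorter last region is easy to see on a plain list of samples. The sketch below assumes the chunk size is the sample count divided by N and rounded up; that rounding rule is an assumption made for illustration, not a statement about auditok's implementation, and split_equal is a hypothetical name.

```python
import math


def split_equal(samples, n):
    """Cut `samples` into consecutive chunks of ceil(len / n) items.

    The last chunk may be shorter than the others.
    """
    chunk = math.ceil(len(samples) / n)
    return [samples[i:i + chunk] for i in range(0, len(samples), chunk)]


samples = list(range(23))  # stand-in for 23 audio samples
parts = split_equal(samples, 5)
print([len(p) for p in parts])  # [5, 5, 5, 5, 3]
```

Concatenating the parts gives back the original list, which mirrors the assertion in the example above.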
Slice a region by samples, seconds or milliseconds
Slicing an AudioRegion can be useful in various situations.
For example, you can remove a fixed-length portion of audio data from
the beginning or end of a region, or crop a region by an arbitrary amount
as a data augmentation strategy.
The most accurate way to slice an AudioRegion is by using indices
that directly refer to raw audio samples. In the following example, assuming
the audio data has a sampling rate of 16000, you can extract a 5-second
segment from the main region, starting at the 20th second, as follows:
import auditok
region = auditok.load("audio.wav")
start = 20 * 16000
stop = 25 * 16000
five_second_region = region[start:stop]
This allows you to start and stop at any audio sample within the region. Similar
to a list, you can omit either start or stop, or both. Negative
indices are also supported:
import auditok
region = auditok.load("audio.wav")
start = -3 * region.sr # `sr` is an alias of `sampling_rate`
three_last_seconds = region[start:]
While slicing by raw samples offers flexibility, using temporal indices is
often more intuitive. You can achieve this by accessing the millis or seconds
views of an AudioRegion (or using their shortcut aliases ms, sec, or s).
With the millis view:
import auditok
region = auditok.load("audio.wav")
five_second_region = region.millis[5000:10000]
# or
five_second_region = region.ms[5000:10000]
or with the seconds view:
import auditok
region = auditok.load("audio.wav")
five_second_region = region.seconds[5:10]
# or
five_second_region = region.sec[5:10]
# or
five_second_region = region.s[5:10]
Indices of the seconds view can also be floats:
import auditok
region = auditok.load("audio.wav")
five_second_region = region.seconds[2.5:7.5]
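Temporal indices ultimately map to sample positions through the sampling rate. The sketch below shows that conversion on a plain list standing in for mono audio; seconds_to_samples is a hypothetical helper, and rounding to the nearest sample is an assumption about how fractional positions might be resolved:

```python
def seconds_to_samples(t, sr):
    """Convert a time in seconds to a sample index (nearest sample)."""
    return int(round(t * sr))


sr = 16000
samples = list(range(10 * sr))  # stand-in for 10 seconds of mono audio

# Equivalent of a seconds-view slice [2.5:7.5]
by_seconds = samples[seconds_to_samples(2.5, sr):seconds_to_samples(7.5, sr)]
# Equivalent of a millis-view slice [2500:7500]
by_millis = samples[seconds_to_samples(2500 / 1000, sr):seconds_to_samples(7500 / 1000, sr)]

print(len(by_seconds) / sr)  # 5.0
```

Both slices select samples 40000 to 119999, so the two views are interchangeable whenever the boundaries fall on whole milliseconds.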
Export an AudioRegion as a numpy array
from auditok import load, AudioRegion
audio = load("audio.wav") # or use `AudioRegion.load("audio.wav")`
x = audio.numpy()
assert x.shape[0] == audio.channels
assert x.shape[1] == len(audio)
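The (channels, samples) shape checked above can be understood by de-interleaving raw audio by hand. The sketch below uses only the standard library and assumes interleaved, little-endian, signed 16-bit samples (the usual PCM layout in WAV files); it says nothing about how numpy() is implemented, and deinterleave is a hypothetical helper.

```python
import struct


def deinterleave(data, sw, ch):
    """Return one list of samples per channel from interleaved 16-bit PCM."""
    if sw != 2:
        raise ValueError("this sketch only handles 16-bit samples")
    n_values = len(data) // sw
    values = struct.unpack(f"<{n_values}h", data)
    # Channel c owns every ch-th value, starting at offset c
    return [list(values[c::ch]) for c in range(ch)]


frames = [(1, -1), (2, -2), (3, -3)]  # (left, right) pairs
data = b"".join(struct.pack("<2h", left, right) for left, right in frames)
x = deinterleave(data, sw=2, ch=2)
print(x)  # [[1, 2, 3], [-1, -2, -3]]
```

The outer length equals the number of channels and each inner list holds one value per audio sample, matching the two assertions in the example above.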