Chapter 8. Audio Parameters

This chapter describes the ML audio parameters and the layout of audio buffers.

Audio Buffer Layout

The digital representation of an audio signal is generated by periodically sampling the amplitude (voltage) of the signal. Each sample is a snapshot of the signal amplitude at one instant, and the sampling rate specifies the number of samples taken per second.

The audio buffer pointer points to the source or destination data in an audio buffer used to process a fragment of a media stream. For audio signals, a fragment typically corresponds to between 10 milliseconds and 1 second of audio data.

An audio buffer is a collection of sample frames. A sample frame is a set of audio samples that are coincident in time. A sample frame for mono data is a single sample; a sample frame for stereo data consists of a left-right sample pair.

Stereo samples are interleaved; left-channel samples alternate with right-channel samples. Four-channel samples are also interleaved, with each frame usually having two left/right sample pairs, but there can be other arrangements.

Figure 8-1 shows the relationship between the number of channels and the frame size of audio sample data. Figure 8-2 shows the layout of an audio buffer in memory.

Figure 8-1. Different Audio Sample Frames

Figure 8-2. Layout of an Audio Buffer with 4 Channels

Audio Parameters Summary

This section discusses the audio parameters.

ML_AUDIO_BUFFER_POINTER

Sets a pointer to the first byte of an in-memory audio buffer. The buffer address must comply with the alignment constraints for buffers on the particular path to which it is being sent. See the mlGetCapabilities(3dm) man page for details on determining alignment requirements.

ML_AUDIO_CHANNELS_INT32

Sets the number of channels of audio data in the buffer. Multichannel audio data is always stored interleaved, with the samples for each consecutive audio channel following one another in sequence. For example, a 4-channel audio stream will have the form:

123412341234...

where 1 is the sample for the first audio channel, 2 is the sample for the second, and so on.
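The interleaved layout described above can be illustrated with a small indexing sketch. It is not part of the ML API: the helper names sample_index and extract_channel_s16 are hypothetical, and the sketch assumes 16-bit samples and 0-based channel numbers (so the channel labeled 1 above is channel 0 here).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: position of the sample for `channel` (0-based)
 * within sample frame `frame` of an interleaved buffer that holds
 * `channels` samples per frame. */
static size_t sample_index(size_t frame, size_t channel, size_t channels)
{
    return frame * channels + channel;
}

/* Hypothetical helper: copy one channel out of an interleaved buffer of
 * 16-bit samples. `out` must have room for `frames` samples. */
static void extract_channel_s16(const int16_t *interleaved, size_t frames,
                                size_t channels, size_t channel,
                                int16_t *out)
{
    for (size_t f = 0; f < frames; ++f)
        out[f] = interleaved[sample_index(f, channel, channels)];
}
```

For a 4-channel buffer, the sample for the first channel in the second frame sits at index 4, immediately after the four samples of the first frame.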

Common values are:

ML_CHANNELS_MONO
ML_CHANNELS_STEREO
ML_CHANNELS_4
ML_CHANNELS_8

ML_AUDIO_COMPRESSION_INT32

Specifies the compression format if the audio data is in compressed form. The compression format may be an industry standard such as MPEG-1 audio, or it may be no compression at all.

Common values are:

ML_COMPRESSION_A_LAW
ML_COMPRESSION_AC3
ML_COMPRESSION_IMA_ADPCM
ML_COMPRESSION_MPEG1
ML_COMPRESSION_MPEG2
ML_COMPRESSION_MU_LAW
ML_COMPRESSION_UNCOMPRESSED

ML_AUDIO_FORMAT_INT32

Specifies the format in which audio samples are stored in memory. The interpretation of format values is as follows:

ML_FORMAT_TypeBits

  • Type is U for unsigned integer samples, S for signed (two's complement) integer samples, or R for real (floating point) samples

  • Bits is the number of significant bits per sample

For sample formats in which the number of significant bits is less than the number of bits in which the sample is stored, the format of the values is:

ML_FORMAT_TypeBitsinSizeAlignment

  • Size is the total size used for the sample in memory, in bits.

  • Alignment is either R or L, depending on whether the significant bits are right- or left-shifted within the sample.

For example, the following are three of the most common audio buffer formats:

    ML_AUDIO_FORMAT_U8

    7 char 0
    +------+
    iiiiiiii

    ML_AUDIO_FORMAT_S16

    15  short int  0
    +--------------+
    iiiiiiiiiiiiiiii

    ML_AUDIO_FORMAT_S24in32R

    31            int              0
    +------------------------------+
    ssssssssiiiiiiiiiiiiiiiiiiiiiiii

where:

  • s indicates sign-extension

  • i indicates the actual component information

The bit locations refer to the locations when the 8-, 16-, or 32-bit sample has been loaded into a register as an integer quantity.

If the audio data compression parameter ML_AUDIO_COMPRESSION_INT32 indicates that the audio data is in compressed form, ML_AUDIO_FORMAT_INT32 indicates the data type of the samples after decoding. Common formats are:

ML_FORMAT_U8
ML_FORMAT_S16
ML_FORMAT_S24in32R
ML_FORMAT_R32

Default is hardware-specific.
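As an illustration of the S24in32R layout shown above, the following sketch sign-extends a 24-bit two's complement value into a 32-bit integer so that the top eight bits carry the sign. The function name to_s24in32r is hypothetical and is not part of the ML API; the sketch only demonstrates the bit layout.

```c
#include <stdint.h>

/* Hypothetical helper: take the low 24 bits of `raw` as a two's
 * complement sample and return it sign-extended to 32 bits, that is,
 * in the right-justified layout ssssssss iiiiiiii iiiiiiii iiiiiiii.
 * Any bits above bit 23 in `raw` are ignored. */
static int32_t to_s24in32r(uint32_t raw)
{
    uint32_t low24 = raw & 0x00FFFFFFu;

    /* Flipping bit 23 and then subtracting 2^23 maps the range
     * 0x800000..0xFFFFFF onto the negative values without relying on
     * implementation-defined unsigned-to-signed conversion. */
    return (int32_t)(low24 ^ 0x00800000u) - 0x00800000;
}
```

With this layout, a sample loaded into a register as a 32-bit integer always lies between -8,388,608 and 8,388,607.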

ML_AUDIO_FRAME_SIZE_INT32

Reports the size of an audio sample frame in bytes. This is a read-only parameter; the device computes it from the current path control settings.
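This chapter does not spell out how the device derives the frame size. Example 8-1 (S16 packing, two channels, 4 bytes) suggests that it is the number of channels multiplied by the storage size of one sample, and the sketch below makes that assumption. The function name frame_size_bytes is hypothetical; an application should read ML_AUDIO_FRAME_SIZE_INT32 rather than rely on this calculation.

```c
#include <stddef.h>

/* Hypothetical helper, assuming frame size = channels * bytes per stored
 * sample. `storage_bits` is the in-memory size of one sample in bits
 * (for example 16 for S16, or 32 for S24in32R). */
static size_t frame_size_bytes(size_t channels, size_t storage_bits)
{
    return channels * (storage_bits / 8);
}
```

Under this assumption, a stereo S16 stream has a 4-byte frame and a 4-channel S24in32R stream has a 16-byte frame.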

ML_AUDIO_GAINS_REAL64_ARRAY

Sets the gain factor in decibels (dB) on the given path. The array holds one value for each audio channel. Negative values represent attenuation. Zero represents no change of the signal. Positive values amplify the signal. A gain of negative infinity indicates infinite attenuation (mute).

ML_AUDIO_PRECISION_INT32

Queries the maximum width in bits for an audio sample at the input or output jack. For example, a value of 16 indicates a 16-bit audio signal. ML_AUDIO_PRECISION_INT32 specifies the precision at the audio I/O jack, whereas ML_AUDIO_FORMAT_INT32 specifies the packing of the audio samples in the audio buffer. If the sample width given by ML_AUDIO_FORMAT_INT32 differs from the value of ML_AUDIO_PRECISION_INT32, the system converts between the two. Such a conversion might include padding, truncation, or both.

ML_AUDIO_SAMPLE_RATE_REAL64

Sets the sample rate of the audio data in hertz (Hz). The sample rate is the frequency at which samples are taken from the analog signal; a sample rate of 1 Hz is equal to one sample per second. For example, when a mono analog audio signal is digitized at a 44.1-kilohertz (kHz) sample rate, 44,100 digital samples are generated for every second of the signal. Supported values depend on the hardware, but are usually between 8,000.0 and 96,000.0. Default is hardware-specific. Common sample rates are:

8,000.0
16,000.0
32,000.0
44,100.0
48,000.0
96,000.0

The Nyquist theorem defines the minimum sampling frequency required to accurately represent the information of an analog signal with a given bandwidth. According to Nyquist, an analog audio signal must be sampled at a frequency that is at least double the highest analog audio frequency of interest. The sample rate used for music-quality audio, such as the digital data stored on audio CDs, is 44.1 kHz. A 44.1-kHz digital signal can theoretically represent audio frequencies from 0 kHz to 22.05 kHz, which adequately covers the range of normal human hearing. Higher sample rates result in higher-quality digital signals; however, the higher the sample rate, the greater the signal storage requirement.

Uncompressed Audio Buffer Size Computation

The following equation shows how to calculate the number of bytes for an uncompressed audio buffer given the sample frame size, sampling rate, and the time period representing the audio buffer:

N = F × R × T

where:

N

Audio buffer size in bytes

F

The number of bytes per audio sample frame (ML_AUDIO_FRAME_SIZE_INT32)

R

The sample rate in Hz (ML_AUDIO_SAMPLE_RATE_REAL64)

T

The time period the audio buffer represents in seconds

Example 8-1. Buffer Size Computation

If:

  • F = 4 bytes (S16 packing with two channels)

  • R = 44,100 Hz

  • T = 40 ms = 0.04 s

then the resulting buffer size is N = 4 × 44,100 × 0.04 = 7,056 bytes.
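The same computation can be sketched in C. The function name audio_buffer_bytes is hypothetical and not part of the ML API. As a design choice of this sketch, R × T is rounded to the nearest whole sample frame before multiplying by F, so that the buffer never ends partway through a frame.

```c
#include <stddef.h>

/* Hypothetical helper: N = F * R * T, computed in whole sample frames.
 * frame_bytes - F, bytes per audio sample frame
 * rate_hz     - R, sample rate in Hz
 * seconds     - T, duration the buffer represents, in seconds */
static size_t audio_buffer_bytes(size_t frame_bytes, double rate_hz,
                                 double seconds)
{
    /* Round R * T to the nearest whole frame; adding 0.5 before the
     * truncating cast also absorbs small floating-point error. */
    size_t frames = (size_t)(rate_hz * seconds + 0.5);

    return frames * frame_bytes;
}
```

Calling audio_buffer_bytes(4, 44100.0, 0.04) reproduces the 7056-byte result of Example 8-1.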