Functions and Restrictions

Only the G.711a, G.711u, and Advanced Audio Coding (AAC) protocols are supported, and software encoding and decoding handle mono-channel data only. These APIs do not support use across multiple processes, and the same device ID cannot be used in multiple threads.

During audio encoding, G.711a and G.711u streams comply with the frame structure described in the following table: a 4-byte frame header (two hi_s16 units) is added before the payload of each frame. The decoder must read this header during audio decoding. In G.711a and G.711u streams, the payload length recorded in the frame header must be 40, 80, 120, 160, or 240 (unit: hi_s16). The corresponding number of sampling points in each frame is 80, 160, 240, 320, or 480. In the voice quality enhancement (VQE) framework, encoding and decoding are associated with the AI and AO modules, so the range of sampling points supported by each frame is narrower.

| Parameter Position (Unit: hi_s16) | Parameter Bit | Meaning |
|---|---|---|
| 0 | [15:8] | Flag of the frame type. 01: voice frame; other values: reserved. |
| 0 | [7:0] | Reserved. |
| 1 | [15:8] | Frame circulation counter: 0–255. |
| 1 | [7:0] | Length of the payload (unit: hi_s16). |
| 2 | [15:0] | Payload data. |
| 3 | [15:0] | Payload data. |
| ... | [15:0] | Payload data. |
| n+1 | [15:0] | Payload data. |
| n+2 | [15:0] | Payload data. |
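
The header layout above can be read with a few shifts and masks. The following is a minimal sketch, not the SDK's own parser: the type `g711_frame_header` and the functions `g711_parse_header` and `g711_samples_in_frame` are hypothetical names introduced for illustration, and the code assumes that the stream is already available as an array of 16-bit words in host byte order (hi_s16 values reinterpreted as `uint16_t`).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type; not part of the documented API. */
typedef struct {
    uint8_t frame_type;   /* word 0, bits [15:8]; 0x01 = voice frame */
    uint8_t counter;      /* word 1, bits [15:8]; frame counter, 0..255 */
    uint8_t payload_len;  /* word 1, bits [7:0]; unit: 16-bit words */
} g711_frame_header;

/*
 * Parse the two header words of one G.711a/G.711u frame.
 * Assumption: words[] holds the stream as 16-bit units in host byte order.
 * Returns 0 on success, -1 if the header is missing or not valid.
 */
static int g711_parse_header(const uint16_t *words, size_t word_count,
                             g711_frame_header *out)
{
    if (words == NULL || out == NULL || word_count < 2) {
        return -1;
    }

    out->frame_type  = (uint8_t)(words[0] >> 8);
    out->counter     = (uint8_t)(words[1] >> 8);
    out->payload_len = (uint8_t)(words[1] & 0xFFu);

    /* Only voice frames (type 01) are defined; other values are reserved. */
    if (out->frame_type != 0x01) {
        return -1;
    }

    /* The payload length must be one of the documented values. */
    switch (out->payload_len) {
    case 40: case 80: case 120: case 160: case 240:
        break;
    default:
        return -1;
    }

    /* The buffer must contain the header plus the whole payload. */
    if (word_count < (size_t)2 + out->payload_len) {
        return -1;
    }
    return 0;
}

/* G.711 uses one 8-bit code per sample, so each 16-bit word carries two. */
static unsigned g711_samples_in_frame(const g711_frame_header *h)
{
    return 2u * h->payload_len;
}
```

For a frame whose second header word is `0x0728`, this sketch reports a counter of 7 and a payload length of 40 hi_s16 units, which corresponds to 80 sampling points.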

The following table describes the specifications, advantages, and disadvantages of G.711a, G.711u, and AAC.

| Protocol | Sampling Rate | Sampling Points in Each Frame | Bit Rate (kbit/s) | Compression Rate | CPU Usage | Description |
|---|---|---|---|---|---|---|
| G.711a/G.711u | 8 kHz | 80/160/240/320/480 | 64 | 2 | 1 MHz | Advantages: best voice quality; low CPU consumption; widely used; free of charge. Disadvantage: low compression efficiency. |
| AAC (only MPEG-4 audio streams without cyclic redundancy check (CRC) are supported) | 48 kHz | 1024 | [48, 256] | Depends on the bit rate. A lower bit rate means a higher compression ratio. | - | Advantages: higher sampling rate for encoding and decoding; more high-frequency information of the original audio retained; more bit rates to choose from; higher compression rates. Disadvantage: high algorithm complexity. |
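
The figures in the specification table can be cross-checked with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the compression ratio assumes that the uncompressed input is 16-bit mono PCM at the same sampling rate, which is consistent with the ratio of 2 listed for G.711.

```c
/* Frame duration in milliseconds for a given frame size and sampling rate. */
static double frame_duration_ms(unsigned samples_per_frame,
                                unsigned sample_rate_hz)
{
    return 1000.0 * samples_per_frame / sample_rate_hz;
}

/* G.711 bit rate in kbit/s: one 8-bit code per sample. */
static unsigned g711_bit_rate_kbps(unsigned sample_rate_hz)
{
    return sample_rate_hz * 8u / 1000u;
}

/*
 * Compression ratio relative to uncompressed audio.
 * Assumption: the input is 16-bit mono PCM at sample_rate_hz.
 */
static double compression_ratio(unsigned sample_rate_hz,
                                unsigned bit_rate_kbps)
{
    double pcm_kbps = sample_rate_hz * 16.0 / 1000.0;
    return pcm_kbps / bit_rate_kbps;
}
```

Under these assumptions, a G.711 frame of 80 sampling points at 8 kHz lasts 10 ms and a frame of 480 sampling points lasts 60 ms, while the G.711 bit rate works out to 64 kbit/s with a compression ratio of 2. For AAC at 48 kHz, the same formula gives a ratio of 16 at 48 kbit/s and 3 at 256 kbit/s.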