Misplaced Pages

AAC-LD

Article snapshot taken from Wikipedia under the Creative Commons Attribution-ShareAlike license.

The MPEG-4 Low Delay Audio Coder (a.k.a. AAC Low Delay, or AAC-LD) is an audio compression standard designed to combine the advantages of perceptual audio coding with the low delay necessary for two-way communication. It is closely derived from the MPEG-2 Advanced Audio Coding (AAC) standard. It was published in MPEG-4 Audio Version 2 (ISO/IEC 14496-3:1999/Amd 1:2000) and in its later revisions.

AAC-LD uses a version of the modified discrete cosine transform (MDCT) audio coding technique called the LD-MDCT. AAC-LD is widely used by Apple as the voice-over-IP (VoIP) speech codec in FaceTime. The most stringent requirements are a maximum algorithmic delay of only 20 ms and good audio quality for all kinds of audio signals, including speech and music. Two-way communication with AAC-LD

a. This is the reason for using a window function that reduces the components near the boundaries of the input sequence (a, b, c, d) towards 0. Above, the TDAC property was proved for the ordinary MDCT, showing that adding IMDCTs of subsequent blocks in their overlapping half recovers the original data. The derivation of this inverse property for the windowed MDCT is only slightly more complicated. Consider two overlapping consecutive sets of 2N inputs (A, B) and (B, C), for blocks A, B, C of size N. Recall from above that when $(A,B)$ and $(B,C)$ are MDCTed, IMDCTed, and added in their overlapping half, we obtain $(B+B_{R})/2+(B-B_{R})/2=B$,

A linear function is a polynomial of degree one or less, including the zero polynomial (the latter not being considered to have degree zero). When the function is of only one variable, it is of the form $f(x) = ax + b$, where a and b are constants, often real numbers. The graph of such a function of one variable is a nonvertical line. a is frequently referred to as the slope of the line, and b as

A result of these advantages, the MDCT is the most widely used lossy compression technique in audio data compression. It is employed in most modern audio coding standards, including MP3, Dolby Digital (AC-3), Vorbis (Ogg), Windows Media Audio (WMA), ATRAC, Cook, Advanced Audio Coding (AAC), High-Definition Coding (HDC), LDAC, Dolby AC-4, and MPEG-H 3D Audio, as well as speech coding standards such as AAC-LD (LD-MDCT), G.722.1, G.729.1, CELT, and Opus. The discrete cosine transform (DCT)

Is a hyperplane of dimension k. A constant function is also considered linear in this context, as it is a polynomial of degree zero or is the zero polynomial. Its graph, when there is only one variable, is a horizontal line. In this context, a function that is also a linear map (the other meaning) may be referred to as a homogeneous linear function or a linear form. In the context of linear algebra,

Is designed to be performed on consecutive blocks of a larger dataset, where subsequent blocks are overlapped so that the last half of one block coincides with the first half of the next block. This overlapping, in addition to the energy-compaction qualities of the DCT, makes the MDCT especially attractive for signal compression applications, since it helps to avoid artifacts stemming from the block boundaries. As

Is equivalent to a DCT-IV of the N inputs (−c_R − d, a − b_R). The DCT-IV is designed for the case where the function at the right boundary is odd, and therefore the values near the right boundary are close to 0. If the input signal is smooth, this is the case: the rightmost components of a and b_R are consecutive in the input sequence (a, b, c, d), and therefore their difference

Is possible on usual analog telephone lines and via ISDN connections. It can use a bit rate of 32–64 kbit/s or higher. Compared to known speech coders, the codec is capable of coding both music and speech signals with good quality. Unlike speech coders, however, the achieved coding quality scales up with bitrate. Transparent quality can be achieved. AAC-LD can also process stereo signals by using

Is small. Let us look at the middle of the interval: if we rewrite the above expression as (−c_R − d, a − b_R) = (−d, a) − (b, c)_R, the second term, (b, c)_R, gives a smooth transition in the middle. However, in the first term, (−d, a), there is a potential discontinuity where the right end of −d meets the left end of

The N real numbers X_0, ..., X_{N−1} according to the formula: $X_k = \sum_{n=0}^{2N-1} x_n \cos\left[\frac{\pi}{N}\left(n+\frac{1}{2}+\frac{N}{2}\right)\left(k+\frac{1}{2}\right)\right]$. (The normalization coefficient in front of this transform, here unity, is an arbitrary convention and differs between treatments. Only the product of the normalizations of the MDCT and the IMDCT, below, is constrained.) The inverse MDCT is known as the IMDCT. Because there are different numbers of inputs and outputs, at first glance it might seem that

The IMDCT formula above is precisely 1/2 of the DCT-IV (which is its own inverse), where the output is extended (via the boundary conditions) to a length 2N and shifted back to the left by N/2. The inverse DCT-IV would simply give back the inputs (−c_R − d, a − b_R) from above. When this is extended via the boundary conditions and shifted, one obtains: $\mathrm{IMDCT}(\mathrm{MDCT}(a,b,c,d)) = (a-b_R,\ b-a_R,\ c+d_R,\ d+c_R)/2$. Half of

The IMDCT outputs are thus redundant, as b − a_R = −(a − b_R)_R, and likewise for the last two terms. If we group the input into bigger blocks A, B of size N, where A = (a, b) and B = (c, d), we can write this result in a simpler way: $\mathrm{IMDCT}(\mathrm{MDCT}(A,B)) = (A-A_R,\ B+B_R)/2$. One can now understand how TDAC works. Suppose that one computes the MDCT of the subsequent, 50% overlapped, 2N block (B, C). The IMDCT will then yield, analogous to

The IMDCT.) In principle, x and y could have different window functions, and the window function could also change from one block to the next (especially for the case where data blocks of different sizes are combined), but for simplicity we consider the common case of identical window functions for equal-sized blocks. The transform remains invertible (that is, TDAC works), for a symmetric window w_n = w_{2N−1−n}, as long as w satisfies

The MDCT is somewhat unusual compared to other Fourier-related transforms in that it has half as many outputs as inputs (instead of the same number). In particular, it is a linear function $F\colon \mathbf{R}^{2N}\to \mathbf{R}^{N}$ (where R denotes the set of real numbers). The 2N real numbers x_0, ..., x_{2N−1} are transformed into

The MDCT should not be invertible. However, perfect invertibility is achieved by adding the overlapped IMDCTs of subsequent overlapping blocks, causing the errors to cancel and the original data to be retrieved; this technique is known as time-domain aliasing cancellation (TDAC). The IMDCT transforms N real numbers X_0, ..., X_{N−1} into 2N real numbers y_0, ..., y_{2N−1} according to
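The claim that a single block is not invertible, while overlap-adding neighbouring IMDCTs recovers the data, can be checked numerically. The following is a minimal sketch, assuming the conventional definitions with unit normalization on the MDCT and 1/N on the IMDCT; the function names (`mdct`, `imdct`, `tdac_middle_block`) and the sample values are illustrative, not taken from any codec.

```python
import math

def mdct(x):
    """Direct O(N^2) MDCT: 2N real inputs -> N real outputs (unit normalization)."""
    n = len(x) // 2
    return [
        sum(x[i] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
            for i in range(2 * n))
        for k in range(n)
    ]

def imdct(X):
    """Direct IMDCT: N real inputs -> 2N real outputs, normalized by 1/N."""
    n = len(X)
    return [
        sum(X[k] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
            for k in range(n)) / n
        for i in range(2 * n)
    ]

def tdac_middle_block(a, b, c):
    """Overlap-add the IMDCTs of blocks (a, b) and (b, c); returns the recovered b."""
    n = len(b)
    y1 = imdct(mdct(a + b))  # last half carries b plus time-domain aliasing
    y2 = imdct(mdct(b + c))  # first half carries b with the aliasing sign flipped
    return [y1[n + i] + y2[i] for i in range(n)]

# Three consecutive blocks of size N = 4 (arbitrary sample values).
A = [1.0, 2.0, 3.0, 4.0]
B = [5.0, -1.0, 0.5, 2.0]
C = [0.0, 7.0, -3.0, 1.0]

recovered = tdac_middle_block(A, B, C)
single = imdct(mdct(A + B))  # one block alone does not give back (A, B)
```

Here `recovered` matches `B` up to rounding, whereas `single` differs from the original 2N samples, which is the aliasing that the overlap-add cancels.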

The MDST, based on the discrete sine transform, as well as other, rarely used, forms of the MDCT based on different types of DCT or DCT/DST combinations.) In MP3, the MDCT is not applied to the audio signal directly, but rather to the output of a 32-band polyphase quadrature filter (PQF) bank. The output of this MDCT is postprocessed by an alias reduction formula to reduce the typical aliasing of

The PQF filter bank. Such a combination of a filter bank with an MDCT is called a hybrid filter bank or a subband MDCT. AAC, on the other hand, normally uses a pure MDCT; only the (rarely used) MPEG-4 AAC-SSR variant (by Sony) uses a four-band PQF bank followed by an MDCT. Similar to MP3, ATRAC uses stacked quadrature mirror filters (QMF) followed by an MDCT. As a lapped transform,

The Princen-Bradley condition: $w_n^2 + w_{n+N}^2 = 1$. Various window functions are used. A window that produces a form known as a modulated lapped transform (MLT) is given by $w_n = \sin\left[\frac{\pi}{2N}\left(n+\frac{1}{2}\right)\right]$ and is used for MP3 and MPEG-2 AAC, and $w_n = \sin\left(\frac{\pi}{2}\sin^2\left[\frac{\pi}{2N}\left(n+\frac{1}{2}\right)\right]\right)$ for Vorbis. AC-3 uses a Kaiser–Bessel derived (KBD) window, and MPEG-4 AAC can also use a KBD window. Note that windows applied to the MDCT are different from windows used for some other types of signal analysis, since they must fulfill
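Whether a given window is usable with the MDCT can be tested directly against the Princen-Bradley condition. The sketch below assumes the commonly cited forms of the sine (MLT) window and the Vorbis power-complementary window, and uses a Hann-shaped window as an example of a general-purpose analysis window that does not qualify; all function names are illustrative.

```python
import math

def sine_window(n):
    """Length-2N sine (MLT) window: w_i = sin(pi/(2N) * (i + 1/2))."""
    return [math.sin(math.pi / (2 * n) * (i + 0.5)) for i in range(2 * n)]

def vorbis_window(n):
    """Length-2N Vorbis window: w_i = sin(pi/2 * sin^2(pi/(2N) * (i + 1/2)))."""
    return [math.sin(math.pi / 2 * math.sin(math.pi / (2 * n) * (i + 0.5)) ** 2)
            for i in range(2 * n)]

def hann_window(n):
    """Length-2N Hann-shaped window (half-sample offset), for comparison only."""
    return [math.sin(math.pi / (2 * n) * (i + 0.5)) ** 2 for i in range(2 * n)]

def princen_bradley_error(w):
    """Largest deviation of w_i^2 + w_{i+N}^2 from 1 over the first half."""
    n = len(w) // 2
    return max(abs(w[i] ** 2 + w[i + n] ** 2 - 1.0) for i in range(n))

def is_symmetric(w, tol=1e-12):
    """Check w_i == w_{2N-1-i} within a tolerance."""
    return all(abs(w[i] - w[-1 - i]) <= tol for i in range(len(w)))

N = 8
sine_err = princen_bradley_error(sine_window(N))
vorbis_err = princen_bradley_error(vorbis_window(N))
hann_err = princen_bradley_error(hann_window(N))
```

The sine and Vorbis windows give an error at rounding level, while the Hann-shaped window deviates substantially, which illustrates why MDCT windows form a separate family.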

The Princen–Bradley condition. One of the reasons for this difference is that MDCT windows are applied twice, for both the MDCT (analysis) and the IMDCT (synthesis). As can be seen by inspection of the definitions, for even N the MDCT is essentially equivalent to a DCT-IV, where the input is shifted by N/2 and two N-blocks of data are transformed at once. By examining this equivalence more carefully, important properties like TDAC can be easily derived. In order to define

The above: (B − B_R, C + C_R)/2. When this is added with the previous IMDCT result in the overlapping half, the reversed terms cancel and one obtains simply B, recovering the original data. The origin of the term "time-domain aliasing cancellation" is now clear. The use of input data that extend beyond the boundaries of the logical DCT-IV causes the data to be aliased in

The advanced stereo coding tools of AAC. Thus it is possible to transmit a stereo signal with a bandwidth of 7 kHz via one ISDN line or with a bandwidth of 15 kHz via two ISDN lines.

Modified discrete cosine transform

The modified discrete cosine transform (MDCT) is a transform based on the type-IV discrete cosine transform (DCT-IV), with the additional property of being lapped: it

The combinations to cancel when they are added. For odd N (which are rarely used in practice), N/2 is not an integer so the MDCT is not simply a shift permutation of a DCT-IV. In this case, the additional shift by half a sample means that the MDCT/IMDCT becomes equivalent to the DCT-III/II, and the analysis is analogous to the above. We have seen above that the MDCT of 2N inputs (a, b, c, d)

The formula: $y_n = \frac{1}{N}\sum_{k=0}^{N-1} X_k \cos\left[\frac{\pi}{N}\left(n+\frac{1}{2}+\frac{N}{2}\right)\left(k+\frac{1}{2}\right)\right]$. (Like for the DCT-IV, an orthogonal transform, the inverse has the same form as the forward transform.) In the case of a windowed MDCT with the usual window normalization (see below), the normalization coefficient in front of the IMDCT should be multiplied by 2 (i.e., becoming 2/N). Although the direct application of the MDCT formula would require O(N²) operations, it is possible to compute

The inputs into four blocks (a, b, c, d) each of size N/2. If we shift these to the right by N/2 (from the +N/2 term in the MDCT definition), then (b, c, d) extend past the end of the N DCT-IV inputs, so we must "fold" them back according to the boundary conditions described above. (In this way, any algorithm to compute the DCT-IV can be trivially applied to the MDCT.) Similarly,

The intercept. If a > 0 then the gradient is positive and the graph slopes upwards. If a < 0 then the gradient is negative and the graph slopes downwards. For a function $f(x_{1},\ldots ,x_{k})$ of any finite number of variables, the general formula is $f(x_{1},\ldots ,x_{k}) = b + a_{1}x_{1} + \cdots + a_{k}x_{k}$, and the graph

The multiplication by 1/2, because the IMDCT normalization differs by a factor of 2 in the windowed case.) Similarly, the windowed MDCT and IMDCT of $(B,C)$ yields, in its first-N half: $W\cdot(WB - W_{R}B_{R}) = W^{2}B - WW_{R}B_{R}$. When we add these two halves together, we obtain: $(W_{R}^{2}B + WW_{R}B_{R}) + (W^{2}B - WW_{R}B_{R}) = (W_{R}^{2} + W^{2})B = B$, recovering the original data.

Linear function

In mathematics, the term linear function refers to two distinct but related notions: In calculus, analytic geometry and related areas,

The original data. Now we suppose that we multiply both the MDCT inputs and the IMDCT outputs by a window function of length 2N. As above, we assume a symmetric window function, which is therefore of the form $(W,W_{R})$ where W is a length-N vector and R denotes reversal as before. Then the Princen-Bradley condition can be written as $W^{2}+W_{R}^{2}=(1,1,\ldots )$, with

The polynomial functions of degree 0 or 1 are the scalar-valued affine maps. In linear algebra, a linear function is a map f between two vector spaces such that $f(\mathbf{x}+\mathbf{y}) = f(\mathbf{x}) + f(\mathbf{y})$ and $f(a\mathbf{x}) = af(\mathbf{x})$. Here a denotes a constant belonging to some field K of scalars (for example, the real numbers) and x and y are elements of a vector space, which might be K itself. In other terms the linear function preserves vector addition and scalar multiplication. Some authors use "linear function" only for linear maps that take values in

The precise relationship to the DCT-IV, one must realize that the DCT-IV corresponds to alternating even/odd boundary conditions: even at its left boundary (around n = −1/2), odd at its right boundary (around n = N − 1/2), and so on (instead of periodic boundaries as for a DFT). This follows from the identities $\cos\left[\frac{\pi}{N}\left(-n-1+\frac{1}{2}\right)\left(k+\frac{1}{2}\right)\right]=\cos\left[\frac{\pi}{N}\left(n+\frac{1}{2}\right)\left(k+\frac{1}{2}\right)\right]$ and $\cos\left[\frac{\pi}{N}\left(2N-n-1+\frac{1}{2}\right)\left(k+\frac{1}{2}\right)\right]=-\cos\left[\frac{\pi}{N}\left(n+\frac{1}{2}\right)\left(k+\frac{1}{2}\right)\right]$. Thus, if its inputs are an array x of length N, we can imagine extending this array to (x, −x_R, −x, x_R, ...) and so on, where x_R denotes x in reverse order. Consider an MDCT with 2N inputs and N outputs, where we divide
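The folding implied by these boundary conditions can be verified numerically: for even N, the MDCT of (a, b, c, d) should equal the DCT-IV of the folded sequence (−c_R − d, a − b_R). The sketch below assumes the conventional unnormalized MDCT and DCT-IV definitions; `fold`, `dct_iv`, and `mdct` are illustrative names.

```python
import math

def mdct(x):
    """Direct MDCT: 2N real inputs -> N real outputs (unit normalization)."""
    n = len(x) // 2
    return [
        sum(x[i] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
            for i in range(2 * n))
        for k in range(n)
    ]

def dct_iv(u):
    """Direct unnormalized DCT-IV of N real inputs."""
    n = len(u)
    return [
        sum(u[i] * math.cos(math.pi / n * (i + 0.5) * (k + 0.5))
            for i in range(n))
        for k in range(n)
    ]

def fold(x):
    """Fold 2N inputs (a, b, c, d) into the N values (-c_R - d, a - b_R).

    Assumes len(x) is divisible by 4, i.e. N is even.
    """
    q = len(x) // 4
    a, b, c, d = x[:q], x[q:2 * q], x[2 * q:3 * q], x[3 * q:]
    first = [-cr - di for cr, di in zip(reversed(c), d)]
    second = [ai - br for ai, br in zip(a, reversed(b))]
    return first + second

x = [1.0, 2.0, 3.0, 4.0, 5.0, -1.0, 0.5, 2.0]  # 2N = 8 arbitrary samples
via_mdct = mdct(x)
via_dct_iv = dct_iv(fold(x))
```

The two results agree up to rounding, so any DCT-IV routine can stand in for the MDCT once the input has been folded.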

The same thing with only O(N log N) complexity by recursively factorizing the computation, as in the fast Fourier transform (FFT). One can also compute MDCTs via other transforms, typically a DFT (FFT) or a DCT, combined with O(N) pre- and post-processing steps. Also, as described below, any algorithm for the DCT-IV immediately provides a method to compute the MDCT and IMDCT of even size. In typical signal-compression applications,

The same way that frequencies beyond the Nyquist frequency are aliased to lower frequencies, except that this aliasing occurs in the time domain instead of the frequency domain: we cannot distinguish the contributions of a and of b_R to the MDCT of (a, b, c, d), or equivalently, to the result of $\mathrm{IMDCT}(\mathrm{MDCT}(a,b,c,d)) = (a-b_R,\ b-a_R,\ c+d_R,\ d+c_R)/2$. The combinations c − d_R and so on have precisely the right signs for

The squares and additions performed elementwise. Therefore, instead of MDCTing $(A,B)$, we now MDCT $(WA,W_{R}B)$ (with all multiplications performed elementwise). When this is IMDCTed and multiplied again (elementwise) by the window function, the last-N half becomes: $W_{R}\cdot(W_{R}B + (W_{R}B)_{R}) = W_{R}\cdot(W_{R}B + WB_{R}) = W_{R}^{2}B + WW_{R}B_{R}$. (Note that we no longer have
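The windowed case can be checked the same way as the unwindowed one. The sketch below assumes a sine window (one choice that satisfies the Princen-Bradley condition), applies it before the MDCT and again after the IMDCT, and uses the 2/N IMDCT normalization of the windowed case; the helper names and sample values are illustrative.

```python
import math

def mdct(x):
    """Direct MDCT: 2N real inputs -> N real outputs (unit normalization)."""
    n = len(x) // 2
    return [
        sum(x[i] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
            for i in range(2 * n))
        for k in range(n)
    ]

def imdct(X, scale):
    """Direct IMDCT: N real inputs -> 2N real outputs, multiplied by `scale`."""
    n = len(X)
    return [
        scale * sum(X[k] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
                    for k in range(n))
        for i in range(2 * n)
    ]

def sine_window(n):
    """Length-2N sine window, assumed here as the analysis/synthesis window."""
    return [math.sin(math.pi / (2 * n) * (i + 0.5)) for i in range(2 * n)]

def windowed_roundtrip(block, w):
    """Window, MDCT, IMDCT with 2/N normalization, then window again."""
    n = len(block) // 2
    analysis = [wi * xi for wi, xi in zip(w, block)]
    y = imdct(mdct(analysis), 2.0 / n)
    return [wi * yi for wi, yi in zip(w, y)]

N = 4
A = [1.0, 2.0, 3.0, 4.0]
B = [5.0, -1.0, 0.5, 2.0]
C = [0.0, 7.0, -3.0, 1.0]
w = sine_window(N)

y1 = windowed_roundtrip(A + B, w)  # last half:  W_R^2 B + W W_R B_R
y2 = windowed_roundtrip(B + C, w)  # first half: W^2 B  - W W_R B_R
recovered = [y1[N + i] + y2[i] for i in range(N)]
```

Adding the overlapping halves cancels the reversed terms and leaves (W² + W_R²)B, which equals B for this window, so `recovered` matches `B` up to rounding.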

The transform properties are further improved by using a window function w_n (n = 0, ..., 2N−1) that is multiplied with x_n in the MDCT and with y_n in the IMDCT formulas, above, in order to avoid discontinuities at the n = 0 and 2N boundaries by making the function go smoothly to zero at those points. (That is, the window function is applied to the data before the MDCT or after

Was first proposed by Nasir Ahmed in 1972, and demonstrated by Ahmed with T. Natarajan and K. R. Rao in 1974. The MDCT was later proposed by John P. Princen, A.W. Johnson and Alan B. Bradley at the University of Surrey in 1987, following earlier work by Princen and Bradley (1986) to develop the MDCT's underlying principle of time-domain aliasing cancellation (TDAC), described below. (There also exists an analogous transform,
