MPEG-4 Part 3 - Misplaced Pages

MPEG-4 Part 3 or MPEG-4 Audio (formally ISO/IEC 14496-3) is the third part of the ISO/IEC MPEG-4 international standard developed by the Moving Picture Experts Group. It specifies audio coding methods. The first version of ISO/IEC 14496-3 was published in 1999.

44-422: MPEG-4 Part 3 consists of a variety of audio coding technologies – from lossy speech coding (HVXC, CELP), general audio coding (AAC, TwinVQ, BSAC) and lossless audio compression (MPEG-4 SLS, Audio Lossless Coding, MPEG-4 DST) to a Text-To-Speech Interface (TTSI), Structured Audio (using SAOL, SASL, MIDI) and many additional audio synthesis and coding techniques. MPEG-4 Audio does not target

88-468: A better representation of data. Another use is for backward compatibility and graceful degradation: in color television, encoding color via a luminance-chrominance transform domain (such as YUV) means that black-and-white sets display the luminance while ignoring the color information. Another example is chroma subsampling: the use of color spaces such as YIQ, used in NTSC, allows one to reduce
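The luminance-chrominance idea can be sketched in a few lines. This is only an illustration: the BT.601 luma weights and the 0.492/0.877 chroma scale factors are assumptions brought in for the example (the text above names no specific constants), and the function names are hypothetical.

```python
# Assumed BT.601 luma weights; other standards use different values.
KR, KG, KB = 0.299, 0.587, 0.114


def rgb_to_yuv(r, g, b):
    """Convert gamma-corrected R'G'B' values in [0, 1] to (Y', U, V)."""
    y = KR * r + KG * g + KB * b  # luminance: weighted sum of the primaries
    u = 0.492 * (b - y)           # blue-difference chrominance
    v = 0.877 * (r - y)           # red-difference chrominance
    return y, u, v


def yuv_to_rgb(y, u, v):
    """Invert rgb_to_yuv (exact up to floating-point rounding)."""
    b = y + u / 0.492
    r = y + v / 0.877
    g = (y - KR * r - KB * b) / KG
    return r, g, b


def monochrome_view(y, u, v):
    """What a black-and-white set would show: luminance only, chroma ignored."""
    return y, y, y
```

A color receiver would call `yuv_to_rgb`, while a monochrome receiver can use `monochrome_view` on the very same signal, which is the graceful degradation described above.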

132-403: A certain amount of information, and there is a lower bound to the size of a file that can still carry all the information. Basic information theory says that there is an absolute limit in reducing the size of this data. When data is compressed, its entropy increases, and it cannot increase indefinitely. For example, a compressed ZIP file is smaller than its original, but repeatedly compressing

176-507: A lossy format and a lossless correction which, when combined, reproduce the original signal; the correction can be stripped, leaving a smaller, lossily compressed file. Such formats include MPEG-4 SLS (Scalable to Lossless), WavPack, OptimFROG DualStream, and DTS-HD Master Audio in lossless (XLL) mode. Researchers have performed lossy compression on text by either using a thesaurus to substitute short words for long ones, or generative text techniques, although these sometimes fall into

220-545: A lot of fine detail during a very loud passage. Developing lossy compression techniques as closely matched to human perception as possible is a complex task. Sometimes the ideal is a file that provides exactly the same perception as the original, with as much digital information as possible removed; other times, perceptible loss of quality is considered a valid tradeoff. The terms "irreversible" and "reversible" are preferred over "lossy" and "lossless" respectively for some applications, such as medical image compression, to circumvent

264-458: A selective loss of the least significant data, rather than losing data across the board. Further, a transform coding may provide a better domain for manipulating or otherwise editing the data – for example, equalization of audio is most naturally expressed in the frequency domain (boost the bass, for instance) rather than in the raw time domain. From this point of view, perceptual encoding is not essentially about discarding data, but rather about

308-407: A short block to enhance temporal resolution, low frequencies can still be encoded with high spectral resolution. However, due to aliasing between the 4 PQF bands, coding efficiency around (1,2,3) * fs/8 is worse than with normal MPEG-4 AAC LC. MPEG-4 AAC-SSR is very similar to ATRAC and ATRAC-3. The idea behind AAC-SSR was not only the advantage listed above, but also the possibility of reducing

352-562: A single application such as real-time telephony or high-quality audio compression. It applies to every application which requires the use of advanced sound compression, synthesis, manipulation, or playback. MPEG-4 Audio is a new type of audio standard that integrates numerous different types of audio coding: natural sound and synthetic sound, low bitrate delivery and high-quality delivery, speech and music, complex soundtracks and simple ones, traditional content and interactive content. MPEG-4 Part 3 contains the following subparts: MPEG-4 Audio includes

396-649: A single solution. The capabilities of a transport layer and the communication between transport, multiplex, and demultiplex functions are described in the Delivery Multimedia Integration Framework (DMIF) in ISO/IEC 14496-6. A wide variety of delivery mechanisms exist below this interface, e.g., MPEG transport stream, Real-time Transport Protocol (RTP), etc. Transport in Real-time Transport Protocol

440-541: A system for handling a diverse group of audio formats in a uniform manner. Each format is assigned a unique Audio Object Type to represent it. Object Type is used to distinguish between different coding methods. It directly determines the MPEG-4 tool subset required to decode a specific object. The MPEG-4 profiles are based on the object types and each profile supports a different list of object types. The MPEG-4 Audio standard defines several profiles. These profiles are based on
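The relationship between object types and profiles can be pictured as a lookup from identifiers to sets. The sketch below is illustrative only: the numeric identifiers and the three profile definitions are given from memory of the standard's tables and should be checked against ISO/IEC 14496-3 before being relied on; the names `AudioObjectType`, `PROFILES` and `profile_can_decode` are hypothetical.

```python
from enum import IntEnum


class AudioObjectType(IntEnum):
    """A small subset of Audio Object Type identifiers (assumed values)."""
    AAC_MAIN = 1
    AAC_LC = 2
    AAC_SSR = 3
    SBR = 5
    PS = 29


# Each profile is modelled as the set of object types a decoder must support.
PROFILES = {
    "AAC Profile": {AudioObjectType.AAC_LC},
    "High Efficiency AAC Profile": {AudioObjectType.AAC_LC, AudioObjectType.SBR},
    "High Efficiency AAC v2 Profile": {
        AudioObjectType.AAC_LC,
        AudioObjectType.SBR,
        AudioObjectType.PS,
    },
}


def profile_can_decode(profile, object_types):
    """Return True if every object type in the stream belongs to the profile."""
    return set(object_types) <= PROFILES[profile]
```

For instance, a stream that uses AAC LC together with SBR is outside the plain AAC Profile but inside the High Efficiency AAC Profile. Levels, which further limit sampling rate and channel count, are not modelled here.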

484-582: A user acquires a lossily compressed file (for example, to reduce download time), the retrieved file can be quite different from the original at the bit level while being indistinguishable to the human ear or eye for most practical purposes. Many compression methods focus on the idiosyncrasies of human physiology, taking into account, for instance, that the human eye can see only certain wavelengths of light. The psychoacoustic model describes how sound can be highly compressed without degrading perceived quality. Flaws caused by lossy compression that are noticeable to

528-531: A way that reduces the size of a computer file needed to store it, or the bandwidth needed to transmit it, with no loss of the full information contained in the original file. A picture, for example, is converted to a digital file by considering it to be an array of dots and specifying the color and brightness of each dot. If the picture contains an area of the same color, it can be compressed without loss by saying "200 red dots" instead of "red dot, red dot, ...(197 more times)..., red dot." The original data contains
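The "200 red dots" idea is run-length encoding. A minimal sketch follows, assuming pixels are represented as plain strings in a list; the function names are hypothetical and real image formats use more elaborate schemes.

```python
def rle_encode(pixels):
    """Collapse consecutive equal values into (count, value) pairs."""
    runs = []
    for p in pixels:
        if runs and runs[-1][1] == p:
            runs[-1][0] += 1      # extend the current run
        else:
            runs.append([1, p])   # start a new run
    return [(count, value) for count, value in runs]


def rle_decode(runs):
    """Expand (count, value) pairs back into the original sequence."""
    pixels = []
    for count, value in runs:
        pixels.extend([value] * count)
    return pixels


row = ["red"] * 200 + ["blue"] * 3
encoded = rle_encode(row)  # two pairs instead of 203 entries
```

Decoding `encoded` gives back `row` exactly, which is what makes the scheme lossless.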

572-617: Is a main goal of transform coding, it also allows other goals: one may represent data more accurately for the original amount of space – for example, in principle, if one starts with an analog or high-resolution digital master, an MP3 file of a given size should provide a better representation than raw uncompressed audio in a WAV or AIFF file of the same size. This is because uncompressed audio can only reduce file size by lowering bit rate or depth, whereas compressing audio can reduce size while maintaining bit rate and depth. This compression becomes

616-433: Is a type of data compression used for digital images, digital audio signals, and digital video. The transformation is typically used to enable better (more targeted) quantization. Knowledge of the application is used to choose information to discard, thereby lowering its bandwidth. The remaining information can then be compressed via a variety of methods. When the output is decoded, the result may not be identical to

660-422: Is an MPEG-4 standard (ISO/IEC 14496-3 subpart 4) for scalable audio coding. BSAC uses an alternative noiseless coding to AAC, with the rest of the processing being identical to AAC. This support for scalability allows for nearly transparent sound quality at 64 kbit/s and graceful degradation at lower bit rates. BSAC coding is best performed in the range of 40 kbit/s to 64 kbit/s, though it operates in

704-479: Is an extension of AAC LC using spectral band replication (SBR), and Parametric Stereo (PS). It is designed to increase coding efficiency at low bitrates by using partial parametric representation of audio. AAC Scalable Sample Rate was introduced by Sony to the MPEG-2 Part 7 and MPEG-4 Part 3 standards. It was first published in ISO/IEC 13818-7, Part 7: Advanced Audio Coding (AAC) in 1997. The audio signal

748-571: Is because these types of data are intended for human interpretation where the mind can easily "fill in the blanks" or see past very minor errors or inconsistencies – ideally lossy compression is transparent (imperceptible), which can be verified via an ABX test. Data files using lossy compression are smaller in size and thus cost less to store and to transmit over the Internet, a crucial consideration for streaming video services such as Netflix and streaming audio services such as Spotify. When

792-601: Is defined in RFC 3016 (RTP Payload Format for MPEG-4 Audio/Visual Streams), RFC 3640 (RTP Payload Format for Transport of MPEG-4 Elementary Streams), RFC 4281 (The Codecs Parameter for "Bucket" Media Types) and RFC 4337 (MIME Type Registration for MPEG-4). LATM and LOAS were defined for natural audio applications, which do not require sophisticated object-based coding or other functions provided by MPEG-4 Systems. The Advanced Audio Coding in MPEG-4 Part 3 (MPEG-4 Audio) Subpart 4

836-428: Is first split into 4 bands using a 4-band polyphase quadrature filter (PQF) bank. Then these 4 bands are further split using MDCTs with a size k of 32 or 256 samples. This is similar to normal AAC LC, which uses MDCTs with a size k of 128 or 1024 directly on the audio signal. The advantage of this technique is that short block switching can be done separately for every PQF band. So high frequencies can be encoded using
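The block sizes quoted above can be compared with a little arithmetic. The sketch assumes that the size k counts MDCT coefficients per block and that the 4 PQF bands divide the range 0 to fs/2 evenly; the 48 kHz sampling rate and the helper name `bin_width_hz` are illustrative choices, not values from the standard.

```python
def bin_width_hz(fs, bands, mdct_size):
    """Nominal width in Hz of one MDCT coefficient.

    The range 0..fs/2 is split into `bands` equal bands, and each band is
    transformed into `mdct_size` coefficients.
    """
    return (fs / 2) / bands / mdct_size


fs = 48000  # assumed sampling rate in Hz

aac_lc_long = bin_width_hz(fs, bands=1, mdct_size=1024)
aac_lc_short = bin_width_hz(fs, bands=1, mdct_size=128)
ssr_long = bin_width_hz(fs, bands=4, mdct_size=256)
ssr_short = bin_width_hz(fs, bands=4, mdct_size=32)
```

Under these assumptions the long and short blocks of both schemes have the same nominal resolution (4 × 256 = 1024 and 4 × 32 = 128). The difference is that AAC-SSR can choose long or short blocks independently in each of the 4 bands.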

880-569: Is noticed by the end-user. Even when noticeable by the user, further data reduction may be desirable (e.g., for real-time communication or to reduce transmission times or storage needs). The most widely used lossy compression algorithm is the discrete cosine transform (DCT), first published by Nasir Ahmed, T. Natarajan and K. R. Rao in 1974. Lossy compression is most commonly used to compress multimedia data (audio, video, and images), especially in applications such as streaming media and internet telephony. By contrast, lossless compression

924-478: Is often the case in practice, to produce a representation with lower resolution or lower fidelity than a given one, one needs to start with the original source signal and encode, or start with a compressed representation and then decompress and re-encode it (transcoding), though the latter tends to cause digital generation loss. Another approach is to encode the original signal at several different bitrates, and then either choose which to use (as when streaming over

968-424: Is typically required for text and data files, such as bank records and text articles. It can be advantageous to make a master lossless file from which additional copies can then be produced. This allows one to avoid basing new compressed copies on a lossy source file, which would yield additional artifacts and further unnecessary information loss. It is possible to compress many types of digital data in

1012-649: The chrominance channel). While unwanted information is destroyed, the quality of the remaining portion is unchanged. Some other transforms are possible to some extent, such as joining images with the same encoding (composing side by side, as on a grid) or pasting images such as logos onto existing images (both via Jpegjoin), or scaling. Some changes can be made to the compression without re-encoding: The freeware Windows-only IrfanView has some lossless JPEG operations in its JPG_TRANSFORM plugin. Metadata, such as ID3 tags, Vorbis comments, or Exif information, can usually be modified or removed without modifying

1056-440: The case of audio data, a popular form of transform coding is perceptual coding, which transforms the raw data to a domain that more accurately reflects the information content. For example, rather than expressing a sound file as the amplitude levels over time, one may express it as the frequency spectrum over time, which corresponds more accurately to human audio perception. While data reduction (compression, be it lossy or lossless)

1100-512: The content. These techniques are used to reduce data size for storing, handling, and transmitting content. Higher degrees of approximation create coarser images as more details are removed. This is opposed to lossless data compression (reversible data compression) which does not degrade the data. The amount of data reduction possible using lossy compression is much higher than using lossless techniques. Well-designed lossy compression technology often reduces file sizes significantly before degradation

1144-517: The data rate by removing 1, 2 or 3 of the upper PQF bands. A very simple bitstream splitter can remove these bands and thus reduce the bitrate and sample rate. Example: Note: although possible, the resulting quality is much worse than typical for this bitrate. So for normal 64 kbit/s AAC LC a bandwidth of 14–16 kHz is achieved by using intensity stereo and reduced NMRs. This degrades audible quality less than transmitting 6 kHz bandwidth with perfect quality. Bit Sliced Arithmetic Coding

1188-444: The file size as if it had been compressed to a greater degree, but without more loss than this, is sometimes also possible. The primary programs for lossless editing of JPEGs are jpegtran, the derived exiftran (which also preserves Exif information), and Jpegcrop (which provides a Windows interface). These allow the image to be cropped, rotated, flipped, and flopped, or even converted to grayscale (by dropping

1232-563: The file will cause it to progressively lose quality. This is in contrast with lossless data compression, where data will not be lost via the use of such a procedure. Information-theoretical foundations for lossy data compression are provided by rate-distortion theory. Much like the use of probability in optimal coding theory, rate-distortion theory heavily draws on Bayesian estimation and decision theory in order to model perceptual distortion and even aesthetic judgment. There are two basic lossy compression schemes: In some systems

1276-420: The human eye or ear are known as compression artifacts. The compression ratio (that is, the size of the compressed file compared to that of the uncompressed file) of lossy video codecs is nearly always far superior to that of the audio and still-image equivalents. An important caveat about lossy compression (formally transcoding) is that editing lossily compressed files causes digital generation loss from

1320-470: The internet – as in RealNetworks' "SureStream" – or offering varying downloads, as at Apple's iTunes Store), or broadcast several, where the best that is successfully received is used, as in various implementations of hierarchical modulation. Similar techniques are used in mipmaps, pyramid representations, and more sophisticated scale space methods. Some audio formats feature a combination of
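Choosing among several pre-encoded bitrates can be sketched as a simple selection rule. This is a hypothetical policy for illustration only, not how SureStream or any particular service is documented to work; the function name `pick_stream` and the example bitrates are assumptions.

```python
def pick_stream(available_kbps, encoded_kbps):
    """Pick the highest pre-encoded bitrate that fits the available bandwidth.

    Falls back to the lowest bitrate when none fits, so that playback can
    still be attempted.
    """
    fitting = [rate for rate in encoded_kbps if rate <= available_kbps]
    return max(fitting) if fitting else min(encoded_kbps)


# The same source encoded three times at different bitrates (example values).
renditions = [32, 64, 128]
choice = pick_stream(100, renditions)  # 64: the best rendition that fits 100 kbit/s
```

Because each rendition is encoded from the original signal, switching between them avoids the generation loss that transcoding would introduce.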

1364-515: The negative implications of "loss". The type and amount of loss can affect the utility of the images. Artifacts or undesirable effects of compression may be clearly discernible yet the result still useful for the intended purpose. Or lossy compressed images may be 'visually lossless', or in the case of medical images, so-called diagnostically acceptable irreversible compression (DAIC) may have been applied. Some forms of lossy compression can be thought of as an application of transform coding, which

1408-472: The newness of the standard. The MPEG-2 Part 7 standard (Advanced Audio Coding) was first published in 1997 and offers three default profiles: Low Complexity profile (LC), Main profile and Scalable Sampling Rate profile (SSR). The MPEG-4 Part 3 Subpart 4 (General Audio Coding) combined the profiles from MPEG-2 Part 7 with Perceptual Noise Substitution (PNS) and defined them as Audio Object Types (AAC LC, AAC Main, AAC SSR). High-Efficiency Advanced Audio Coding

1452-483: The object types and each profile supports a different list of object types. Each profile may also have several levels, which limit some parameters of the tools present in a profile. These parameters are usually the sampling rate and the number of audio channels decoded at the same time. There is no standard for transport of elementary streams over a channel, because the broad range of MPEG-4 applications have delivery requirements that are too wide to easily characterize with

1496-523: The original input, but is expected to be close enough for the purpose of the application. The most common form of lossy compression is a transform coding method, the discrete cosine transform (DCT), which was first published by Nasir Ahmed, T. Natarajan and K. R. Rao in 1974. DCT is the most widely used form of lossy compression, for popular image compression formats (such as JPEG), video coding standards (such as MPEG and H.264/AVC) and audio compression formats (such as MP3 and AAC). In
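Transform coding followed by quantization can be shown on a tiny signal. The sketch below uses a direct, unoptimized orthonormal DCT-II and a single uniform quantization step; real codecs such as JPEG or AAC use fast transforms, perceptually tuned step sizes and entropy coding, none of which is modelled here. All function names and the sample values are illustrative.

```python
import math


def dct_ii(x):
    """Orthonormal DCT-II of a list of samples (direct O(n^2) form)."""
    n = len(x)
    coeffs = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        total = sum(x[i] * math.cos(math.pi / n * (i + 0.5) * k) for i in range(n))
        coeffs.append(scale * total)
    return coeffs


def idct_ii(coeffs):
    """Inverse of dct_ii (an orthonormal DCT-III)."""
    n = len(coeffs)
    samples = []
    for i in range(n):
        total = 0.0
        for k in range(n):
            scale = math.sqrt((1 if k == 0 else 2) / n)
            total += scale * coeffs[k] * math.cos(math.pi / n * (i + 0.5) * k)
        samples.append(total)
    return samples


def lossy_roundtrip(x, step):
    """Transform, quantize coefficients to multiples of `step`, then decode."""
    quantized = [round(c / step) for c in dct_ii(x)]  # small integers to store
    return idct_ii([q * step for q in quantized])


signal = [10.0, 11.0, 12.5, 13.0, 12.0, 11.5, 10.5, 10.0]
approx = lossy_roundtrip(signal, step=1.0)
```

The decoded `approx` is close to `signal` but not identical: the rounding inside `lossy_roundtrip` is where information is discarded, and a larger `step` trades more error for smaller integers to store.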

1540-403: The original, format conversion may be needed in the future to achieve compatibility with software or devices (format shifting), or to avoid paying patent royalties for decoding or distribution of compressed files. By modifying the compressed data directly without decoding and re-encoding, some editing of lossily compressed files without degradation of quality is possible. Editing which reduces

1584-900: The range of 16 kbit/s to 64 kbit/s. The AAC-BSAC codec is used in Digital Multimedia Broadcasting (DMB) applications. In 2002, the MPEG-4 Audio Licensing Committee selected the Via Licensing Corporation as the Licensing Administrator for the MPEG-4 Audio patent pool.

Lossy compression

In information technology, lossy compression or irreversible compression is the class of data compression methods that uses inexact approximations and partial data discarding to represent

1628-467: The re-encoding. This can be avoided by only producing lossy files from (lossless) originals and only editing (copies of) original files, such as images in raw image format instead of JPEG. If data which has been compressed lossily is decoded and compressed losslessly, the size of the result can be comparable with the size of the data before lossy compression, but the data already lost cannot be recovered. When deciding to use lossy conversion without keeping

1672-407: The related category of lossy data conversion. A general kind of lossy compression is to lower the resolution of an image, as in image scaling, particularly decimation. One may also remove "lower information" parts of an image, such as by seam carving. Many media transforms, such as Gaussian blur, are, like lossy compression, irreversible: the original signal cannot be reconstructed from

1716-514: The resolution on the components to accord with human perception – humans have highest resolution for black-and-white (luma), lower resolution for mid-spectrum colors like yellow and green, and lowest for reds and blues – thus NTSC displays approximately 350 pixels of luma per scanline, 150 pixels of yellow vs. green, and 50 pixels of blue vs. red, which are proportional to human sensitivity to each component. Lossy compression formats suffer from generation loss: repeatedly compressing and decompressing

1760-422: The same file will not reduce the size to nothing. Most compression algorithms can recognize when further compression would be pointless and would in fact increase the size of the data. In many cases, files or data streams contain more information than is needed. For example, a picture may have more detail than the eye can distinguish when reproduced at the largest size intended; likewise, an audio file does not need
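The point about repeated compression can be tried out with Python's standard `zlib` module. This is only a sketch: the sample text and the number of passes are arbitrary choices, and the exact sizes depend on the input and the zlib version.

```python
import zlib

# Highly redundant input, so the first pass has a lot to remove.
text = b"The quick brown fox jumps over the lazy dog. " * 200

sizes = [len(text)]
blob = text
for _ in range(4):
    blob = zlib.compress(blob, 9)  # compress the previous pass's output again
    sizes.append(len(blob))

# Undo the four passes in reverse to confirm that nothing was lost.
restored = blob
for _ in range(4):
    restored = zlib.decompress(restored)

print(sizes)
```

The first pass shrinks the data considerably. Later passes operate on output that is already close to its information content, so they typically gain little and may even add a few bytes of overhead; the size never approaches zero.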

1804-474: The transformed signal. However, in general these will have the same size as the original, and are not a form of compression. Lowering resolution has practical uses, as the NASA New Horizons craft transmitted thumbnails of its encounter with Pluto-Charon before it sent the higher resolution images. Another solution for slow connections is the use of image interlacing, which progressively defines

1848-430: The two techniques are combined, with transform codecs being used to compress the error signals generated by the predictive stage. The advantage of lossy methods over lossless methods is that in some cases a lossy method can produce a much smaller compressed file than any lossless method, while still meeting the requirements of the application. Lossy methods are most often used for compressing sound, images or videos. This

1892-821: The underlying data. One may wish to downsample or otherwise decrease the resolution of the represented source signal and the quantity of data used for its compressed representation without re-encoding, as in bitrate peeling, but this functionality is not supported in all designs, as not all codecs encode data in a form that allows less important detail to simply be dropped. Some well-known designs that have this capability include JPEG 2000 for still images and H.264/MPEG-4 AVC based Scalable Video Coding for video. Such schemes have also been standardized for older designs, such as JPEG images with progressive encoding, and MPEG-2 and MPEG-4 Part 2 video, although those prior schemes had limited success in terms of adoption into real-world common usage. Without this capacity, which

1936-463: Was enhanced relative to the previous standard MPEG-2 Part 7 (Advanced Audio Coding), in order to provide better sound quality for a given encoding bitrate. It is assumed that any Part 3 and Part 7 differences will be ironed out by the ISO standards body in the near future to avoid the possibility of future bitstream incompatibilities. At present there are no known player or codec incompatibilities due to
