Misplaced Pages

DirectShow

Article snapshot taken from Wikipedia with Creative Commons Attribution-ShareAlike license.

DirectShow (sometimes abbreviated as DS or DShow), codename Quartz, is a multimedia framework and API produced by Microsoft for software developers to perform various operations with media files or streams. It is the replacement for Microsoft's earlier Video for Windows technology. Based on the Microsoft Windows Component Object Model (COM) framework, DirectShow provides a common interface for media across various programming languages, and is an extensible, filter-based framework that can render or record media files on demand at the request of the user or developer. The DirectShow development tools and documentation were originally distributed as part of the DirectX SDK. Currently, they are distributed as part of the Windows SDK (formerly known as the Platform SDK).


108-600: Microsoft plans to completely replace DirectShow gradually with Media Foundation in future Windows versions. One reason cited by Microsoft is to provide "much more robust support for content protection systems" (see digital rights management ). Microsoft's Becky Weiss confirmed in 2006 that "you'll notice that working with the Media Foundation requires you to work at a slightly lower level than working with DirectShow would have. And there are still DirectShow features that aren't (yet) in Media Foundation". As described in

216-608: A 32-bit environment and did not utilize COM. The development team used a pre-existing modular digital-media-processor project codenamed "Clockwork" as a basis for DirectShow. Clockwork had previously been used in the Microsoft Interactive Television project. The project was initially named "ActiveMovie", and was released in May 1996, bundled with the beta version of Internet Explorer 3 .0. In March 1997, Microsoft announced that ActiveMovie would become part of

324-414: A B-frame. Because of this, a very low bitrate B-frame can be inserted, where needed, to help control the bitrate. If this was done with a P-frame, future P-frames would be predicted from it and would lower the quality of the entire sequence. However, similarly, the future P-frame must still encode all the changes between it and the previous I- or P- anchor frame. B-frames can also be beneficial in videos where

432-506: A GOP size of 15–18, i.e. 1 I-frame for every 14–17 non-I-frames (some combination of P- and B-frames). With more intelligent encoders, GOP size is dynamically chosen, up to some pre-selected maximum limit. Limits are placed on the maximum number of frames between I-frames due to decoding complexity, decoder buffer size, recovery time after data errors, seeking ability, and accumulation of IDCT errors in low-precision implementations most common in hardware decoders (see IEEE-1180). "P-frame"
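The ratio quoted in this fragment can be checked with a few lines of arithmetic. The following is a minimal Python sketch and not part of any MPEG-1 tool; the helper names `gop_frame_counts` and `i_frame_interval_seconds` and the rate of 30 frames per second are assumptions chosen only for illustration.

```python
def gop_frame_counts(gop_size):
    """Split one GOP into (I-frames, non-I-frames), assuming one I-frame per GOP."""
    if gop_size < 1:
        raise ValueError("GOP size must be at least 1")
    return 1, gop_size - 1


def i_frame_interval_seconds(gop_size, frames_per_second):
    """Time between consecutive I-frames for a fixed GOP size."""
    return gop_size / frames_per_second


for size in (15, 18):
    i_frames, others = gop_frame_counts(size)
    interval = i_frame_interval_seconds(size, 30)  # 30 fps is an assumed rate
    print(f"GOP {size}: {i_frames} I-frame, {others} P-/B-frames, "
          f"one I-frame every {interval:.2f} s")
```

A longer GOP spends fewer bits on I-frames but spaces them further apart in time, which is one way to read the limits on seeking and error recovery listed above.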

540-652: A Pipeline but, unlike the Sample Grabber Sink, no such standard component exists. Media Foundation Transforms (MFTs) represent a generic model for processing media data. They are used in Media Foundation primarily to implement decoders, encoders, mixers and digital signal processors (DSPs) – between media sources and media sinks . Media Foundation Transforms are an evolution of the transform model first introduced with DirectX Media Objects (DMOs). Their behaviors are more clearly specified. Hybrid DMO/MFT Objects can also be created. Applications can use MFTs inside

648-470: A Source Reader to provide the media data and a Sink Writer component to consume it. The Source Reader does contain a type of internal pipeline but this is not accessible to the application. A Source Reader is not a Media Source and a Sink Writer is not a Media Sink and neither can be directly included in a Pipeline or managed by a Media Session. In general, the media data flows from the Source Reader to

756-462: A bitrate less than 1.5 Mbit/s, make up what is known as a constrained parameters bitstream (CPB), later renamed the "Low Level" (LL) profile in MPEG-2. This is the minimum video specifications any decoder should be able to handle, to be considered MPEG-1 compliant . This was selected to provide a good balance between quality and performance, allowing the use of reasonably inexpensive hardware of

864-562: A custom Media Transform which copies the Media Samples and passes them to a Sink Writer as they pass through the Pipeline. In both cases a special component in the Pipeline effectively acts like a simple Reader-Writer application and feeds a Sink Writer. In general, Hybrid Architectures use a Pipeline and a Sink Writer. Theoretically, it is possible to implement a mechanism in which a Source Reader could somehow inject Media Samples into

972-682: A few minor modifications, using a third party software library called "DSPack". As of March, 2012 (and, apparently as early as 2009), Microsoft has stated that the DirectShow Editing Services "API is not supported and may be altered or unavailable in the future." Originally, in Windows 9x , DirectShow used the Video Renderer filter. This drew the images using DirectDraw 3, but could also fall back to GDI or overlay drawing modes in some circumstances (depending upon

1080-491: A file, or a network server or even a camcorder, with source-specific functionality abstracted by a common interface. A source object can use a source resolver object, which creates a media source from a URI, file or bytestream. Support for non-standard protocols can be added by creating a source resolver for them. A source object can also use a sequencer object to use a sequence of sources (a playlist) or to coalesce multiple sources into a single logical source. A media sink

1188-476: A filter graph automatically from a source such as a file or URL. If this is not possible, the developer may be able to manually create a filter graph from a source file, possibly with the addition of a custom filter, and then let DirectShow complete the filter graph by connecting the filters together. At the next level, the developer must build the filter graph from scratch by manually adding and connecting each desired filter. Finally, in cases where an essential filter


1296-799: A graph to render a given media type, in certain instances it is difficult for developers to rely on this functionality and they need to resort to manually building filter graphs if the resulting filter graph is variable. It is possible for filter graphs to change over time as new filters are installed on the computer. By default, DirectShow includes a number of filters for decoding some common media file formats such as MPEG-1, MP3, Windows Media Audio, Windows Media Video and MIDI; media containers such as AVI, ASF and WAV; some splitters/demultiplexers, multiplexers, source and sink filters; some static image filters; some video acceleration; and minimal digital rights management (DRM) support. DirectShow's standard format repertoire can be easily expanded by means of

1404-414: A higher level API that uses DirectShow, notably the Windows Media Player SDK, an API that provides the developer with an ActiveX Control that has fewer COM interfaces to deal with. Although DirectShow is capable of dynamically building a graph to render a given media type, in certain instances it is difficult for developers to rely on this functionality and they need to resort to manually building filter graphs if

1512-468: A more unified approach for digital data access control for digital rights management (DRM) and its interoperability. It integrates DXVA 2.0 for offloading more of the video processing pipeline to hardware, for better performance. Videos are processed in the colorspace they were encoded in, and are handed off to the hardware, which composes the image in its native colorspace. This prevents intermediate colorspace conversions to improve performance. MF includes

1620-571: A new renderer, available as both a Media Foundation component and a DirectShow filter, called the Enhanced Video Renderer (EVR). EVR is designed to work with Desktop Window Manager and supports DXVA 2.0, which is available on Windows Vista and Windows 7. According to Microsoft, it offers better performance and better quality. On January 8, 2007, Microsoft received the Emmy award for Streaming Media Architectures and Components at

1728-656: A new video renderer, called Enhanced Video Renderer (EVR), which is the next iteration of VMR 7 and 9 . EVR has better support for playback timing and synchronization. It uses the Multimedia Class Scheduler Service (MMCSS), a new service that prioritizes real time multimedia processing, to reserve the resources required for the playback, without any tearing or glitches. The second release included in Windows 7 introduces expanded media format support and DXVA HD for acceleration of HD content if WDDM 1.1 drivers are used. The MF architecture

1836-410: A picture) redundancy common in video to achieve better data compression than would be possible otherwise. (See: Video compression ) Before encoding video to MPEG-1, the color-space is transformed to Y′CbCr (Y′=Luma, Cb=Chroma Blue, Cr=Chroma Red). Luma (brightness, resolution) is stored separately from chroma (color, hue, phase) and even further separated into red and blue components. The chroma

1944-408: A portion of an MPEG program, and is also used by the decoder to determine when data can be discarded from the buffer . Either video or audio will be delayed by the decoder until the corresponding segment of the other arrives and can be decoded. PTS handling can be problematic. Decoders must accept multiple program streams that have been concatenated (joined sequentially). This causes PTS values in

2052-413: A rate that the rendering synchronizes with the presentation clock. The rate (or time) of rendering is embedded as a part of the multimedia stream as metadata. The source objects extract the metadata and pass it over. Metadata is of two types: coded metadata , which is information about bit rate and presentation timings, and descriptive metadata , like title and author names. Coded metadata is handed over to

2160-480: A single stream, ensuring simultaneous delivery, and maintaining synchronization. The PS structure is known as a multiplex , or a container format . Presentation time stamps (PTS) exist in PS to correct the inevitable disparity between audio and video SCR values (time-base correction). 90 kHz PTS values in the PS header tell the decoder which video SCR values match which audio SCR values. PTS determines when to display
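Because PTS values count ticks of a 90 kHz clock, converting them to wall-clock time is a single division. The sketch below is illustrative only; the function names and the 25 frames-per-second rate in the usage lines are assumptions, not taken from the standard's text.

```python
PTS_CLOCK_HZ = 90_000  # PTS values are expressed in 90 kHz ticks


def pts_to_seconds(pts_ticks):
    """Convert a PTS value in 90 kHz ticks to seconds."""
    return pts_ticks / PTS_CLOCK_HZ


def ticks_per_frame(frames_per_second):
    """Whole ticks between frames; valid only when the rate divides 90000 evenly."""
    ticks, remainder = divmod(PTS_CLOCK_HZ, frames_per_second)
    if remainder:
        raise ValueError("frame rate does not divide the 90 kHz clock evenly")
    return ticks


print(pts_to_seconds(45_000))   # half a second of presentation time
print(ticks_per_frame(25))      # tick spacing at an assumed 25 fps
```

A decoder comparing the PTS of an audio packet with the PTS of a video packet is comparing values on this shared 90 kHz scale, which is what lets it delay one stream until the matching segment of the other arrives.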

2268-691: A source of annoyance. Because of the subsampling, Y′CbCr 4:2:0 video is ordinarily stored using even dimensions (divisible by 2 horizontally and vertically). Y′CbCr color is often informally called YUV to simplify the notation, although that term more properly applies to a somewhat different color format. Similarly, the terms luminance and chrominance are often used instead of the (more accurate) terms luma and chroma. MPEG-1 supports resolutions up to 4095×4095 (12 bits), and bit rates up to 100 Mbit/s. MPEG-1 videos are most commonly seen using Source Input Format (SIF) resolution: 352×240, 352×288, or 320×240. These relatively low resolutions, combined with


2376-457: A specific video is. I-frame only MPEG-1 video is very similar to MJPEG video. So much so that very high-speed and theoretically lossless (in reality, there are rounding errors) conversion can be made from one format to the other, provided a couple of restrictions (color space and quantization matrix) are followed in the creation of the bitstream. The length between I-frames is known as the group of pictures (GOP) size. MPEG-1 most commonly uses

2484-449: A user interface for encoding using installed codecs or to different formats; instead, it relies on developers to develop software using the API. In contrast, other multimedia frameworks such as QuickTime or Video for Windows allow end-users to perform basic video-related tasks such as re-encoding using a different codec and editing files and streams. The convenience offered by an end-user GUI

2592-452: A variety of filters, enabling DirectShow to support virtually any container format and any audio or video codec. For example, filters have been developed for Ogg Vorbis, Musepack, and AC3, and some codecs such as MPEG-4 Advanced Simple Profile, AAC, H.264 and Vorbis, and the containers MOV and MP4, are available from third parties like ffdshow, K-Lite, and CCCP. Incorporating support for additional codecs such as these can involve paying

2700-473: A video at high speed. Given moderately higher-performance decoding equipment, fast preview can be accomplished by decoding I-frames instead of D-frames. This provides higher quality previews, since I-frames contain AC coefficients as well as DC coefficients. If the encoder can assume that rapid I-frame decoding capability is available in decoders, it can save bits by not sending D-frames (thus improving compression of

2808-543: A video card via an analog link rather than via the PCI bus). Overlay Mixer also supports DXVA connections. Because it always renders in overlay, full-screen video to TV-out is always activated. Starting with Windows XP, a new filter called the Video Mixing Renderer 7 (VMR-7, sometimes just referred to as VMR) was introduced. The number 7 reflects the fact that VMR-7 only used DirectDraw version 7 to render

2916-453: Is also subsampled to 4:2:0 , meaning it is reduced to half resolution vertically and half resolution horizontally, i.e., to just one quarter the number of samples used for the luma component of the video. This use of higher resolution for some color components is similar in concept to the Bayer pattern filter that is commonly used for the image capturing sensor in digital color cameras. Because
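The "one quarter" figure follows from halving the chroma resolution in both directions. A small Python sketch of that arithmetic is given here; the function name `samples_420` is hypothetical, and the frame size in the usage line is one of the SIF resolutions commonly used with MPEG-1.

```python
def samples_420(width, height):
    """Return (luma_samples, samples_per_chroma_plane) for a 4:2:0 frame."""
    if width % 2 or height % 2:
        raise ValueError("4:2:0 video is ordinarily stored with even dimensions")
    luma = width * height
    chroma_plane = (width // 2) * (height // 2)  # half resolution each way
    return luma, chroma_plane


luma, chroma = samples_420(352, 240)
total = luma + 2 * chroma  # one Y' plane plus the Cb and Cr planes
print(luma, chroma, total)
```

Each chroma plane holds a quarter of the luma sample count, so the two chroma planes together add only half as many samples again to the frame.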

3024-402: Is an abbreviation for "Predicted-frame". They may also be called forward-predicted frames or inter-frames (B-frames are also inter-frames). P-frames exist to improve compression by exploiting the temporal (over time) redundancy in a video. P-frames store only the difference in image from the frame (either an I-frame or P-frame) immediately preceding it (this reference frame is also called

3132-743: Is apparent since the AVI format and codecs used by Video for Windows still remain in use, for example VirtualDub.

Media Foundation

Media Foundation (MF) is a COM-based multimedia framework pipeline and infrastructure platform for digital media in Windows Vista, Windows 7, Windows 8, Windows 8.1, Windows 10, and Windows 11. It is the intended replacement for Microsoft DirectShow, Windows Media SDK, DirectX Media Objects (DMOs) and all other so-called "legacy" multimedia APIs such as Audio Compression Manager (ACM) and Video for Windows (VfW). The existing DirectShow technology

3240-474: Is composited onto a single surface by coloring each pixel according to the color and transparency of the corresponding pixel in all streams. Internally, the EVR uses a mixer object for mixing the streams. It can also deinterlace the output and apply color correction, if required. The composited frame is handed off to a presenter object, which schedules them for rendering onto a Direct3D device, which it shares with
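The per-pixel compositing described here can be illustrated for a single pixel. The sketch below assumes the conventional "over" blend applied back to front; the text does not state the exact formula the EVR mixer uses, so the blend rule, the function name `composite_pixel` and the float colour representation are all assumptions.

```python
MAX_EVR_STREAMS = 16  # the EVR can mix up to 16 simultaneous streams


def composite_pixel(reference_rgb, substream_pixels):
    """Blend one output pixel.

    reference_rgb: (r, g, b) of the opaque reference stream at the back.
    substream_pixels: (r, g, b, alpha) tuples ordered back to front,
    with alpha in the range 0.0 (transparent) to 1.0 (opaque).
    """
    if 1 + len(substream_pixels) > MAX_EVR_STREAMS:
        raise ValueError("at most 16 streams can be mixed")
    out = tuple(reference_rgb)
    for r, g, b, alpha in substream_pixels:
        # Assumed "over" operator: new colour weighted by its alpha.
        out = tuple(alpha * src + (1.0 - alpha) * dst
                    for src, dst in zip((r, g, b), out))
    return out


# A half-transparent green substream pixel over an opaque red reference pixel.
print(composite_pixel((1.0, 0.0, 0.0), [(0.0, 1.0, 0.0, 0.5)]))
```

The reference pixel is never given an alpha value in this sketch, mirroring the rule that the reference stream cannot have transparent pixels.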

3348-571: Is defined by the standard, and small errors in the bitstream may cause noticeable defects. This structure was later named an MPEG program stream : "The MPEG-1 Systems design is essentially identical to the MPEG-2 Program Stream structure." This terminology is more popular, precise (differentiates it from an MPEG transport stream ) and will be used here. Program Streams (PS) are concerned with combining multiple packetized elementary streams (usually just one audio and video PES) into


3456-472: Is defined in ISO/IEC-11172-2. The design was heavily influenced by H.261 . MPEG-1 Video exploits perceptual compression methods to significantly reduce the data rate required by a video stream. It reduces or completely discards information in certain frequencies and areas of the picture that the human eye has limited ability to fully perceive. It also exploits temporal (over time) and spatial (across

3564-556: Is divided into the Control layer , Core Layer and the Platform layer . The core layer encapsulates most of the functionality of Media Foundation. It consists of the media foundation pipeline, which has three components: Media Source , Media Sink and Media Foundation Transforms (MFT). A media source is an object that acts as the source of multimedia data, either compressed or uncompressed. It can encapsulate various data sources, like

3672-548: Is done internally and the application has little control over it. The Source Reader and Sink Writer provide ease of use and the Pipeline Architecture offers extremely sophisticated control over the flow of the media data. However, many of the components available to a Pipeline (such as the Enhanced Video Renderer) are simply not readily usable in a Reader-Writer architecture application. Since

3780-649: Is included in Windows XP SP2 and newer. This version uses Direct3D 9 instead of DirectDraw, allowing developers to transform video images using Direct3D pixel shaders. It is available for all Windows platforms as part of the DirectX 9 redistributable. Like VMR-7, it provides a Windowless Mode. However, unlike the Overlay Mixer or VMR-7, it does not support video ports. Using the /3GB boot option may cause VMR-9 to fail. Windows Vista and Windows 7 ship with

3888-421: Is intended to be replaced by Media Foundation step-by-step, starting with a few features. For some time there will be a co-existence of Media Foundation and DirectShow. Media Foundation will not be available for previous Windows versions, including Windows XP . The first release, present in Windows Vista , focuses on audio and video playback quality, high-definition content (i.e. HDTV ), content protection and

3996-540: Is no longer covered by any essential patents and can thus be used without obtaining a licence or paying any fees. The ISO patent database lists one patent for ISO 11172, US 4,472,747, which expired in 2003. The near-complete draft of the MPEG-1 standard was publicly available as ISO CD 11172 by December 6, 1991. Neither the July 2008 Kuro5hin article "Patent Status of MPEG-1, H.261 and MPEG-2", nor an August 2008 thread on

4104-524: Is one Cb block of 8x8 and one Cr block of 8x8. This set of 6 blocks, with a picture resolution of 16×16, is processed together and called a macroblock . All of these 8x8 blocks are independently put through DCT and quantization. A macroblock is the smallest independent unit of (color) video. Motion vectors (see below) operate solely at the macroblock level. If the height or width of the video are not exact multiples of 16, full rows and full columns of macroblocks must still be encoded and decoded to fill out
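The requirement to code full rows and columns of macroblocks amounts to rounding each dimension up to a multiple of 16. The following Python sketch shows that rounding; the function name `macroblock_grid` and the 100×50 example size are hypothetical.

```python
MACROBLOCK_SIZE = 16  # a macroblock covers 16x16 picture samples


def macroblock_grid(width, height):
    """Return (columns, rows, coded_width, coded_height) for a picture."""
    cols = (width + MACROBLOCK_SIZE - 1) // MACROBLOCK_SIZE   # round up
    rows = (height + MACROBLOCK_SIZE - 1) // MACROBLOCK_SIZE  # round up
    return cols, rows, cols * MACROBLOCK_SIZE, rows * MACROBLOCK_SIZE


for size in ((352, 240), (352, 288), (100, 50)):
    cols, rows, coded_w, coded_h = macroblock_grid(*size)
    print(f"{size}: {cols}x{rows} = {cols * rows} macroblocks, "
          f"coded as {coded_w}x{coded_h}")
```

The SIF resolutions are exact multiples of 16 and need no padding, whereas the 100×50 example must be coded as 112×64.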

4212-484: Is only possible to the nearest I-frame. When cutting a video it is not possible to start playback of a segment of video before the first I-frame in the segment (at least not without computationally intensive re-encoding). For this reason, I-frame-only MPEG videos are used in editing applications. I-frame only compression is very fast, but produces very large file sizes: a factor of 3× (or more) larger than normally encoded MPEG-1 video, depending on how temporally complex

4320-513: Is required for loading a URL instead of a local file on disk – DirectShow's filter graph abstracts these details from the programmer, although recent developments in QuickTime (including an ActiveX control ) have reduced this disparity. DirectShow Editing Services (DES), introduced in DirectX 8.0/ Windows XP is an API targeted at video editing tasks and built on top of the core DirectShow architecture. DirectShow Editing Services

4428-472: Is the Media Session that manages the flow of the media data through the Pipeline and that Pipeline can have multiple forks and branches. An MF application can get access to the media data as it traverses from a Media Source to a Media Sink by implementing a custom Media Transform component and inserting it in an appropriate location in the Pipeline. The Reader-Writer Architecture uses a component called


4536-490: Is the recipient of processed multimedia data. A media sink can either be a renderer sink , which renders the content on an output device, or an archive sink , which saves the content onto a persistent storage system such as a file. A renderer sink takes uncompressed data as input whereas an archive sink can take either compressed or uncompressed data, depending on the output type. The data from media sources to sinks are acted upon by MFTs; MFTs are certain functions which transform

4644-434: Is unavailable, the developer must create a custom filter before a filter graph can be built. Unlike the main C API of QuickTime where it is necessary to call MoviesTask in a loop to load a media file, DirectShow handles all of this in a transparent way. It creates several background threads that smoothly play the requested file or URL without much work required from the programmer. Also in contrast to QuickTime, nothing special

4752-466: Is using to render the media file. Codec hell can be resolved by manually building filter graphs, using a media player that supports ignoring or overriding filter merits, or by using a filter manager that changes filter merits in the Windows Registry . DirectShow, being a developer-centric framework and API, does not directly offer end-user control over encoding content, nor does it incorporate
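To see why changing a merit value changes which filter ends up in the graph, consider a deliberately simplified model of merit-based selection. The merit constants below match the values defined in the DirectShow headers, but the filter names are hypothetical and real graph building also takes pin media types into account, so this is a sketch under stated assumptions rather than a description of the actual algorithm.

```python
MERIT_PREFERRED = 0x800000
MERIT_NORMAL = 0x600000
MERIT_UNLIKELY = 0x400000
MERIT_DO_NOT_USE = 0x200000


def candidate_order(filters):
    """Return filter names in the order they would be tried, highest merit first.

    filters: dict mapping a (hypothetical) filter name to its merit value.
    Filters at or below MERIT_DO_NOT_USE are skipped entirely.
    """
    usable = [(name, merit) for name, merit in filters.items()
              if merit > MERIT_DO_NOT_USE]
    usable.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in usable]


installed = {
    "Decoder A": MERIT_NORMAL,
    "Decoder B": MERIT_PREFERRED,
    "Decoder C": MERIT_DO_NOT_USE,
}
print(candidate_order(installed))

# Overriding a merit, as a filter manager might, changes the outcome.
installed["Decoder A"] = MERIT_PREFERRED + 1
print(candidate_order(installed))
```

In this model a newly installed filter that registers a high merit is tried ahead of the filters an application previously relied on, and editing the stored merit restores the old order.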

4860-410: Is usually easier to stop the graph and create a new graph from scratch. Starting with DirectShow 8.0, dynamic graph building, dynamic reconnection, and filter chains were introduced to help alter the graph while it was running. However, many filter vendors ignore this feature, making graph modification problematic after a graph has begun processing. Although DirectShow is capable of dynamically building

4968-489: The anchor frame ). The difference between a P-frame and its anchor frame is calculated using motion vectors on each macroblock of the frame (see below). Such motion vector data will be embedded in the P-frame for use by the decoder. A P-frame can contain any number of intra-coded blocks (DCT and Quantized), in addition to any forward-predicted blocks (Motion Vectors). If a video drastically changes from one frame to

5076-471: The DWM and other applications using the device. The frame rate of the output video is synchronized with the frame rate of the reference stream. If any of the other streams (called substreams ) have a different frame rate, EVR discards the extra frames (if the substream has a higher frame rate), or uses the same frame more than once (if it has a lower frame rate). Windows Media Audio and Windows Media Video are
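The drop-or-repeat behaviour can be modelled by mapping each output frame, which follows the reference stream's rate, to the substream frame that is current at that moment. This is a conceptual sketch only; the function name and the integer frame rates are assumptions, and the EVR's actual scheduling is not specified in the text.

```python
def substream_frame_for(output_index, reference_fps, substream_fps):
    """Index of the substream frame shown with a given output frame.

    Uses integer arithmetic for floor(output_index / reference_fps * substream_fps).
    """
    return output_index * substream_fps // reference_fps


# Substream faster than the reference: every other frame is discarded.
print([substream_frame_for(i, 30, 60) for i in range(4)])

# Substream slower than the reference: each frame is used twice.
print([substream_frame_for(i, 30, 15) for i in range(4)])
```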

5184-533: The DirectX 5 suite of technologies, and around July started referring to it as DirectShow, reflecting Microsoft's efforts at the time to consolidate technologies that worked directly with hardware under a common naming scheme. DirectShow became a standard component of all Windows operating systems starting with Windows 98 ; however it is available on Windows 95 by installing the latest available DirectX redistributable. In DirectX version 8.0, DirectShow became part of

5292-674: The Joint Photographic Experts Group and CCITT 's Experts Group on Telephony (creators of the JPEG image compression standard and the H.261 standard for video conferencing respectively), the Moving Picture Experts Group (MPEG) working group was established in January 1988, by the initiative of Hiroshi Yasuda ( Nippon Telegraph and Telephone ) and Leonardo Chiariglione ( CSELT ). MPEG

5400-618: The MP3 article. All patents in the world connected to MP3 expired 30 December 2017, which makes this format totally free for use. On 23 April 2017, Fraunhofer IIS stopped charging for Technicolor's MP3 licensing program for certain MP3 related patents and software. The following corporations filed declarations with ISO saying they held patents for the MPEG-1 Video (ISO/IEC-11172-2) format, although all such patents have since expired. Part 1 of

5508-530: The Windows SDK . It includes several new enhancements, codecs and filter updates such as the Enhanced Video Renderer (EVR) and DXVA 2.0 ( DirectX Video Acceleration ). DirectShow divides a complex multimedia task (e.g. video playback) into a sequence of fundamental processing steps known as filters . Each filter – which represents one stage in the processing of the data – has input and/or output pins that may be used to connect


5616-583: The 58th Annual Technology & Engineering Emmy Awards . Commanding DirectShow to play a file is a relatively simple task. However, while programming more advanced customizations, such as commanding DirectShow to display certain windows messages from the video window or creating custom filters, many developers complain of difficulties. It is regarded as one of Microsoft's most complex development libraries/APIs. Developers rarely create DirectShow filters from scratch. Rather, they employ DirectShow Base Classes. The Base Classes can often simplify development, allowing

5724-518: The EVR renderer, in sequence. Or for a video capture application, the camcorder will act as video and audio sources, on which codec MFTs will work to compress the data and feed to a multiplexer that coalesces the streams into a container; and finally a file sink or a network sink will write it to a file or stream over a network. The application also has to co-ordinate the flow of data between the pipeline components. The control layer has to "pull" (request) samples from one pipeline component and pass it onto

5832-504: The Enhanced Video Renderer (EVR) for rendering video content, which acts as a mixer as well. It can mix up to 16 simultaneous streams, with the first stream being a reference stream . All but the reference stream can have per-pixel transparency information, as well as any specified z-order . The reference stream cannot have transparent pixels, and has a fixed z-order position, at the back of all streams. The final image

5940-526: The MPEG-1 standard covers systems , and is defined in ISO/IEC-11172-1. MPEG-1 Systems specifies the logical layout and methods used to store the encoded audio, video, and other data into a standard bitstream, and to maintain synchronization between the different contents. This file format is specifically designed for storage on media, and transmission over communication channels , that are considered relatively reliable. Only limited error protection

6048-640: The MPEG-1 standard very strictly defines the bitstream , and decoder function, but does not define how MPEG-1 encoding is to be performed, although a reference implementation is provided in ISO/IEC-11172-5. This means that MPEG-1 coding efficiency can drastically vary depending on the encoder used, and generally means that newer encoders perform significantly better than their predecessors. The first three parts (Systems, Video and Audio) of ISO/IEC 11172 were published in August 1993. Due to its age, MPEG-1

6156-527: The Media Foundation article, Windows Vista and Windows 7 applications use Media Foundation instead of DirectShow for several media related tasks. The direct predecessor of DirectShow, ActiveMovie (codenamed Quartz), was designed to provide MPEG-1 support for Windows. It was also intended as a future replacement for media processing frameworks like Video for Windows and the Media Control Interface , which had never been fully ported to

6264-646: The Media Foundation pipeline, or use them directly as stand-alone objects. MFTs can be any of the following types: Microsoft recommends that developers write a Media Foundation Transform instead of a DirectShow filter for Windows Vista, Windows 7 and Windows 8. For video editing and video capture, Microsoft recommends using DirectShow, as they are not the primary focus of Media Foundation in Windows Vista. Starting with Windows 7, MFTs also support hardware-accelerated video processing, encoding and decoding for AVStream-based media devices. Media Foundation uses

6372-470: The Media Session and Pipeline while utilizing the ease of use of a Sink Writer. The Sink Writer is not part of the Pipeline and it does not interact with the Media Session. In effect, the media data is processed by a special Media Sink called a Sample Grabber Sink which consumes the media data and hands a copy off to the Sink Writer as it does so. It is also possible to implement a Hybrid Architecture with

6480-455: The Media Session in a Pipeline Architecture application. Since the MF application manages the transmission of the Media Samples between the Source Reader and Sink Writer it will always have access to the raw media data. The Source Reader and Sink Writer components do have a limited ability to automatically load Media Transforms to assist with the conversion of the format of the media data, however, this

6588-400: The Sink Writer by the actions of the application. The application will either take the packets of media data (called Media Samples) from the Source Reader and give them directly to the Sink Writer, or it will set up a callback function on the Source Reader which performs the same operation. In effect, as it manages the data transport, the application itself performs a similar role to that of


6696-426: The above example, from left to right, the graph contains a source filter to read an MP3 file, stream splitter and decoder filters to parse and decode the audio, and a rendering filter to play the raw audio samples. Each filter has one or more pins that can be used to connect that filter to other filters. Every pin functions either as an output or input source for data to flow from one filter to another. Depending on
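The source, splitter, decoder and renderer chain described here can be mimicked with a toy model in which each filter has one input and one output pin. This is not the DirectShow COM API: the `Filter` class, its methods and the string "samples" are all hypothetical stand-ins meant only to show how data flows from pin to pin.

```python
class Filter:
    """A toy filter with a single input pin and a single output pin."""

    def __init__(self, name, process):
        self.name = name
        self.process = process   # function applied to each incoming sample
        self.downstream = None   # filter connected to our output pin

    def connect(self, other):
        """Connect this filter's output pin to another filter's input pin."""
        self.downstream = other
        return other             # allows chained connect() calls

    def receive(self, sample):
        """Accept a sample on the input pin and push the result downstream."""
        result = self.process(sample)
        if self.downstream is not None:
            return self.downstream.receive(result)
        return result


played = []  # stands in for the audio device


def play(pcm):
    played.append(pcm)
    return pcm


source = Filter("source", lambda path: f"bytes({path})")
splitter = Filter("splitter", lambda data: f"frames({data})")
decoder = Filter("decoder", lambda frames: f"pcm({frames})")
renderer = Filter("renderer", play)

source.connect(splitter).connect(decoder).connect(renderer)
source.receive("song.mp3")
print(played[0])
```

Real filters can expose several pins and negotiate media types when they connect; the toy model leaves both out to keep the left-to-right flow visible.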

6804-429: The application. Theoretically there is only one Media Foundation architecture and this is the Media Session, Pipeline, Media Source, Transform and Media Sink model. However this architecture can be complex to set up and there is considerable scope for lightweight, relatively easy to configure MF components designed to handle the processing of media data for simple point solutions. Thus practical considerations necessitated

6912-728: The background behind an object is being revealed over several frames, or in fading transitions, such as scene changes. A B-frame can contain any number of intra-coded blocks and forward-predicted blocks, in addition to backwards-predicted, or bidirectionally predicted blocks. MPEG-1 has a unique frame type not found in later video standards. "D-frames" or DC-pictures are independently coded images (intra-frames) that have been encoded using DC transform coefficients only (AC coefficients are removed when encoding D-frames—see DCT below) and hence are very low quality. D-frames are never referenced by I-, P- or B- frames. D-frames are only used for fast previews of video, for instance when seeking through

7020-740: The built-in Media Foundation codecs to play these formats by default. MIDI playback is also not yet supported using Media Foundation. Applications that support Media Foundation include: Any application that uses Protected Media Path in Windows also uses Media Foundation.

MPEG-1

MPEG-1 is a standard for lossy compression of video and audio. It is designed to compress VHS-quality raw digital video and CD audio down to about 1.5 Mbit/s (26:1 and 6:1 compression ratios respectively) without excessive quality loss, making video CDs, digital cable/satellite TV and digital audio broadcasting (DAB) practical. Today, MPEG-1 has become

7128-1241: The codec support available in Windows Vista. It includes AVI, WAV, AAC/ADTS file sources to read the respective formats, an MPEG-4 file source to read MP4, M4A, M4V, MP4V, MOV and 3GP container formats and an MPEG-4 file sink to output to MP4 format. Similar to Windows Vista, transcoding (encoding) support is not exposed through any built-in Windows application but several codecs are included as Media Foundation Transforms (MFTs). In addition to Windows Media Audio and Windows Media Video encoders and decoders, and ASF file sink and file source introduced in Windows Vista, Windows 7 includes an H.264 encoder with Baseline profile level 3 and Main profile support and an AAC Low Complexity (AAC-LC) profile encoder. For playback of various media formats, Windows 7 also introduces an H.264 decoder with Baseline, Main, and High-profile support, up to level 5.1, AAC-LC and HE-AAC v1 (SBR) multichannel, HE-AAC v2 (PS) stereo decoders, MPEG-4 Part 2 Simple Profile and Advanced Simple Profile decoders which includes decoding popular codec implementations such as DivX, Xvid and Nero Digital as well as MJPEG and DV MFT decoders for AVI. Windows Media Player 12 uses

7236-758: The core DirectX SDK along with other DirectX APIs. In October 2004, DirectShow was removed from the main DirectX distribution and relocated to the DirectX Extras download. In April 2005, DirectShow was removed entirely from DirectX and moved to the Windows SDK starting with the Windows Server 2003 SP1 version of the SDK. The DirectX SDK was, however, still required to build some of the DirectShow samples. Since November 2007, DirectShow APIs are part of

7344-444: The core layer components function asynchronously, and are generally implemented as OS services. Pausing, stopping, fast forward, reverse or time-compression can be achieved by controlling the presentation clock. However, the media pipeline components are not connected; rather they are just presented as discrete components. An application running in the Control layer has to choose which source types, transforms and sinks are needed for

7452-411: The data into another form. MFTs can include multiplexers and demultiplexers, codecs or DSP effects like reverb. The core layer uses services like file access and networking and clock synchronization to time the multimedia rendering. These are part of the Platform layer, which provides services necessary for accessing the source and sink byte streams, presentation clocks and an object model that lets

7560-412: The decoder, with residual difference coding using a discrete cosine transform (DCT) of size 8×8, scalar quantization, and variable-length codes (like Huffman codes) for entropy coding. H.261 was the first practical video coding standard, and all of its described design elements were also used in MPEG-1. Modeled on the successful collaborative approach and the compression technologies developed by

7668-646: The filter graph searches the Windows Registry for registered filters and builds its graph of filters based on the locations provided. After this, it connects the filters together, and, at the developer's request, executes (i.e., plays, pauses, etc.) the created graph. DirectShow filter graphs are widely used in video playback (in which the filters implement functions such as file parsing, video and audio demultiplexing, decompressing and rendering) as well as for video and audio recording, editing, encoding, transcoding or network transmission of media. Interactive tasks such as DVD navigation may also be controlled by DirectShow. In

7776-418: The filter to other filters. The generic nature of this connection mechanism enables filters to be connected in various ways so as to implement different complex functions. To implement a specific complex task, a developer must first build a filter graph by creating instances of the required filters, and then connecting the filters together. There are three main types of filters: During the rendering process,

7884-563: The filter, data is either "pulled" from an input pin or "pushed" to an output pin in order to transfer data between filters. Each pin can only connect to one other pin and they have to agree on what kind of data they are sending. Most filters are built using a set of C++ classes provided in the DirectShow SDK, called the DirectShow Base Classes. These handle much of the creation, registration and connection logic for
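The two connection rules stated above (one peer per pin, and agreement on the kind of data) can be illustrated with a toy model. This is only a sketch: the `Pin` class, the `connect` function and the media-type strings are hypothetical names invented for the example and are not part of the DirectShow API or its Base Classes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Pin:
    """Toy stand-in for a filter pin; not the real COM pin interface."""
    name: str
    direction: str                 # "input" or "output"
    media_type: str                # hypothetical label such as "audio/pcm"
    peer: Optional["Pin"] = None   # the single pin this one is connected to


def connect(out_pin: Pin, in_pin: Pin) -> None:
    """Connect an output pin to an input pin, enforcing the rules above."""
    if out_pin.direction != "output" or in_pin.direction != "input":
        raise ValueError("connections run from an output pin to an input pin")
    if out_pin.peer is not None or in_pin.peer is not None:
        raise ValueError("each pin can connect to only one other pin")
    if out_pin.media_type != in_pin.media_type:
        raise ValueError("pins must agree on the kind of data they carry")
    out_pin.peer, in_pin.peer = in_pin, out_pin


decoder_out = Pin("decoder.out", "output", "audio/pcm")
renderer_in = Pin("renderer.in", "input", "audio/pcm")
connect(decoder_out, renderer_in)
```

A real connection involves negotiating a media type from several candidates rather than comparing a single string, so the equality check here is a deliberate simplification.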

7992-406: The filter. For the filter graph to use filters automatically, they need to be registered in a separate DirectShow registry entry as well as being registered with COM. This registration can be managed by the DirectShow Base Classes. However, if the application adds the filters manually, they do not need to be registered at all. Unfortunately, it is difficult to modify a graph that is already running. It

8100-646: The final standard (for parts 1–3) was approved in early November 1992 and published a few months later. The reported completion date of the MPEG-1 standard varies greatly: a largely complete draft standard was produced in September 1990, and from that point on, only minor changes were introduced. The draft standard was publicly available for purchase. The standard was finished with the 6 November 1992 meeting. The Berkeley Plateau Multimedia Research Group developed an MPEG-1 decoder in November 1992. In July 1990, before

8208-465: The first draft of the MPEG-1 standard had even been written, work began on a second standard, MPEG-2, intended to extend MPEG-1 technology to provide full broadcast-quality video (as per CCIR 601) at high bitrates (3–15 Mbit/s) and support for interlaced video. Due in part to the similarity between the two codecs, the MPEG-2 standard includes full backwards compatibility with MPEG-1 video, so any MPEG-2 decoder can play MPEG-1 videos. Notably,

8316-525: The following five Parts: The predecessor of MPEG-1 for video coding was the H.261 standard produced by the CCITT (now known as the ITU-T). The basic architecture established in H.261 was the motion-compensated DCT hybrid video coding structure. It uses macroblocks of size 16×16 with block-based motion estimation in the encoder and motion compensation using encoder-selected motion vectors in

8424-524: The gstreamer-devel mailing list were able to list a single unexpired MPEG-1 Video and MPEG-1 Audio Layer I/II patent. A May 2009 discussion on the whatwg mailing list mentioned US 5,214,678 patent as possibly covering MPEG-1 Audio Layer II. Filed in 1990 and published in 1993, this patent is now expired. A full MPEG-1 decoder and encoder, with "Layer III audio", could not be implemented royalty free since there were companies that required patent fees for implementations of MPEG-1 Audio Layer III, as discussed in

8532-484: The human eye is much more sensitive to small changes in brightness (the Y component) than in color (the Cr and Cb components), chroma subsampling is a very effective way to reduce the amount of video data that needs to be compressed. However, on videos with fine detail (high spatial complexity) this can manifest as chroma aliasing artifacts. Compared to other digital compression artifacts, this issue seems to very rarely be

8640-508: The implementation of variations on the fundamental Pipeline design and components such as the Source Reader and Sink Writer which operate outside the Pipeline model were developed. Some sources split the Media Foundation architecture into three general classes. The Pipeline Architecture is distinguished by the use of a distinct Media Session object and Pipeline. The media data flows from one or more Media Sources to one or more Media Sinks and, optionally, through zero or more Media Transforms. It

8748-413: The licensing fees to the involved codec technology developer or patent holder. Finally, there are "bridge" filters that simultaneously support multiple formats, as well as functions like stream multiplexing, by exposing the functionality of underlying multimedia APIs such as VLC . The amount of work required to implement a filter graph depends on several factors. In the simplest case, DirectShow can create

8856-419: The loader, which then creates the necessary connections between the components. The media session object manages the job of synchronizing with the presentation clock. It creates the presentation clock object, and passes a reference to it to the sink. It then uses the timer events from the clock to propagate data along the pipeline. It also changes the state of the clock to handle pause, stop or resume requests from

8964-464: The middle of the video to reset to zero, which then begin incrementing again. Such PTS wraparound disparities can cause timing issues that must be specially handled by the decoder. Decoding Time Stamps (DTS), additionally, are required because of B-frames. With B-frames in the video stream, adjacent frames have to be encoded and decoded out-of-order (re-ordered frames). DTS is quite similar to PTS, but instead of just handling sequential frames, it contains
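The wraparound handling mentioned above can be sketched as follows. The sketch assumes the usual MPEG system-stream parameters of a 33-bit time stamp counted on a 90 kHz clock (which wraps after roughly 26.5 hours); the function names `pts_delta` and `unwrap` are made up for this illustration.

```python
PTS_BITS = 33                      # assumed width of the time-stamp field
PTS_MODULUS = 1 << PTS_BITS        # value at which the counter resets to zero
CLOCK_HZ = 90_000                  # assumed time-stamp clock rate


def pts_delta(later: int, earlier: int) -> int:
    """Signed tick difference, taking the shortest path around the wrap."""
    diff = (later - earlier) % PTS_MODULUS
    if diff >= PTS_MODULUS // 2:
        diff -= PTS_MODULUS        # interpret a huge gap as a small step back
    return diff


def unwrap(pts_values):
    """Turn wrapping time stamps into a continuous, ever-growing timeline."""
    timeline = []
    previous = None
    total = 0
    for pts in pts_values:
        total = pts if previous is None else total + pts_delta(pts, previous)
        previous = pts
        timeline.append(total)
    return timeline


# Three time stamps that straddle the reset to zero.
wrapped = [PTS_MODULUS - 3000, PTS_MODULUS - 1, 2999]
continuous = unwrap(wrapped)
```

The shortest-path rule only works while consecutive time stamps are less than half the wrap period apart, which is an assumption of this sketch rather than something the standard guarantees for arbitrary input.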

9072-461: The most widely compatible lossy audio/video format in the world, and is used in a large number of products and technologies. Perhaps the best-known part of the MPEG-1 standard is the first version of the MP3 audio format it introduced. The MPEG-1 standard is published as ISO / IEC 11172 , titled Information technology—Coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbit/s . The standard consists of

9180-414: The next (such as a cut), it is more efficient to encode it as an I-frame. "B-frame" stands for "bidirectional-frame" or "bipredictive frame". They may also be known as backwards-predicted frames or B-pictures. B-frames are quite similar to P-frames, except they can make predictions using both the previous and future frames (i.e. two anchor frames). It is therefore necessary for the player to first decode

9288-593: The next I- or P- anchor frame sequentially after the B-frame, before the B-frame can be decoded and displayed. This means decoding B-frames requires larger data buffers and causes an increased delay during both decoding and encoding. This also necessitates the decoding time stamps (DTS) feature in the container/system stream (see above). As such, B-frames have long been the subject of much controversy; they are often avoided in videos, and are sometimes not fully supported by hardware decoders. No other frames are predicted from
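The out-of-order decoding described above can be made concrete with a small reordering sketch. It assumes frames are given as labels whose first letter is the frame type, and the function name `decode_order` is invented for the example; it illustrates the principle and is not how any particular decoder is written.

```python
def decode_order(display_order):
    """Reorder frame labels from display order into decode order.

    Each label starts with its frame type: 'I', 'P' or 'B'.
    """
    ordered, pending_b = [], []
    for frame in display_order:
        if frame[0] == "B":
            pending_b.append(frame)     # must wait for the next anchor frame
        else:
            ordered.append(frame)       # the anchor (I or P) is decoded first
            ordered.extend(pending_b)   # then the B-frames displayed before it
            pending_b.clear()
    ordered.extend(pending_b)           # trailing B-frames: a simplification
    return ordered


display = ["I0", "B1", "B2", "P3", "B4", "B5", "P6"]
decoded = decode_order(display)
# decoded is ["I0", "P3", "B1", "B2", "P6", "B4", "B5"]
```

When a stream contains no B-frames the function returns its input unchanged, which matches the observation that display and decode time stamps coincide in that case.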

9396-485: The next component in order to achieve data flow within the pipeline. This is in contrast to DirectShow's "push" model where a pipeline component pushes data to the next component. Media Foundation allows content protection by hosting the pipeline within a protected execution environment, called the Protected Media Path. The control layer components are required to propagate the data through the pipeline at
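The difference between the two data-flow styles contrasted above can be sketched in a few lines. This is a conceptual illustration only: the generator functions and the `PushTransform` and `PushSink` classes are hypothetical, and neither Media Foundation nor DirectShow exposes an interface shaped like this.

```python
def source(samples):
    """Pull style: hands out one sample each time downstream asks."""
    for sample in samples:
        yield sample


def doubling_transform(upstream):
    """Pull style: requests a sample from upstream only when asked itself."""
    for sample in upstream:
        yield sample * 2


# In the pull style the final consumer drives the flow by requesting data.
pulled = list(doubling_transform(source([1, 2, 3])))


class PushSink:
    """Push style: passively collects whatever upstream delivers."""
    def __init__(self):
        self.received = []

    def receive(self, sample):
        self.received.append(sample)


class PushTransform:
    """Push style: processes a sample and hands it to the next stage."""
    def __init__(self, downstream):
        self.downstream = downstream

    def receive(self, sample):
        self.downstream.receive(sample * 2)


# In the push style the producer drives the flow by delivering data.
sink = PushSink()
stage = PushTransform(sink)
for sample in [1, 2, 3]:
    stage.receive(sample)
```

Both pipelines produce the same samples; what differs is which end of the pipeline decides when the next sample moves.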

9504-428: The object that controls the pipeline session, and descriptive metadata is exposed for the application to use if it chooses to. Media Foundation provides a Media Session object that can be used to set up the topologies, and facilitate a data flow, without the application doing it explicitly. It exists in the control layer, and exposes a Topology loader object. The application specifies the required pipeline topology to

9612-426: The only default supported formats for encoding through Media Foundation in Windows Vista. For decoding, an MP3 file source is available in Windows Vista to read MP3 streams but an MP3 file sink to output MP3 is only available in Windows 7. Format support is extensible, however; developers can add support for other formats by writing encoder/decoder MFTs and/or custom media sources/media sinks. Windows 7 expands upon

9720-464: The original API is in C++, DirectShow Editing Services is accessible in any Microsoft .NET compatible language including Microsoft Visual C# and Microsoft Visual Basic by using a third-party code library called "DirectShowNet Library". Alternatively, the entire DirectShow API, including DirectShow Editing Services, can be accessed from Borland Delphi 5, 6 and 7, C++ Builder 6, and from later versions with

9828-405: The other simultaneous stream (e.g. video). The MPEG Video Buffering Verifier (VBV) assists in determining if a multiplexed PS can be decoded by a device with a specified data throughput rate and buffer size. This offers feedback to the multiplexer and the encoder, so that they can change the multiplex size or adjust bitrates as needed for compliance. Part 2 of the MPEG-1 standard covers video and

9936-454: The particular video processing task at hand, and set up the "connections" between the components (a topology) to complete the data flow pipeline. For example, to play back a compressed audio/video file, the pipeline will consist of a file source object, a demultiplexer for the specific file container format to split the audio and video streams, codecs to decompress the audio and video streams, DSP processors for audio and video effects and finally

10044-407: The picture (though the extra decoded pixels are not displayed). To decrease the amount of temporal redundancy in a video, only blocks that change are updated, (up to the maximum GOP size). This is known as conditional replenishment. However, this is not very effective by itself. Movement of the objects, and/or the camera may result in large portions of the frame needing to be updated, even though only

10152-585: The programmer to bypass certain tasks. However, the process may remain relatively complex; the code found in the Base Classes is nearly half the size of the entire MFC library. As a result, even with the Base Classes, the number of COM objects that DirectShow contains often overwhelms developers. In some cases, DirectShow's API deviates from traditional COM rules, particularly with regard to the parameters used for methods. To overcome their difficulties with DirectShow's unique COM rules, developers often turn to

10260-408: The proper time-stamps to tell the decoder when to decode and display the next B-frame (types of frames explained below), ahead of its anchor (P- or I-) frame. Without B-frames in the video, PTS and DTS values are identical. To generate the PS, the multiplexer will interleave the (two or more) packetized elementary streams. This is done so the packets of the simultaneous streams can be transferred over

10368-402: The resulting filter graph is variable. It is possible for filter graphs to change over time as new filters are installed on the computer. Codec hell (a term derived from DLL hell) is when multiple DirectShow filters conflict for performing the same task. A large number of companies now develop codecs in the form of DirectShow filters, resulting in the presence of several filters that can decode

10476-495: The right codec every time. It wasn't really designed for a competing merit nuclear arms race." A tool often recommended for troubleshooting "codec hell" issues is the GSpot Codec Information Appliance, which can be useful in determining what codec is used to render video files in AVI and other containers. GraphEdit can also help in understanding the sequence of filters that DirectShow

10584-479: The same channel and are guaranteed to both arrive at the decoder at precisely the same time. This is a case of time-division multiplexing . Determining how much data from each stream should be in each interleaved segment (the size of the interleave) is complicated, yet an important requirement. Improper interleaving will result in buffer underflows or overflows, as the receiver gets more of one stream than it can store (e.g. audio), before it gets enough data to decode

10692-433: The same media type. This issue is further exacerbated by DirectShow's merit system, where filter implementations end up competing with one another by registering themselves with increasingly elevated priority. Microsoft's Ted Youmans explained that "DirectShow was based on the merit system, with the idea being that, using a combination of the filter’s merit and how specific the media type/sub type is, one could reasonably pick
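A simplified picture of merit-based selection, and of how competing registrations escalate, is sketched below. The four `MERIT_*` values are the standard merit levels as I understand them from the DirectShow headers; the filter names, the dictionary layout and the `candidates` function are hypothetical, and the real graph builder also weighs how specific the media type match is, which this sketch ignores.

```python
# Standard merit levels (assumed to match the DirectShow headers).
MERIT_PREFERRED = 0x800000
MERIT_NORMAL = 0x600000
MERIT_UNLIKELY = 0x400000
MERIT_DO_NOT_USE = 0x200000


def candidates(filters, media_type):
    """Return usable filters for a media type, highest merit first."""
    usable = [
        f for f in filters
        if media_type in f["types"] and f["merit"] > MERIT_DO_NOT_USE
    ]
    return sorted(usable, key=lambda f: f["merit"], reverse=True)


# Hypothetical registry contents: three decoders claim the same media type.
registry = [
    {"name": "Vendor A decoder", "types": {"video/mpeg4"}, "merit": MERIT_NORMAL},
    {"name": "Vendor B decoder", "types": {"video/mpeg4"}, "merit": MERIT_PREFERRED},
    {"name": "Vendor C decoder", "types": {"video/mpeg4"}, "merit": MERIT_PREFERRED + 1},
    {"name": "Disabled decoder", "types": {"video/mpeg4"}, "merit": MERIT_DO_NOT_USE},
]

ranking = [f["name"] for f in candidates(registry, "video/mpeg4")]
```

Vendor C wins simply by registering one step above the preferred level, which is the escalation the quoted remark about an arms race refers to.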

10800-422: The structure of a Media Sample produced by a Source Reader is identical to that output by a Media Source it is possible to set up a Pipeline Architecture in which the Media Samples are intercepted as they pass through the Pipeline and a copy is given to a Media Sink. This is known as a Hybrid Architecture and it makes it possible to have an application which takes advantage of the sophisticated processing abilities of

10908-507: The time. MPEG-1 has several frame/picture types that serve different purposes. The most important, yet simplest, is I-frame. "I-frame" is an abbreviation for "Intra-frame", so-called because they can be decoded independently of any other frames. They may also be known as I-pictures, or keyframes due to their somewhat similar function to the key frames used in animation. I-frames can be considered effectively identical to baseline JPEG images. High-speed seeking through an MPEG-1 video

11016-441: The video and did not have the option to use GDI drawing. The main new feature of VMR-7 was the ability to mix multiple streams and graphics with alpha blending, allowing applications to draw text and graphics over the video and support custom effects. It also featured a "windowless mode" (access to the composited image before it is rendered) which fixed the problems with access to the window handle. DirectX 9 introduced VMR-9, which

11124-550: The video content). For this reason, D-frames are seldom actually used in MPEG-1 video encoding, and the D-frame feature has not been included in any later video coding standards. MPEG-1 operates on video in a series of 8×8 blocks for quantization. However, to reduce the bit rate needed for motion vectors and because chroma (color) is subsampled by a factor of 4, each pair of (red and blue) chroma blocks corresponds to 4 different luma blocks. That is, for 4 luma blocks of size 8×8, there
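The block arithmetic behind that correspondence can be worked through in a few lines. The sketch assumes a 16×16 macroblock made of 8×8 blocks, as the surrounding text describes, and the function name `blocks_per_macroblock` is invented for the example.

```python
BLOCK = 8          # blocks are 8x8 samples
MACROBLOCK = 16    # one macroblock covers 16x16 luma samples


def blocks_per_macroblock(chroma_divisor):
    """Count the 8x8 blocks needed to describe one macroblock.

    chroma_divisor is how many times fewer samples each chroma plane has
    than the luma plane (1 = no subsampling, 4 = the factor cited above).
    """
    luma_blocks = (MACROBLOCK // BLOCK) ** 2             # 4 luma blocks
    chroma_blocks = 2 * luma_blocks // chroma_divisor    # Cb plus Cr
    return luma_blocks + chroma_blocks


full = blocks_per_macroblock(1)          # no chroma subsampling
subsampled = blocks_per_macroblock(4)    # chroma subsampled by a factor of 4
saving = 1 - subsampled / full           # fraction of sample data avoided
```

With a factor of 4 the macroblock needs 6 blocks instead of 12, so half of the raw sample data is discarded before any transform coding takes place.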

11232-556: The video. DirectShow 6.0, released as part of DirectX Media, introduced the Overlay Mixer renderer designed for DVD playback and broadcast video streams with closed captioning and subtitles. The Overlay Mixer uses DirectDraw 5 for rendering. Downstream connection with the Video Renderer is required for window management. Overlay Mixer also supports Video Port Extensions (VPE), enabling it to work with analog TV tuners with overlay capability (sending video directly to

11340-403: The visibility of the video window and the video card's capabilities). It had limited access to the video window. Video for Windows had been plagued with deadlocks caused by applications' incorrect handling of the video windows, so in early DirectShow releases, the handle to the playback window was hidden from applications. There was also no reliable way to draw caption text or graphics on top of

11448-405: Was chosen for transmission over T-1/E-1 lines and as the approximate data rate of audio CDs. The codecs that excelled in this testing were utilized as the basis for the standard and refined further, with additional features and other improvements being incorporated in the process. After 20 meetings of the full group in various cities around the world, and 4½ years of development and testing,

11556-575: Was formed to address the need for standard video and audio formats, and to build on H.261 to get better quality through the use of somewhat more complex encoding methods (e.g., supporting higher precision for motion vectors). Development of the MPEG-1 standard began in May 1988. Fourteen video and fourteen audio codec proposals were submitted by individual companies and institutions for evaluation. The codecs were extensively tested for computational complexity and subjective (human perceived) quality, at data rates of 1.5 Mbit/s. This specific bitrate

11664-590: Was introduced for Microsoft's Windows Movie Maker. It includes APIs for timeline and switching services, resizing, cropping, video and audio effects, as well as transitions, keying, automatic frame rate and sample rate conversion and such other features which are used in non-linear video editing allowing creation of composite media out of a number of source audio and video streams. DirectShow Editing Services allow higher-level run-time compositing, seeking support, and graph management, while still allowing applications to access lower-level DirectShow functions. While
