
Unified shader model

Article snapshot taken from Wikipedia, licensed under the Creative Commons Attribution-ShareAlike license.

In the field of 3D computer graphics, the unified shader model (known in Direct3D 10 as "Shader Model 4.0") refers to a form of shader hardware in a graphics processing unit (GPU) where all of the shader stages in the rendering pipeline (geometry, vertex, pixel, etc.) have the same capabilities. They can all read textures and buffers, and they use instruction sets that are almost identical.
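As an illustration, under a unified model the vertex stage can sample a texture exactly as the pixel stage can, something earlier split designs reserved for pixel shaders. The following GLSL sketch displaces vertices by a height map; the uniform and attribute names are illustrative assumptions, not taken from the article:

    #version 330 core
    // Hypothetical names: uHeightMap, uMvp, aPosition, aTexCoord.
    layout(location = 0) in vec3 aPosition;
    layout(location = 1) in vec2 aTexCoord;
    uniform sampler2D uHeightMap;   // a texture read, here in the vertex stage
    uniform mat4 uMvp;
    void main() {
        // Sample the texture just as a pixel shader could; textureLod is used
        // because implicit derivatives do not exist at the vertex stage.
        float height = textureLod(uHeightMap, aTexCoord, 0.0).r;
        vec3 displaced = aPosition + vec3(0.0, height, 0.0);
        gl_Position = uMvp * vec4(displaced, 1.0);
    }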


Earlier GPUs generally included two types of shader hardware, with the vertex shaders having considerably more instructions than the simpler pixel shaders. This lowered the cost of implementation of the GPU as a whole, and allowed more shaders in total on a single unit. This was at the cost of making the system less flexible, and sometimes leaving one set of shaders idle if the workload used one more than the other.

Microsoft introduced a Shader Model standard to help rank the various features of graphics cards into a simple Shader Model version number (1.0, 2.0, 3.0, etc.). Pre-DirectX 9 video cards only supported paletted or integer color types; sometimes another alpha value is added, to be used for transparency. For early fixed-function or limited-programmability graphics (i.e., up to and including DirectX 8.1-compliant GPUs) this was sufficient because this is also the representation used in displays.

In 1987, Conway's Game of Life became one of the first examples of general-purpose computing using an early stream processor called a blitter to invoke a special sequence of logical operations on bit vectors. General-purpose computing on GPUs became more practical and popular after about 2001, with the advent of both programmable shaders and floating-point support on graphics processors. Notably, problems involving matrices and/or vectors – especially two-, three-, or four-dimensional vectors – were easy to translate to a GPU, which acts with native speed and support on those types.

A shader is a computer program that calculates the appropriate levels of light, darkness, and color during the rendering of a 3D scene, a process known as shading. Shaders have evolved to perform a variety of specialized functions in computer graphics special effects and video post-processing, as well as general-purpose computing on graphics processing units. Traditional shaders calculate rendering effects on graphics hardware with a high degree of flexibility.

A significant milestone for GPGPU was the year 2003, when two research groups independently discovered GPU-based approaches for the solution of general linear algebra problems on GPUs that ran faster than on CPUs. These early efforts to use GPUs as general-purpose processors required reformulating computational problems in terms of graphics primitives, as supported by the two major APIs for graphics processors, OpenGL and DirectX.

As of OpenGL 4.0 and Direct3D 11, a new shader class called a tessellation shader has been added. It adds two new shader stages to the traditional model: tessellation control shaders (also known as hull shaders) and tessellation evaluation shaders (also known as domain shaders), which together allow simpler meshes to be subdivided into finer meshes at run-time according to a mathematical function.

A more advanced example might use edge detection to return both numerical information and a processed image representing outlines to a computer vision program controlling, say, a mobile robot. Because the GPU has fast and local hardware access to every pixel or other picture element in an image, it can analyze and average it (for the first example) or apply a Sobel edge filter or other convolution filter (for the second) with much greater speed than a CPU, which typically must access slower random-access memory copies of the graphic in question.

More complex shaders with multiple inputs/outputs are also possible. Pixel shaders range from simply always outputting the same color, to applying a lighting value, to doing bump mapping, shadows, specular highlights, translucency and other phenomena. They can alter the depth of the fragment (for Z-buffering), or output more than one color if multiple render targets are active.
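A minimal example of the simplest case described above, a GLSL fragment shader that always outputs the same color (the particular color is arbitrary):

    #version 330 core
    out vec4 fragColor;
    void main() {
        fragColor = vec4(1.0, 0.5, 0.0, 1.0);  // constant orange, every fragment
    }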

Shaders are used widely in cinema post-processing, computer-generated imagery, and video games to produce a range of effects. Beyond simple lighting models, shaders have many more complex uses.

Vertex shaders can enable powerful control over the details of position, movement, lighting, and color in any scene involving 3D models. Geometry shaders were introduced in Direct3D 10 and OpenGL 3.2; they were formerly available in OpenGL 2.0+ with the use of extensions. This type of shader can generate new graphics primitives, such as points, lines, and triangles, from those primitives that were sent to the beginning of the graphics pipeline.

A typical real-world example of the benefits of geometry shaders would be automatic mesh complexity modification. A series of line strips representing control points for a curve are passed to the geometry shader, and depending on the complexity required the shader can automatically generate extra lines, each of which provides a better approximation of the curve.


Most shaders are coded for (and run on) a graphics processing unit (GPU), though this is not a strict requirement. Shading languages are used to program the GPU's rendering pipeline, which has mostly superseded the fixed-function pipeline of the past that only allowed for common geometry-transforming and pixel-shading functions; with shaders, customized effects can be used. The position and color (hue, saturation, brightness, and contrast) of all pixels, vertices, and/or textures used to construct a final rendered image can be altered using algorithms defined in a shader, and can be modified by external variables or textures introduced by the computer program calling the shader.

The function can be related to a variety of variables, most notably the distance from the viewing camera, to allow active level-of-detail scaling. This allows objects close to the camera to have fine detail, while those further away can have coarser meshes, yet seem comparable in quality. It can also drastically reduce required mesh bandwidth by allowing meshes to be refined once inside the shader units instead of downsampling very complex ones from memory.
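A sketch of the distance-based function just described, written as a GLSL tessellation control shader; the uniform name and the level formula are illustrative assumptions, not taken from the article:

    #version 400 core
    layout(vertices = 3) out;      // pass triangle patches through unchanged
    uniform vec3 uCameraPos;       // hypothetical uniform set by the application
    void main() {
        gl_out[gl_InvocationID].gl_Position = gl_in[gl_InvocationID].gl_Position;
        if (gl_InvocationID == 0) {
            // Nearby patches tessellate finely; distant ones stay coarse.
            float d = distance(uCameraPos, gl_in[0].gl_Position.xyz);
            float level = clamp(64.0 / d, 1.0, 64.0);
            gl_TessLevelOuter[0] = level;
            gl_TessLevelOuter[1] = level;
            gl_TessLevelOuter[2] = level;
            gl_TessLevelInner[0] = level;
        }
    }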

Newer geometry shaders can generate new vertices from within the shader. Tessellation shaders are the newest 3D shaders; they act on batches of vertices all at once to add detail, such as subdividing a model into smaller groups of triangles or other primitives at runtime, to improve things like curves and bumps, or to change other attributes. Vertex shaders are the most established and common kind of 3D shader and are run once for each vertex given to the graphics processor.

In 3D graphics, a pixel shader alone cannot produce some kinds of complex effects, because it operates only on a single fragment, without knowledge of a scene's geometry (i.e. vertex data). However, pixel shaders do have knowledge of the screen coordinate being drawn, and can sample the screen and nearby pixels if the contents of the entire screen are passed as a texture to the shader. This technique can enable a wide variety of two-dimensional postprocessing effects, such as blur, or edge detection/enhancement for cartoon/cel shaders. Pixel shaders may also be applied in intermediate stages to any two-dimensional images (sprites or textures) in the pipeline, whereas vertex shaders always require a 3D scene.
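As a sketch of such a postprocessing pass, the GLSL fragment shader below samples neighbouring pixels of the rendered frame (passed in as a texture) and applies a Sobel filter for edge detection; all names are illustrative:

    #version 330 core
    uniform sampler2D uScreen;   // the rendered frame, bound as a texture
    in vec2 vUv;
    out vec4 fragColor;
    void main() {
        vec2 px = 1.0 / vec2(textureSize(uScreen, 0));  // one pixel in UV units
        float gx = 0.0;
        float gy = 0.0;
        for (int i = -1; i <= 1; ++i) {
            for (int j = -1; j <= 1; ++j) {
                // Luminance of the neighbouring sample.
                float l = dot(texture(uScreen, vUv + vec2(i, j) * px).rgb,
                              vec3(0.299, 0.587, 0.114));
                // Sobel weights: +/-1 on the corners, +/-2 on the axes.
                gx += l * float(i) * ((j == 0) ? 2.0 : 1.0);
                gy += l * float(j) * ((i == 0) ? 2.0 : 1.0);
            }
        }
        float edge = length(vec2(gx, gy));
        fragColor = vec4(vec3(edge), 1.0);  // bright where edges are strong
    }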

The dominant proprietary framework is Nvidia's CUDA. Nvidia launched CUDA in 2006, a software development kit (SDK) and application programming interface (API) that allows the programming language C to be used to code algorithms for execution on GeForce 8 series and later GPUs. ROCm, launched in 2016, is AMD's open-source response to CUDA. As of 2022, it is on par with CUDA with regards to features, but still lacking in consumer support.

GPGPU pipelines may improve efficiency on especially large data sets and/or data containing 2D or 3D imagery. It is used in complex graphics pipelines as well as in scientific computing; more so in fields with large data sets like genome mapping, or where two- or three-dimensional analysis is useful – especially at present in biomolecule analysis, protein study, and other complex organic chemistry. An example of such applications is Nvidia's software suite for genome analysis.

A stream is simply a set of records that require similar computation. Streams provide data parallelism. Kernels are the functions that are applied to each element in the stream. In GPUs, vertices and fragments are the elements in streams, and vertex and fragment shaders are the kernels to be run on them. For each element we can only read from the input, perform operations on it, and write to the output.

OpenGL 3.3 (which offers a unified shader model) can still be implemented on hardware that does not have a unified shader architecture. Similarly, hardware that supported non-unified shader model APIs could be based on a unified shader architecture, as is the case with the Xenos graphics chip in the Xbox 360, for example.

In fact, a program can substitute a write-only texture for output instead of the framebuffer. This is done either through Render to Texture (RTT), Render-To-Backbuffer-Copy-To-Texture (RTBCTT), or the more recent stream-out. The most common form for a stream to take in GPGPU is a 2D grid, because this fits naturally with the rendering model built into GPUs. Many computations naturally map into grids: matrix algebra, image processing, physically based simulation, and so on. Since textures are used as memory, texture lookups are then used as memory reads. Certain operations can be done automatically by the GPU because of this.
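A sketch of this legacy pattern in GLSL: the input stream lives in a texture, a texture lookup serves as the memory read, and the fragment written to an off-screen render target (RTT) serves as the memory write. The names and the squaring kernel are illustrative:

    #version 330 core
    uniform sampler2D uInputData;  // one record of the stream per texel
    in vec2 vUv;                   // this fragment's cell in the 2D grid
    out vec4 result;               // written to a texture via render-to-texture
    void main() {
        float x = texture(uInputData, vUv).r;  // memory read = texture lookup
        result = vec4(x * x, 0.0, 0.0, 1.0);   // kernel: square each element
    }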

A vertex shader is called for each vertex in a primitive (possibly after tessellation); thus one vertex in, one (updated) vertex out. Each vertex is then rendered as a series of pixels onto a surface (block of memory) that will eventually be sent to the screen. Shaders replace a section of the graphics hardware typically called the Fixed Function Pipeline (FFP), so called because it performs lighting and texture mapping in a hard-coded manner.


Such pipelines can also vastly improve efficiency in image processing and computer vision, among other fields, as well as in parallel processing generally. Some very heavily optimized pipelines have yielded speed increases of several hundred times the original CPU-based pipeline on one high-use task. A simple example would be a GPU program that collects data about average lighting values as it renders some view from either a camera or a computer graphics program back to the main program on the CPU, so that the CPU can then make adjustments to the overall screen view.

Unified shader architecture (or unified shading architecture) is a hardware design by which all shader processing units of a piece of graphics hardware are capable of handling any type of shading task. Most often, unified shading architecture hardware is composed of an array of computing units and some form of dynamic scheduling/load-balancing system that ensures that all of the computational units are kept working as often as possible. Unified shader architecture allows more flexible use of the graphics rendering hardware.

There are three types of shaders in common use (pixel, vertex, and geometry shaders), with several more recently added. While older graphics cards utilize separate processing units for each shader type, newer cards feature unified shaders which are capable of executing any type of shader. This allows graphics cards to make more efficient use of processing power. 2D shaders act on digital images, also called textures in the field of computer graphics.

OpenCL is available on many Android devices, but is not officially supported by Android. Apple introduced the proprietary Metal API for iOS applications, able to execute arbitrary code through Apple's GPU compute shaders. Computer video cards are produced by various vendors, such as Nvidia and AMD. Cards from such vendors differ in implementing data-format support, such as integer and floating-point formats (32-bit and 64-bit).

The stream processing nature of GPUs remains valid regardless of the APIs used. GPUs can only process independent vertices and fragments, but can process many of them in parallel. This is especially effective when the programmer wants to process many vertices or fragments in the same way. In this sense, GPUs are stream processors – processors that can operate in parallel by running one kernel on many records in a stream at once.

OpenCL is an open standard defined by the Khronos Group; it provides a cross-platform GPGPU platform that additionally supports data-parallel compute on CPUs. OpenCL is actively supported on Intel, AMD, Nvidia, and ARM platforms. The Khronos Group has also standardised and implemented SYCL, a higher-level programming model for OpenCL as a single-source domain-specific embedded language based on pure C++11.

Nvidia Turing is the world's first GPU microarchitecture that supports mesh shading through the DirectX 12 Ultimate API, several months before the Ampere RTX 30 series was released. In 2020, AMD and Nvidia released the RDNA 2 and Ampere microarchitectures, which both support mesh shading through DirectX 12 Ultimate. These mesh shaders allow the GPU to handle more complex algorithms, offloading more work from the CPU to the GPU, and in algorithm-intense rendering, increasing the frame rate of or number of triangles in a scene by an order of magnitude.

Examples include vertices, colors, normal vectors, and texture coordinates. Many other applications can put this to good use, and because of their higher performance, vector instructions, termed single instruction, multiple data (SIMD), have long been available on CPUs.

Microsoft introduced the DirectCompute GPU computing API, released with the DirectX 11 API. Alea GPU, created by QuantAlea, introduces native GPU computing capabilities for the Microsoft .NET languages F# and C#. Alea GPU also provides a simplified GPU programming model based on GPU parallel-for and parallel aggregate using delegates and automatic memory management. MATLAB supports GPGPU acceleration using the Parallel Computing Toolbox and MATLAB Distributed Computing Server, as well as third-party packages like Jacket.

The Kepler GPU has 1.5 MiB of last-level cache, the Maxwell GPU has 2 MiB, and the Pascal GPU has 4 MiB. GPUs have very large register files, which allow them to reduce context-switching latency. Register file size is also increasing over GPU generations: the total register file size on the Maxwell (GM200), Pascal and Volta GPUs is 6 MiB, 14 MiB and 20 MiB, respectively. By comparison, the size of a register file on a CPU is small, typically tens or hundreds of kilobytes.


Modern video game development platforms such as Unity, Unreal Engine and Godot increasingly include node-based editors that can create shaders without the need for actual code; the user is instead presented with a directed graph of connected nodes that allows various textures, maps, and mathematical functions to be directed into output values like the diffuse color, the specular color and intensity, roughness/metalness, height, normal, and so on.

GPGPU processing is also used to simulate Newtonian physics by physics engines, and commercial implementations include Havok Physics, FX and PhysX, both of which are typically used for computer and video games. C++ Accelerated Massive Parallelism (C++ AMP) is a library that accelerates the execution of C++ code by exploiting the data-parallel hardware on GPUs.

The use of multiple video cards in one computer, or large numbers of graphics chips, further parallelizes the already parallel nature of graphics processing. Essentially, a GPGPU pipeline is a kind of parallel processing between one or more GPUs and CPUs that analyzes data as if it were in image or other graphic form. While GPUs operate at lower frequencies, they typically have many times the number of cores.

More complex uses of shaders include altering the hue, saturation, brightness (HSL/HSV) or contrast of an image; producing blur, light bloom, volumetric lighting, normal mapping (for depth effects), bokeh, cel shading, posterization, bump mapping, distortion, chroma keying (for so-called "bluescreen/greenscreen" effects), and edge and motion detection, as well as psychedelic effects such as those seen in the demoscene.

For instance, a pixel shader is the only kind of shader that can act as a postprocessor or filter for a video stream after it has been rasterized. 3D shaders act on 3D models or other geometry but may also access the colors and textures used to draw the model or mesh. Vertex shaders are the oldest type of 3D shader, generally making modifications on a per-vertex basis.

DirectX 9 Shader Model 2.x suggested the support of two precision types: full and partial precision. Full-precision support could either be FP32 or FP24 (floating point 32- or 24-bit per component) or greater, while partial precision was FP16. ATI's Radeon R300 series of GPUs supported FP24 precision only in the programmable fragment pipeline (although FP32 was supported in the vertex processors).

Shaders provide a programmable alternative to this hard-coded approach. The graphics pipeline uses a sequence of steps to transform three-dimensional (or two-dimensional) data into useful two-dimensional data for displaying. In general, this is a large pixel matrix or "frame buffer".

Compute kernels can be thought of as the body of loops. For example, a programmer operating on a grid on the CPU might have code like the loop in the sketch below; on the GPU, the programmer only specifies the body of the loop as the kernel and what data to loop over, by invoking geometry processing. In sequential code it is possible to control the flow of the program using if-then-else statements and various forms of loops. Such flow control structures have only recently been added to GPUs. Conditional writes could be performed using a properly crafted series of arithmetic/bit operations, but looping and conditional branching were not possible.
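A sketch of the contrast just described, using a GLSL compute shader (which the article mentions elsewhere); the CPU loop appears as a comment, and the doubling operation, image names and work-group size are illustrative assumptions:

    #version 430 core
    // CPU version, for contrast (C-like pseudocode):
    //   for (int y = 0; y < H; ++y)
    //       for (int x = 0; x < W; ++x)
    //           dst[y][x] = 2.0 * src[y][x];
    // On the GPU, only the loop body below is written; one invocation runs
    // per grid cell when the application dispatches W x H threads.
    layout(local_size_x = 16, local_size_y = 16) in;
    layout(binding = 0, r32f) readonly  uniform image2D srcImage;
    layout(binding = 1, r32f) writeonly uniform image2D dstImage;
    void main() {
        ivec2 cell = ivec2(gl_GlobalInvocationID.xy);   // this thread's (x, y)
        float v = imageLoad(srcImage, cell).r;          // read one element
        imageStore(dstImage, cell, vec4(2.0 * v));      // the kernel body
    }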

Because the GPU has access to every draw operation, it can analyze data in these forms quickly, whereas a CPU must poll every pixel or data element much more slowly, as the speed of access between a CPU and its larger pool of random-access memory (or, in an even worse case, a hard drive) is slower than that of GPUs and video cards, which typically contain smaller amounts of more expensive memory that is much faster to access.

Intel announced that Intel Arc Alchemist GPUs shipping in Q1 2022 will support mesh shaders. Ray tracing shaders are supported by Microsoft via DirectX Raytracing, by the Khronos Group via Vulkan, GLSL, and SPIR-V, and by Apple via Metal. Tensor shaders may be integrated in NPUs or GPUs. Tensor shaders are supported by Microsoft via DirectML, by the Khronos Group via OpenVX, by Apple via Core ML, by Google via TensorFlow, and by the Linux Foundation via ONNX. Compute shaders are not limited to graphics applications, but use the same execution resources for GPGPU.


Geometry shader programs are executed after vertex shaders. They take as input a whole primitive, possibly with adjacency information. For example, when operating on triangles, the three vertices are the geometry shader's input. The shader can then emit zero or more primitives, which are rasterized and their fragments ultimately passed to a pixel shader. Typical uses of a geometry shader include point sprite generation, geometry tessellation, shadow volume extrusion, and single-pass rendering to a cube map.
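A minimal GLSL geometry shader illustrating the input/output contract described above: it receives a whole triangle and emits it unchanged (a real shader would emit new or modified primitives):

    #version 330 core
    layout(triangles) in;                        // a whole primitive as input
    layout(triangle_strip, max_vertices = 3) out;
    void main() {
        for (int i = 0; i < 3; ++i) {
            gl_Position = gl_in[i].gl_Position;  // could also move or add vertices
            EmitVertex();
        }
        EndPrimitive();                          // zero or more primitives out
    }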

Due to a trend of increasing power of mobile GPUs, general-purpose programming became available also on mobile devices running major mobile operating systems. Google Android 4.2 enabled running RenderScript code on the mobile device's GPU. RenderScript has since been deprecated in favour of first OpenGL compute shaders and later Vulkan Compute.

Even in the unified model, the instruction set may not be completely the same between different shader types; different shader stages may have a few distinctions. Fragment/pixel shaders can compute implicit texture coordinate gradients, while geometry shaders can emit rendering primitives.

However, as GPUs are being increasingly used for general-purpose applications, state-of-the-art GPUs are being designed with hardware-managed multi-level caches, which have helped the GPUs move towards mainstream computing. For example, GeForce 200 series GT200 architecture GPUs did not feature an L2 cache, while the Fermi GPU has 768 KiB of last-level cache.

Massively parallelized, gigantic-data-level tasks thus may be parallelized even further via specialized setups such as rack computing (many similar, highly tailored machines built into a rack), which adds a third layer – many computing units each using many CPUs to correspond to many GPUs. Some Bitcoin "miners" used such setups for high-quantity processing. Historically, CPUs have used hardware-managed caches, but the earlier GPUs only provided software-managed local memories.

They modify attributes of pixels. 2D shaders may take part in rendering 3D geometry. Currently the only type of 2D shader is a pixel shader. Pixel shaders, also known as fragment shaders, compute color and other attributes of each "fragment": a unit of rendering work affecting at most a single output pixel. The simplest kinds of pixel shaders output one screen pixel as a color value.

GPGPU is fundamentally a software concept, not a hardware concept; it is a type of algorithm, not a piece of equipment. Specialized equipment designs may, however, even further enhance the efficiency of GPGPU pipelines, which traditionally perform relatively few algorithms on very large amounts of data.

The purpose is to transform each vertex's 3D position in virtual space to the 2D coordinate at which it appears on the screen (as well as a depth value for the Z-buffer). Vertex shaders can manipulate properties such as position, color and texture coordinates, but cannot create new vertices. The output of the vertex shader goes to the next stage in the pipeline, which is either a geometry shader, if present, or the rasterizer.
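A minimal GLSL vertex shader performing exactly this transformation; the matrix and attribute names are illustrative:

    #version 330 core
    layout(location = 0) in vec3 aPosition;  // model-space vertex position
    uniform mat4 uModelViewProjection;       // supplied by the application
    void main() {
        // Clip-space output; the hardware derives the on-screen 2D
        // coordinate and the depth value for the Z-buffer from this.
        gl_Position = uModelViewProjection * vec4(aPosition, 1.0);
    }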

For example, in a situation with a heavy geometry workload, the system could allocate most computing units to run vertex and geometry shaders. In cases with less vertex workload and heavy pixel load, more computing units could be allocated to run pixel shaders. While unified shader architecture hardware and unified shader model programming interfaces are not a requirement for each other, a unified architecture is most sensible when designing hardware intended to support an API offering a unified shader model.

The following discussion, referring to vertices, fragments and textures, concerns mainly the legacy model of GPGPU programming, where graphics APIs (OpenGL or DirectX) were used to perform general-purpose computation. With the introduction of the CUDA (Nvidia, 2007) and OpenCL (vendor-independent, 2008) general-purpose computing APIs, in new GPGPU codes it is no longer necessary to map the computation to graphics primitives.


Thus, GPUs can process far more pictures and graphical data per second than a traditional CPU. Migrating data into graphical form and then using the GPU to scan and analyze it can create a large speedup. GPGPU pipelines were developed at the beginning of the 21st century for graphics processing (e.g. for better shaders). These pipelines were found to fit scientific computing needs well, and have since been developed in this direction. The best-known GPGPUs are Nvidia Tesla, used for Nvidia DGX, alongside AMD Instinct and Intel Gaudi. In principle, any arbitrary Boolean function, including addition, multiplication, and other mathematical functions, can be built up from a functionally complete set of logic operators.

As improvements in fabrication continued, this distinction became less useful. ATI Technologies introduced a unified architecture on the hardware it developed for the Xbox 360. Nvidia quickly followed with their Tesla design. AMD introduced a unified shader in card form two years later in the TeraScale line. The concept has been universal since then. Early shader abstractions (such as Shader Model 1.x) used very different instruction sets for vertex and pixel shaders, with vertex shaders having a much more flexible instruction set. Later shader models (such as Shader Model 2.x and 3.0) reduced the differences, approaching a unified shader model.

It is permissible to have multiple inputs and multiple outputs, but never a piece of memory that is both readable and writable. Arithmetic intensity is defined as the number of operations performed per word of memory transferred. It is important for GPGPU applications to have high arithmetic intensity, else the memory access latency will limit the computational speedup. Ideal GPGPU applications have large data sets, high parallelism, and minimal dependency between data elements. There are a variety of computational resources available on the GPU.

Transferring the portion of the data set to be actively analyzed to GPU memory, in the form of textures or other easily readable GPU forms, results in a speed increase. The distinguishing feature of a GPGPU design is the ability to transfer information bidirectionally back from the GPU to the CPU; generally the data throughput in both directions is ideally high, resulting in a multiplier effect on the speed of a specific high-use algorithm.

The first video card with a programmable pixel shader was the Nvidia GeForce 3 (NV20), released in 2001. Geometry shaders were introduced with Direct3D 10 and OpenGL 3.2. Eventually, graphics hardware evolved toward a unified shader model. Shaders are simple programs that describe the traits of either a vertex or a pixel. Vertex shaders describe the attributes (position, texture coordinates, colors, etc.) of a vertex, while pixel shaders describe the traits (color, z-depth and alpha value) of a pixel.

They may be used in graphics pipelines, e.g. for additional stages in animation or lighting algorithms (e.g. tiled forward rendering). Some rendering APIs allow compute shaders to easily share data resources with the graphics pipeline. Shaders are written to apply transformations to a large set of elements at a time, for example, to each pixel in an area of the screen, or for every vertex of a model.

This is well suited to parallel processing, and most modern GPUs have multiple shader pipelines to facilitate this, vastly improving computation throughput. A programming model with shaders is similar to a higher-order function for rendering, taking the shaders as arguments and providing a specific dataflow between intermediate results, enabling both data parallelism (across pixels, vertices, etc.) and pipeline parallelism (between stages); see also MapReduce. The language in which shaders are programmed depends on the target environment.

Some algorithms can upsample any arbitrary mesh, while others allow for "hinting" in meshes to dictate the most characteristic vertices and edges. Circa 2017, the AMD Vega microarchitecture added support for a new shader stage, primitive shaders, somewhat akin to compute shaders with access to the data necessary to process geometry. Nvidia introduced mesh and task shaders with its Turing microarchitecture in 2018, which are also modelled after compute shaders.

The high performance of GPUs comes at the cost of high power consumption, which under full load is as much as that of the rest of the PC system combined. The maximum power consumption of the Pascal series GPU (Tesla P100) was specified to be 250 W. GPUs are designed specifically for graphics and are thus very restrictive in operations and programming. Due to their design, GPUs are only effective for problems that can be solved using stream processing, and the hardware can only be used in certain ways.

Automatic compilation then turns the graph into an actual, compiled shader.

General-purpose computing on graphics processing units

General-purpose computing on graphics processing units (GPGPU, or less often GPGP) is the use of a graphics processing unit (GPU), which typically handles computation only for computer graphics, to perform computation in applications traditionally handled by the central processing unit (CPU).


Mark Harris, the founder of GPGPU.org, coined the term GPGPU. Any language that allows the code running on the CPU to poll a GPU shader for return values can create a GPGPU framework. Programming standards for parallel computing include OpenCL (vendor-independent), OpenACC, OpenMP and OpenHMPP. As of 2016, OpenCL is the dominant open general-purpose GPU computing language.

Most operations on the GPU operate in a vectorized fashion: one operation can be performed on up to four values at once. For example, if one color ⟨R1, G1, B1⟩ is to be modulated by another color ⟨R2, G2, B2⟩, the GPU can produce the resulting color ⟨R1*R2, G1*G2, B1*B2⟩ in one operation. This functionality is useful in graphics because almost every basic data type is a vector (either 2-, 3-, or 4-dimensional).
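In GLSL this modulation is literally one component-wise multiply, as in the sketch below (the uniform names are illustrative):

    #version 330 core
    uniform vec3 uColorA;   // <R1, G1, B1>
    uniform vec3 uColorB;   // <R2, G2, B2>
    out vec4 fragColor;
    void main() {
        // One vector operation yields <R1*R2, G1*G2, B1*B2>.
        fragColor = vec4(uColorA * uColorB, 1.0);
    }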

The official OpenGL and OpenGL ES shading language is the OpenGL Shading Language, also known as GLSL, and the official Direct3D shading language is the High-Level Shader Language, also known as HLSL. Cg, a third-party shading language which outputs both OpenGL and Direct3D shaders, was developed by Nvidia; however, it has been deprecated since 2012. Apple released its own shading language, called Metal Shading Language, as part of the Metal framework.

This use of the term "shader" was introduced to the public by Pixar with version 3.0 of their RenderMan Interface Specification, originally published in May 1988. As graphics processing units evolved, major graphics software libraries such as OpenGL and Direct3D began to support shaders. The first shader-capable GPUs only supported pixel shading, but vertex shaders were quickly introduced once developers realized the power of shaders.

This cumbersome translation was obviated by the advent of general-purpose programming languages and APIs such as Sh/RapidMind, Brook and Accelerator. These were followed by Nvidia's CUDA, which allowed programmers to ignore the underlying graphical concepts in favor of more common high-performance computing concepts. Newer, hardware-vendor-independent offerings include Microsoft's DirectCompute and the Apple/Khronos Group's OpenCL. This means that modern GPGPU pipelines can leverage the speed of a GPU without requiring full and explicit conversion of the data to a graphical form.

Nvidia's NV30 series supported both FP16 and FP32, while other vendors such as S3 Graphics and XGI supported a mixture of formats up to FP24. The implementations of floating point on Nvidia GPUs are mostly IEEE compliant; however, this is not true across all vendors. This has implications for correctness which are considered important to some scientific applications. While 64-bit floating-point values (double-precision floats) are commonly available on CPUs, these are not universally supported on GPUs. Some GPU architectures sacrifice IEEE compliance, while others lack double-precision support. Efforts have occurred to emulate double-precision floating-point values on GPUs; however, the speed tradeoff negates any benefit to offloading the computing onto the GPU in the first place.

OpenVIDIA was developed at the University of Toronto between 2003 and 2005, in collaboration with Nvidia. Altimesh Hybridizer, created by Altimesh, compiles Common Intermediate Language to CUDA binaries. It supports generics and virtual functions. Debugging and profiling are integrated with Visual Studio and Nsight. It is available as a Visual Studio extension on the Visual Studio Marketplace.

The unified shader architecture was introduced with the Nvidia GeForce 8 series, ATI Radeon HD 2000 series, S3 Chrome 400, Intel GMA X3000 series, Xbox 360's GPU, Qualcomm Adreno 200 series, Mali Midgard and PowerVR SGX GPUs, and is used in all subsequent series. For example, the unified shader is referred to as a "CUDA core" or "shader core" on Nvidia GPUs, and as an "ALU core" on Intel GPUs.

Originally, data was simply passed one-way from a central processing unit (CPU) to a graphics processing unit (GPU), then to a display device. As time progressed, however, it became valuable for GPUs to store at first simple, then complex structures of data to be passed back to the CPU that analyzed an image, or a set of scientific data represented as a 2D or 3D format that a video card can understand.

This representation does have certain limitations, however. Given sufficient graphics processing power, even graphics programmers would like to use better formats, such as floating-point data formats, to obtain effects such as high-dynamic-range imaging. Many GPGPU applications require floating-point accuracy, which came with video cards conforming to the DirectX 9 specification.
