Saturday, January 07, 2006

Rendering: the process of generating an image from a 3D graphical object or model, done by the GPU.

Texture Mapping: a texture image, e.g. a PNG file, is mapped or applied onto a 3D object. Done by the GPU.

Wednesday, January 04, 2006

MEMORY BANDWIDTH:
The amount of data that can be transferred between a graphics chip and its onboard memory, measured in megabytes per second (MB/s). To calculate memory bandwidth, multiply the memory bus bit-width by the speed of the memory and the number of chunks of data transferred per clock, then divide that number by 8. For example: for 128-bit DDR memory running at 500MHz, you would multiply 128 by 500 by 2, and then divide the product by 8. The result is 16,000MB/s, or 16GB/s.
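The glossary's formula can be written out as a short sketch (the function name is mine, not from the source):

```python
# Memory bandwidth (MB/s) = bus width in bits / 8 (bytes per transfer)
#                           * memory clock in MHz
#                           * data transfers per clock (2 for DDR)
def memory_bandwidth_mb_s(bus_bits, clock_mhz, transfers_per_clock):
    return bus_bits / 8 * clock_mhz * transfers_per_clock

# The glossary's example: 128-bit DDR memory at 500MHz
print(memory_bandwidth_mb_s(128, 500, 2))  # 16000.0, i.e. 16GB/s
```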
source http://static.tigerdirect.com/html/VideoCardGlossary.html#m

DJB
07-29-02, 01:06 PM
Playing today's average game, you will see very little improvement going from 4x to 8x AGP. Most folks don't seem to understand under what circumstances you can expect higher AGP speeds to help you.

People get confused when they see that high resolutions, with lots of AF and FSAA, do not get a performance increase from high AGP speed. That is because those functions are all about fill rate, frame buffer, and local memory bandwidth to access that frame buffer. Your video card's brute-force ability to render is what is being tested.

High polygon counts and massive textures will benefit from AGP. If you have a very high polygon count, you will send a lot of geometry information across the AGP bus to the T&L engine on the video card. There are games in existence that actually come pretty close to saturating the AGP bus in this fashion, and there are definitely professional applications which do so.

If your textures overflow the local texture memory on your card, you will see a big increase in speed going from 2x to 4x to 8x AGP. This doesn't happen very often. However, there is at least one, and perhaps more than one, game that I play that easily uses a few hundred megs of textures. Memory bandwidth and AGP bandwidth begin to get important when you are constantly swapping 300 megs of textures in and out of a 64 meg video card.
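To put that swapping scenario in perspective, here is a back-of-envelope sketch of how long streaming a few hundred megs of textures would take at each AGP generation's peak rate (these are theoretical peaks; real-world throughput is lower):

```python
# Peak transfer rates per AGP generation, in MB/s
agp_peak = {"AGP 1x": 266, "AGP 2x": 533, "AGP 4x": 1066, "AGP 8x": 2133}

texture_mb = 300  # the working set mentioned above
for mode, bw in agp_peak.items():
    ms = texture_mb / bw * 1000
    print(f"{mode}: ~{ms:.0f} ms to stream {texture_mb} MB")
```

Over a full second at AGP 1x versus a fraction of that at 8x is the kind of gap that shows up as stutter when textures are swapped constantly.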

I hope that some folks will read this and understand that they are not going to get high 3DMarks because of AGP 8x, but when you are rendering millions of polygons per second and using 512MB of textures, you are going to notice that AGP 8x.

It is actually under these sorts of conditions that SBA and Fast Writes show their value. Under most conditions SBA and Fast Writes are not worth the headaches, but under severe stress of the AGP bus, they are. :)

Hope this will help folks with their "AGP decisions". :)

DJB
source http://www.nvnews.net/vbulletin/archive/index.php/t-78.html

Technical Background

AGP has several advantages over PCI. It offers a minor data transfer rate advantage when it comes to moving the geometry stream from the CPU to the graphics card. When it comes to managing large texture databases, AGP's GART table allows the OS to manage textures in off screen memory as well as in system memory, and allows the graphics card to access them directly in either location. Prior to AGP, the game developer had three options for methods to manage textures:

Limit the texture database to whatever would fit in off-screen memory only. This usually delivers outstanding frame rates, but memory size constraints can limit artistic creativity. Depending on the graphics card, texture space could be as low as only one megabyte or possibly as high as five or six megabytes.

Use the OS to manage textures in main memory, and require the CPU to copy textures from main memory to graphics memory as needed. This is PAINFULLY slow. In order to make AGP look as good as possible, Intel likes to compare AGP to this mode. This is what DirectX Retained Mode does, but game developers do not generally use this method.

Place the most frequently used textures in off screen memory (like #1 above), then lock down a few additional megabytes of system memory for the remainder of the texture database. If the graphics chip needs a texture that is not in graphics memory, then the accelerator must use PCI master mode (DMA) to copy the needed texture, on demand, into texture swap space in graphics memory. Performance is very good. This has been the preferred approach for game developers, and can be programmed under Direct X Immediate Mode.
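The on-demand swap logic of option 3 can be sketched as a small LRU cache in graphics memory, with misses triggering a DMA copy from locked system memory. All names and sizes below are illustrative, not from the article:

```python
from collections import OrderedDict

class TextureSwapSpace:
    """Sketch of option 3: hot textures stay resident in graphics
    memory; misses are DMA-copied from locked system memory."""
    def __init__(self, capacity_mb):
        self.capacity = capacity_mb
        self.used = 0
        self.resident = OrderedDict()  # texture id -> size, LRU order

    def bind(self, tex_id, size_mb):
        """Return True if a DMA copy from system memory was needed."""
        if tex_id in self.resident:
            self.resident.move_to_end(tex_id)  # already in graphics memory
            return False
        while self.used + size_mb > self.capacity:
            _, evicted = self.resident.popitem(last=False)  # evict LRU
            self.used -= evicted
        # here the accelerator would DMA the texture over PCI/AGP
        self.resident[tex_id] = size_mb
        self.used += size_mb
        return True

swap = TextureSwapSpace(capacity_mb=4)
print(swap.bind("wall", 2))   # True: first use, DMA copy needed
print(swap.bind("wall", 2))   # False: already resident
```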

AGP is a modified approach to option 3. Memory management is a little more flexible, and the data transfer rate is better because of AGP's faster clock speed and bus pipelining.

AGP offers two ways to deal with textures. One is called DMA mode which operates almost exactly like Option 3 above, but transfers occur over AGP rather than PCI. The other is Execute Mode, which allows the graphics chip to access the texture information in main memory without first copying it to graphics memory. The effective bus throughput of Execute mode and DMA mode are the same. If anything, DMA mode could be faster because of better concurrency and deeper pipelining.

Intel has gone to great lengths to convince game developers that DMA mode stinks. They have even gone so far as to refer to method #2 above as DMA mode in order to confuse everybody. DMA stands for "Direct Memory Access". There is nothing direct about using the CPU to copy data. This is pure deception. True DMA uses a hardware bus master, like a PCI or AGP graphics accelerator.

Game developers and users should prefer AGP's DMA mode because it offers excellent performance while still being architecturally compatible with the installed base of PCI accelerators. Intel prefers Execute Mode because it only runs with AGP. As we all know, Intel is always trying to stir up more ways to persuade users to dump their PCI Pentium 233 systems ASAP, and go buy a more costly P2 AGP system. Intel's Mission is not to "Accelerate 3D Graphics", but rather to "Accelerate Obsolescence".

AGP's DMA mode offers excellent concurrency because the game can detect if it must swap textures before the texture is actually needed, while the CPU is still calculating the geometry (in the geometry setup stage). This way, the graphics card can begin fetching the texture before it is needed to paint pixels on the screen. Concurrency is one of the keys to performance.
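The concurrency argument can be illustrated with a toy timing sketch, using threads and sleeps as stand-ins for the DMA transfer and the geometry setup (the durations are arbitrary):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch_texture():       # stands in for a DMA texture transfer
    time.sleep(0.05)

def geometry_setup():      # stands in for CPU-side geometry work
    time.sleep(0.05)

# On-demand (Execute-mode-like): fetch only when the rasterizer needs it
t0 = time.perf_counter()
geometry_setup(); fetch_texture()
serial = time.perf_counter() - t0

# Prefetched (DMA-mode-like): start the fetch during geometry setup
t0 = time.perf_counter()
with ThreadPoolExecutor() as pool:
    f = pool.submit(fetch_texture)
    geometry_setup()
    f.result()
overlapped = time.perf_counter() - t0

print(f"serial {serial*1000:.0f} ms vs overlapped {overlapped*1000:.0f} ms")
```

Overlapping the fetch with the setup work roughly halves the total time in this toy case, which is the whole point of starting transfers early.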

In AGP Execute mode, texture accesses are driven by the rasterizer at the final stage of the 3D pipeline. At this point, the accelerator is dead if it cannot have immediate access to textures. In this way, AGP creates a nasty performance bottleneck. Instead of having immediate access to textures in high bandwidth local memory, the accelerator must stall while it arbitrates for access to slower system memory via AGP.

30% Reduction In Accelerator Performance

Intel has developed a software tool which is useful in comparing the performance of AGP vs. Local texturing modes. It is called IBASES. As a matter of principle though, I do not recommend this tool to anyone. IBASES does not support AGP DMA mode texturing. Instead of AGP DMA mode, it substitutes the extremely useless CPU copy mode (method #2 above). Oddly enough, the software still refers to this as "DMA Mode". This is clearly NOT a mistake, but rather a blatant attempt to deceive. For anyone using this tool, results from the "DMA" test should be completely disregarded. One should instead assume that AGP DMA mode results would be about the same as AGP Execute mode.

I evaluated local vs AGP texture performance of the following accelerators:

ATI Rage Pro

3D Labs Permedia 2

nVidia Riva 128

The program allows the user to change the number of times the textures are accessed per frame. This has the effect of gradually increasing the total texture bandwidth demand. As seen over AGP, the texture bandwidth demand created by IBASES in the chart below ranges from 256K per frame at the left extreme, up to 4 megabytes per frame at the right extreme of the chart.
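For scale, the right edge of that range can be converted into a bus-bandwidth demand at a couple of assumed frame rates (the article does not state a frame rate, so both figures below are my assumptions):

```python
mb_per_frame = 4.0  # right extreme of the IBASES chart
for fps in (30, 60):
    demand = mb_per_frame * fps
    print(f"{fps} fps: {demand:.0f} MB/s of texture traffic "
          f"vs an AGP 1x peak of 266 MB/s")
```

At 60 fps the worst-case setting already consumes most of AGP 1x's theoretical peak, which is consistent with the stalls described below.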



The chart shows the average difference in rendering performance for all of the accelerators using AGP Execute mode texturing, compared to local graphics memory. This data demonstrates that overall, AGP execute mode is about 30% slower than local texture mode.

10% Reduction In CPU Performance

The other side of the performance equation is the CPU. What happens to the CPU when AGP texturing is activated? When AGP texturing is turned on, the graphics card takes control of the main memory bus in order to access texture data. When this happens, the CPU is locked out of main memory. If the CPU is not very busy, this may not be a problem. But if the CPU is engaged in a computationally challenging task (such as a game) it is highly possible that the CPU may stall, waiting for its turn to access main memory.

Right now there is no good way to physically test this scenario. It must be modeled. For this and other reasons, I have built a rather complex software model of the entire PC architecture. Using this model, I am able to estimate the CPU performance impact of main memory arbitration conflicts resulting from AGP texturing.

Intel has publicly released figures that show that a 300MHz P2 requires about 100MB/s of external bandwidth while running 3D Winbench. Third party testing has demonstrated that games can create a CPU bandwidth demand of 50 to 120MB/s from main memory. In the face of this load, AGP texturing could also place a concurrent main memory bandwidth demand of about 50 megabytes per second (or more).

Using this data, my system performance model shows that main memory conflicts between the CPU and the graphics controller will result in a CPU performance reduction of more than 10%. This assumes the use of a Pentium II with the back side cache intact. The cacheless Covington would be brought to its knees under these circumstances.
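A much simpler stand-in for that model: assume fair arbitration on a PC66 SDRAM bus (the 528MB/s peak figure is my assumption, not from the article) and see what fraction of memory cycles AGP texturing would divert from the CPU:

```python
sdram_peak = 528.0      # MB/s, PC66 SDRAM peak (assumed platform)
cpu_demand = 100.0      # MB/s, Intel's 3D Winbench figure for a 300MHz P2
agp_texturing = 50.0    # MB/s of concurrent texture traffic (from the text)

# Naive fair-arbitration view: every MB/s the graphics chip takes is a
# memory cycle the CPU cannot use; ignores queuing and burst effects.
stolen = agp_texturing / sdram_peak
print(f"~{stolen:.0%} of bus cycles diverted from the CPU")
```

Even this crude estimate lands in the same ballpark as the article's 10%-plus figure; a real model with queuing effects would push it higher.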

The assumption that AGP is a little faster now, but will be a lot faster when it is "really turned on," is completely false. In fact, the opposite is true. In most cases, systems which actually use AGP for textures will be potentially 40% slower than systems which use local graphics memory for textures instead.
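The ~40% figure is roughly the two penalties added together; if the penalties compound instead of adding, the combined slowdown comes out slightly lower:

```python
gpu_slowdown = 0.30  # Execute-mode rendering penalty from the IBASES tests
cpu_slowdown = 0.10  # CPU memory-contention penalty from the system model

additive = gpu_slowdown + cpu_slowdown
compounded = 1 - (1 - gpu_slowdown) * (1 - cpu_slowdown)
print(f"additive: {additive:.0%}, compounded: {compounded:.0%}")
# additive: 40%, compounded: 37%
```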

Is this enough motivation to make sure you get a graphics card with enough memory to do the job?