Orbis And Durango Thoughts From Timothy Lottes

January 24, 2013

Timothy Lottes the inventor of Fast Approximate Anti-Aliasing (FXAA) has given his thoughts on the upcoming next gen consoles from Sony and Microsoft:

Assuming a 7970M in the PS4, AMD has already released the hardware ISA docs to the public, so it is relatively easy to know what developers might have access to do on a PS4. Lets start with the basics known from PC. AMD’s existing profiling tools support true async timer queries (where the timer results are written to a buffer on the GPU, then async read on the CPU). This enables the consistent profiling game developers require when optimizing code. AMD also provides tools for developers to view the output GPU assembly for compiled shaders, another must for console development. Now lets dive into what isn’t provided on PC but what can be found in AMD’s GCN ISA docs,

Dual Asynchronous Compute Engines (ACE) :: Specifically “parallel operation with graphics and fast switching between task submissions” and “support of OCL 1.2 device partitioning”. Sounds like at a minimum a developer can statically partition the device such that graphics can compute can run in parallel. For a PC, static partition would be horrible because of the different GPU configurations to support, but for a dedicated console, this is all you need. This opens up a much easier way to hide small compute jobs in a sea of GPU filling graphics work like post processing or shading. The way I do this on PC now is to abuse vertex shaders for full screen passes (the first triangle is full screen, and the rest are degenerates, use an uber-shader for the vertex shading looking at gl_VertexID and branching into “compute” work, being careful to space out the jobs by the SIMD width to avoid stalling the first triangle, or loading up one SIMD unit on the machine, … like I said, complicated). In any case, this Dual ACE system likely makes it practical to port over a large amount of the Killzone SPU jobs to the GPU even if they don’t completely fill the GPU (which would be a problem without complex uber-kernels on something like CUDA on the PC).

Dual High Performance DMA Engines :: Developers would get access to do async CPU->GPU or GPU->CPU memory transfers without stalling the graphics pipeline, and specifically ability to control semaphores in the push buffer(s) to insure no stalls and low latency scheduling. This is something the PC APIs get horribly wrong, as all memory copies are implicit without really giving control to the developer. This translates to much better resource streaming on a console.

My guess is that the real reason for 8GB of memory is because this box is a DVR which actually runs “Windows” (which requires a GB or two or three of “overhead”), but like Windows RT (Windows on ARM) only exposes a non-desktop UI to the user. There are a bunch of reasons they might ditch the real-time console OS, one being that if they don’t provide low level access to developers, that it might enable a faster refresh on backwards compatible hardware. In theory the developer just targets the box like it was a special DX11 “PC” with a few extra changes like hints for surfaces which should go in ESRAM, then on the next refresh hardware, all prior games just get better FPS or resolution or AA. Of course if they do that, then it is just another PC, just lower performance, with all the latency baggage, and lack of low level magic which makes 1st party games stand out and sell the platform.

To read the full article, visit the source here:
Timothy Lottes Blog

Tweet this!Tweet this!

Previous post:

Next post: