Stackless KD-Tree Traversal for High Performance GPU Ray Tracing

Stefan Popov, Johannes Günther, Hans-Peter Seidel, and Philipp Slusallek


Significant advances have been achieved for realtime ray tracing recently, but realtime performance for complex scenes still requires large computational resources not yet available from the CPUs in standard PCs. Incidentally, most of these PCs also contain modern GPUs that do offer much larger raw compute power. However, limitations in the programming and memory model have so far kept the performance of GPU ray tracers well below that of their CPU counterparts.

In this paper, we present a novel packet ray traversal implementation that completely eliminates the need for maintaining a stack during kd-tree traversal and that reduces the number of traversal steps per ray. While CPUs benefit moderately from the stackless approach, it improves GPU performance significantly. We achieve a peak performance of over 16 million rays per second for reasonably complex scenes, including complex shading and secondary rays. Several examples show that with this new technique GPUs can actually outperform equivalent CPU-based ray tracers.


10 pages
628 kb

