GPU-ABiSort: Optimal Parallel Sorting on Stream Architectures

Paper Slides Technical Report

In this paper, we present a novel approach for parallel sorting on stream processing architectures. It is based on adaptive bitonic sorting. For sorting n values utilizing p stream processor units, this approach achieves the optimal time complexity O((n log n) / p).

While this makes our approach competitive with common sequential sorting algorithms not only from a theoretical viewpoint, it is also very fast from a practical viewpoint. This is achieved by using efficient linear stream memory accesses (and by combining the optimal time approach with algorithms optimized for small input sequences).

We present an implementation on modern programmable graphics hardware (GPUs). On recent GPUs, our optimal parallel sorting approach has shown to be remarkably faster than sequential sorting on the CPU, and it is also faster than previous non-optimal sorting approaches on the GPU for sufficiently large input sequences. Because of the excellent scalability of our algorithm with the number of stream processor units p (up to n / log2n or even n / log n units, depending on the stream architecture), our approach profits heavily from the trend of increasing number of fragment processor units on GPUs, so that we can expect further speed improvement with upcoming GPU generations.

BibTeX entry
  author       = "Alexander Gre{\ss} and Gabriel Zachmann",
  booktitle    = "Proc.\ 20th IEEE Int'l Parallel and Distributed
					  Processing Symposium (IPDPS)",
  month        = apr # "25--29",
  title        = "GPU-ABiSort: Optimal Parallel Sorting on Stream Architectures",
  year         = "2006",
  month        = apr # "25--29",
  address      = "Rhodes Island, Greece",
  url          = "",

  author       = "Alexander Gre{\ss} and Gabriel Zachmann",
  title        = "GPU-ABiSort: Optimal Parallel Sorting on Stream Architectures",
  month        = oct,
  year         = "2006",
  institution  = "TU Clausthal",
  address      = "Computer Science Department, Clausthal-Zellerfeld, Germany",
  number       = "IfI-06-11",
  url          = ""
What's new in the tech report
Updated section "Related Work" updated; more figures; new z-order technique for mapping from 1D to 2D; section 7 "GPU-ABiSort optimizations"; new timings and explanations.
In case of problems
In case of problems, please don't hesitate to contact me.
(For instance, if your host is not registered by the world-wide Domain Name Service (DNS), then you will not be able to ftp ...)
Gabriel Zachmann
Last modified: Tue Oct 10 17:41:21 MDT 2006