Abstract

A non-comparison-based sort that sorts an array in O(k n) work and O(k log(n)) span, where k is the number of bits in each element.

Generally, this is the sorting function we recommend for Futhark programs, but be careful about negative integers (use radix_sort_int) and floating-point numbers (use radix_sort_float). If you need a comparison-based sort, consider merge_sort.

Synopsis

val radix_sort [n] 't : (num_bits: i32) -> (get_bit: i32 -> t -> i32) -> (xs: [n]t) -> [n]t
val with_indices [n] 'a : (xs: [n]a) -> [n](a, i64)
val radix_sort_by_key [n] 't 'k : (key: t -> k) -> (num_bits: i32) -> (get_bit: i32 -> k -> i32) -> (xs: [n]t) -> [n]t
val radix_sort_int [n] 't : (num_bits: i32) -> (get_bit: i32 -> t -> i32) -> (xs: [n]t) -> [n]t
val radix_sort_int_by_key [n] 't 'k : (key: t -> k) -> (num_bits: i32) -> (get_bit: i32 -> k -> i32) -> (xs: [n]t) -> [n]t
val radix_sort_float [n] 't : (num_bits: i32) -> (get_bit: i32 -> t -> i32) -> (xs: [n]t) -> [n]t
val radix_sort_float_by_key [n] 't 'k : (key: t -> k) -> (num_bits: i32) -> (get_bit: i32 -> k -> i32) -> (xs: [n]t) -> [n]t
val chunked_radix_sort [n] 't : (chunk: i16) -> (num_bits: i32) -> (get_bit: i32 -> t -> i32) -> (xs: [n]t) -> [n]t
val chunked_radix_sort_by_key [n] 't 'k : (chunk: i16) -> (key: t -> k) -> (num_bits: i32) -> (get_bit: i32 -> k -> i32) -> (xs: [n]t) -> [n]t
val chunked_radix_sort_int [n] 't : (chunk: i16) -> (num_bits: i32) -> (get_bit: i32 -> t -> i32) -> (xs: [n]t) -> [n]t
val chunked_radix_sort_int_by_key [n] 't 'k : (chunk: i16) -> (key: t -> k) -> (num_bits: i32) -> (get_bit: i32 -> k -> i32) -> (xs: [n]t) -> [n]t
val chunked_radix_sort_float [n] 't : (chunk: i16) -> (num_bits: i32) -> (get_bit: i32 -> t -> i32) -> (xs: [n]t) -> [n]t
val chunked_radix_sort_float_by_key [n] 't 'k : (chunk: i16) -> (key: t -> k) -> (num_bits: i32) -> (get_bit: i32 -> k -> i32) -> (xs: [n]t) -> [n]t

Description

val radix_sort [n] 't: (num_bits: i32) -> (get_bit: i32 -> t -> i32) -> (xs: [n]t) -> [n]t

The num_bits and get_bit arguments can be taken from one of the numeric modules of module type integral or float, such as i32 or f64. However, if you know that the input array only uses lower-order bits (say, if all integers are less than 100), then you can profitably pass a smaller num_bits value to reduce the number of sequential iterations.

Warning: while radix sort can be used with numbers, the bitwise representation of of both integers and floats means that negative numbers are sorted as greater than non-negative. Negative floats are further sorted according to their absolute value. For example, radix-sorting [-2.0, -1.0, 0.0, 1.0, 2.0] will produce [0.0, 1.0, 2.0, -1.0, -2.0]. Use radix_sort_int and radix_sort_float in the (likely) cases that this is not what you want.

val with_indices [n] 'a: (xs: [n]a) -> [n](a, i64)
val radix_sort_by_key [n] 't 'k: (key: t -> k) -> (num_bits: i32) -> (get_bit: i32 -> k -> i32) -> (xs: [n]t) -> [n]t

Like radix_sort, but sort based on key function.

val radix_sort_int [n] 't: (num_bits: i32) -> (get_bit: i32 -> t -> i32) -> (xs: [n]t) -> [n]t

A thin wrapper around radix_sort that ensures negative integers are sorted as expected. Simply pass the usual num_bits and get_bit definitions from e.g. i32.

val radix_sort_int_by_key [n] 't 'k: (key: t -> k) -> (num_bits: i32) -> (get_bit: i32 -> k -> i32) -> (xs: [n]t) -> [n]t

Like radix_sort_int, but sort based on key function.

val radix_sort_float [n] 't: (num_bits: i32) -> (get_bit: i32 -> t -> i32) -> (xs: [n]t) -> [n]t

A thin wrapper around radix_sort that ensures floats are sorted as expected. Simply pass the usual num_bits and get_bit definitions from f32 and f64.

val radix_sort_float_by_key [n] 't 'k: (key: t -> k) -> (num_bits: i32) -> (get_bit: i32 -> k -> i32) -> (xs: [n]t) -> [n]t

Like radix_sort_float, but sort based on key function.

val chunked_radix_sort [n] 't: (chunk: i16) -> (num_bits: i32) -> (get_bit: i32 -> t -> i32) -> (xs: [n]t) -> [n]t

This implementation of radix sort works on the outside almost like radix_sort but the implementation is based on a design where you chunk the input into subarrays [1] and sort them. This leads to performance gains if you choose a good chunk size based on the GPU thread block. Using 512 as a chunk size leads to about a 1.5x speedup compared to the normal radix_sort. The sorting algorithm is stable and its work is O(k n) and the span is O(k log(n)) where k is the number of bits in the elements being sorted. In the analysis of the asymptotics we assume the chunk size is some constant in the analysis.

[1] N. Satish, M. Harris and M. Garland, "Designing efficient sorting algorithms for manycore GPUs," 2009 IEEE International Symposium on Parallel & Distributed Processing, Rome, Italy, 2009, pp. 1-10, doi: 10.1109/IPDPS.2009.5161005.

val chunked_radix_sort_by_key [n] 't 'k: (chunk: i16) -> (key: t -> k) -> (num_bits: i32) -> (get_bit: i32 -> k -> i32) -> (xs: [n]t) -> [n]t

Like radix_sort_by_key but chunked.

val chunked_radix_sort_int [n] 't: (chunk: i16) -> (num_bits: i32) -> (get_bit: i32 -> t -> i32) -> (xs: [n]t) -> [n]t

Like radix_sort_by_int but chunked.

val chunked_radix_sort_int_by_key [n] 't 'k: (chunk: i16) -> (key: t -> k) -> (num_bits: i32) -> (get_bit: i32 -> k -> i32) -> (xs: [n]t) -> [n]t

Like radix_sort_int_by_key but chunked.

val chunked_radix_sort_float [n] 't: (chunk: i16) -> (num_bits: i32) -> (get_bit: i32 -> t -> i32) -> (xs: [n]t) -> [n]t

Like radix_sort_float but chunked.

val chunked_radix_sort_float_by_key [n] 't 'k: (chunk: i16) -> (key: t -> k) -> (num_bits: i32) -> (get_bit: i32 -> k -> i32) -> (xs: [n]t) -> [n]t

Like radix_sort_float_by_key but chunked.

See Also