Abstract
We attempt to determine the best order in which to store $n$ comparable data
items in an array, $A$, of length $n$, and the best algorithm for searching
that array, so that we can, for any
query value, $x$, quickly find the smallest value in $A$ that is greater than
or equal to $x$. In particular, we consider the important case where there are
many such queries to the same array, $A$, which resides entirely in RAM. In
addition to the obvious sorted order/binary search combination, we consider the
Eytzinger (BFS) layout normally used for heaps, an implicit B-tree layout that
generalizes the Eytzinger layout, and the van Emde Boas layout commonly used in
the cache-oblivious algorithms literature.
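
As a rough illustration of the Eytzinger (BFS) layout mentioned above, the following sketch rearranges a sorted array into that order. It assumes `int` keys and a 1-indexed array in which slot 0 is unused and node $k$ has children $2k$ and $2k+1$. The function names `eytzinger_fill` and `to_eytzinger` are hypothetical and are not taken from the paper's code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// In-order walk of the implicit tree rooted at slot k. Slots are visited in
// increasing key order, so the next unused sorted item is copied into each
// slot as it is visited. Returns the number of sorted items consumed so far.
static std::size_t eytzinger_fill(const std::vector<int>& sorted,
                                  std::vector<int>& out,
                                  std::size_t next = 0, std::size_t k = 1) {
    if (k < out.size()) {
        next = eytzinger_fill(sorted, out, next, 2 * k);      // left subtree
        out[k] = sorted[next++];                              // this node
        next = eytzinger_fill(sorted, out, next, 2 * k + 1);  // right subtree
    }
    return next;
}

// Returns the keys of `sorted` in Eytzinger order. The result has one extra
// element: index 0 is a placeholder and the root lives at index 1.
std::vector<int> to_eytzinger(const std::vector<int>& sorted) {
    std::vector<int> out(sorted.size() + 1);
    const std::size_t placed = eytzinger_fill(sorted, out);
    assert(placed == sorted.size());  // every key was placed exactly once
    (void)placed;                     // silence unused warning under NDEBUG
    return out;
}
```

For the sorted input 1, 2, ..., 7 this places 4 at the root (index 1), 2 and 6 at indices 2 and 3, and the odd values in the bottom level, which is the same level-by-level order used for binary heaps.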
After extensive testing and tuning on a wide variety of modern hardware, we
arrive at the conclusion that, for small values of $n$, sorted order combined
with a good implementation of binary search is best. For larger values of $n$,
we arrive at the surprising conclusion that the Eytzinger layout is usually the
fastest. This conclusion runs counter to earlier
experimental work by Brodal, Fagerberg, and Jacob (SODA~2003), who concluded
that both the B-tree and van Emde Boas layouts were faster than the Eytzinger
layout for large values of $n$. Our fastest C++ implementations, when compiled,
use conditional moves to avoid branch mispredictions and prefetching to reduce
cache latency.
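
The two techniques named above, conditional moves and prefetching, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: keys are `int`, the Eytzinger array is 1-indexed with slot 0 unused, the function names and the `PREFETCH` macro are hypothetical, and the prefetch relies on the GCC/Clang builtin `__builtin_prefetch` (it becomes a no-op elsewhere). Whether the compiler actually emits conditional moves for these loops depends on the compiler and flags.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

#if defined(__GNUC__) || defined(__clang__)
#define PREFETCH(addr) __builtin_prefetch(addr)
#else
#define PREFETCH(addr) ((void)0)  // assumption: no prefetch hint available
#endif

// Sorted array: index of the smallest element >= x, or a.size() if none.
// The loop runs a fixed number of iterations for a given length and has no
// data-dependent branch; the update of `base` can become a conditional move.
std::size_t sorted_lower_bound(const std::vector<int>& a, int x) {
    if (a.empty()) return 0;
    const int* base = a.data();
    std::size_t len = a.size();
    while (len > 1) {
        const std::size_t half = len / 2;
        base += (base[half - 1] < x) ? half : 0;
        len -= half;
    }
    return static_cast<std::size_t>(base - a.data()) + (*base < x);
}

// Eytzinger array (tree[0] unused): index of the smallest element >= x,
// or 0 if every element is smaller than x.
std::size_t eytzinger_lower_bound(const std::vector<int>& tree, int x) {
    assert(!tree.empty());
    const std::size_t n = tree.size() - 1;
    std::size_t k = 1;
    while (k <= n) {
        // Hint at the descendants four levels down; the index is clamped so
        // the address always lies inside the array.
        const std::size_t ahead = 16 * k;
        PREFETCH(&tree[ahead <= n ? ahead : n]);
        k = 2 * k + (tree[k] < x);  // go right if tree[k] < x, else left
    }
    // Undo the trailing right turns and the final left turn to recover the
    // last node at which the search went left.
    while (k & 1) k >>= 1;
    return k >> 1;
}
```

The prefetch distance of four levels (a factor of 16) is an illustrative choice for 4-byte keys and 64-byte cache lines; a suitable value depends on key size and hardware.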
Description
[1509.05053] Array Layouts for Comparison-Based Searching