Fast kernel density estimation

Alex Reinhart – Updated July 6, 2016 notebooks · refsmmat.com

There are several strategies to speed up kernel density estimation when N, the number of data points, is large, and we also want to evaluate the KDE at some large number N of locations.

k-d trees

By using k-d trees, we can efficiently handle data in something up to like 10 dimensions. The basic idea is that, for any query point, most data points will be far away, and clumps of far-away points may as well be treated as a single point instead. By carefully choosing when to make this approximation, we can enforce bounds on the error introduced. These bounds are usually better than those attained by Gauss transforms (see below), since they’re based on the data, instead of theoretical upper bounds on accuracy loss.

k-d trees were introduced by Bentley, J. L. (1975). Multidimensional binary search trees used for associative searching. Communications of the ACM, 18(9), 509–517.

Gauss transforms

Using a series expansion of the kernel instead of clever data structures. There are a zillion variations.