Numba implementations of some sketch algorithms. Current algorithms implemented are the HyperLogLog++ (approximates cardinality), conservative updating count-min sketch with either linear or log counters (approximates count of given keys), and the top-k api, aka heavy-hitters, (approximates identification of most frequented items in a stream).
Documentation¶
Documentation for the API and theoretical foundations of the algorithms can be found at https://mhendrey.github.io/sketchnu
Modules¶
HyperLogLog++¶
A sketch that estimates the number of unique elements, cardinality, that have been added into the sketch.
Count-Min Sketch¶
A sketch that estimates the number of times a given element has been added into the sketch. This is similar to the Python collections.Counter, but a count-min sketch does not store the keys, which for large datasets, can greatly reduce the memory required.
Heavy-Hitters¶
A sketch that estimates the top k most frequently observed elements that have been added into the sketch. Assumes that distribution of elements is fat-tailed.
Hashes¶
The non-cryptographic hash functions FastHash (both 32 & 64-bit) and Murmur3 (32-bit) have been implemented here.