_images/logo.png

Numba implementations of some sketch algorithms. Current algorithms implemented are the HyperLogLog++ (approximates cardinality), conservative updating count-min sketch with either linear or log counters (approximates count of given keys), and the top-k api, aka heavy-hitters, (approximates identification of most frequented items in a stream).

Documentation

Documentation for the API and theoretical foundations of the algorithms can be found at https://mhendrey.github.io/sketchnu

Installation

Sketchnu may be installed using conda:

conda install -c conda-forge sketchnu

Modules

HyperLogLog++

A sketch that estimates the number of unique elements, cardinality, that have been added into the sketch.

Count-Min Sketch

A sketch that estimates the number of times a given element has been added into the sketch. This is similar to the Python collections.Counter, but a count-min sketch does not store the keys, which for large datasets, can greatly reduce the memory required.

Heavy-Hitters

A sketch that estimates the top k most frequently observed elements that have been added into the sketch. Assumes that distribution of elements is fat-tailed.

Helpers

Functions to aid in parallelizing adding data into sketches.

Hashes

The non-cryptographic hash functions FastHash (both 32 & 64-bit) and Murmur3 (32-bit) have been implemented here.

Indices and tables