Pocket Dimension provides a memory-efficient, dense, random projection of sparse vectors. This random projection is then used to take records of the form {"id": str, "features": List[bytes], "counts": List[int]}, convert them into sparse random vectors using scikit-learn's FeatureHasher, and project them down to lower-dimensional dense vectors.
When the very large sparse universe becomes too inhospitable, escape into a cozy pocket dimension.
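To make the record-to-dense-vector pipeline concrete, here is a minimal standard-library sketch of the two steps: signed feature hashing followed by a random projection. It is not Pocket Dimension's implementation. The functions `hash_features` and `project`, and the parameters `n_features`, `n_components`, and `seed`, are hypothetical names chosen for illustration, and `hashlib` stands in for both scikit-learn's FeatureHasher and the Numba projection. The projection entries are regenerated from a hash on demand, which is one possible way to avoid storing a projection matrix; whether the library does the same is an assumption.

```python
import hashlib
import math
from typing import Dict, List


def hash_features(
    features: List[bytes], counts: List[int], n_features: int = 2**20
) -> Dict[int, float]:
    """Signed hashing trick: map each feature to a column index and a +/-1 sign.

    Hypothetical stand-in for scikit-learn's FeatureHasher.
    """
    sparse: Dict[int, float] = {}
    for feature, count in zip(features, counts):
        digest = hashlib.blake2b(feature, digest_size=8).digest()
        h = int.from_bytes(digest, "little")
        column = (h >> 1) % n_features      # which sparse dimension to use
        sign = 1.0 if h & 1 else -1.0       # lowest bit decides the sign
        sparse[column] = sparse.get(column, 0.0) + sign * count
    return sparse


def project(
    sparse: Dict[int, float], n_components: int = 8, seed: int = 0
) -> List[float]:
    """Project a sparse vector to n_components dense dimensions.

    Each projection entry is +/- 1/sqrt(n_components), recomputed from a hash
    of (seed, column, component), so no projection matrix is kept in memory.
    """
    scale = 1.0 / math.sqrt(n_components)
    dense = [0.0] * n_components
    for column, value in sparse.items():
        for j in range(n_components):
            key = f"{seed}:{column}:{j}".encode()
            bit = hashlib.blake2b(key, digest_size=1).digest()[0] & 1
            dense[j] += (scale if bit else -scale) * value
    return dense


# Example record in the format described above.
record = {"id": "doc-1", "features": [b"cat", b"dog"], "counts": [2, 1]}
sparse = hash_features(record["features"], record["counts"])
dense = project(sparse, n_components=8)
```

Because the projection is linear and deterministic for a fixed `seed`, the same record always maps to the same dense vector, and scaling the counts scales the output by the same factor.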
Documentation
Documentation for the API and theoretical foundations of the algorithms can be found at https://mhendrey.github.io/pocket_dimension
Installation
Pocket Dimension may be installed using pip:
pip install pocket_dimension
I’m working on a conda-forge version, but Pocket Dimension depends on pybloomfiltermmap3, which is currently available only on PyPI.
Modules
Pocket Dimension
Contains the Numba implementation of the random projection function and the TFVectorizer and TFIDFVectorizer classes that use this to convert TF and TFIDF sparse vectors into dense vectors.
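For readers unfamiliar with the difference between TF and TFIDF vectors, the sketch below computes both kinds of weights for records in the format used above. It uses the textbook formula idf = log(N / df), which is an assumption and may differ from what TFIDFVectorizer actually computes. The helper names `idf_weights` and `tfidf_weights` are hypothetical and not part of the library's API.

```python
import math
from collections import Counter
from typing import Dict, List


def idf_weights(records: List[dict]) -> Dict[bytes, float]:
    """Textbook inverse document frequency: log(N / df) per feature."""
    n_docs = len(records)
    doc_freq: Counter = Counter()
    for rec in records:
        doc_freq.update(set(rec["features"]))  # count each feature once per record
    return {feature: math.log(n_docs / df) for feature, df in doc_freq.items()}


def tfidf_weights(record: dict, idf: Dict[bytes, float]) -> Dict[bytes, float]:
    """Scale each raw count (the TF weight) by the feature's IDF."""
    return {
        feature: count * idf[feature]
        for feature, count in zip(record["features"], record["counts"])
    }


records = [
    {"id": "a", "features": [b"cat", b"dog"], "counts": [2, 1]},
    {"id": "b", "features": [b"cat"], "counts": [3]},
]
idf = idf_weights(records)
weights_a = tfidf_weights(records[0], idf)
```

In this toy corpus, `b"cat"` appears in every record, so its IDF is zero and it contributes nothing to the TFIDF vector, while `b"dog"` keeps a positive weight.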