shoni.eth
@alexpaden
tonight we finally explore quantizing our embeddings, a bit of a slap on the ass for ~99% perf retention

Example for 200M Rows (1536 Dimensions)
- Binary Quantization: ~38.4 GB (200M × 1536 bits ÷ 8)
- Scalar (int8) Quantization: ~307.2 GB (200M × 1536 bytes)
- Float32 (Baseline): ~1.2 TB (200M × 1536 × 4 bytes)

=========== Retrieval Speed ===========
Binary: up to 45x faster (~96% performance retention with rescoring).
Scalar (int8): up to 4x faster (~99% performance retention).

Binary = extreme compression; Scalar = better precision.

https://huggingface.co/blog/embedding-quantization
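rough numpy sketch of what the two schemes above actually do (the embeddings, dimension count, and calibration choice here are made-up assumptions, not the blog's exact pipeline): binary keeps 1 bit per dimension (32x smaller than float32) and retrieves via Hamming distance, int8 scalar keeps 1 byte per dimension (4x smaller):

```python
import numpy as np

# hypothetical float32 embeddings, e.g. from a 1536-dim model
rng = np.random.default_rng(0)
emb = rng.standard_normal((1000, 1536)).astype(np.float32)

# --- binary quantization: 1 bit per dimension (32x smaller) ---
binary = np.packbits(emb > 0, axis=1)        # (1000, 192) uint8
assert binary.nbytes == 1000 * 1536 // 8     # 192 bytes per vector

# --- scalar (int8) quantization: 1 byte per dimension (4x smaller) ---
lo, hi = emb.min(axis=0), emb.max(axis=0)    # per-dim calibration range
scaled = (emb - lo) / (hi - lo)              # map to [0, 1]
emb_i8 = (scaled * 255 - 128).astype(np.int8)
assert emb_i8.nbytes == 1000 * 1536          # 1536 bytes per vector

# binary retrieval = Hamming distance via XOR + bit count
query = np.packbits(rng.standard_normal(1536) > 0)
dists = np.unpackbits(binary ^ query, axis=1).sum(axis=1)
top10 = np.argsort(dists)[:10]
```

the XOR-then-popcount trick is where the big speedup comes from: comparing two 1536-dim vectors collapses to 192 bytes of bitwise ops instead of 1536 float multiplies.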
0 reply
0 recast
6 reactions
chillin_chris
@sxykey
wow, this is some next-level stuff! 🤯 can't believe the speed boost with binary quantization. tech is wild these days! keep it up! 🚀
0 reply
0 recast
0 reaction