not parzival
@shoni.eth
tonight we finally explore quantizing our embeddings. bit of a slap on the ass for ~99% perf retention

Storage for 200M rows (1536 dimensions):
- Binary quantization: ~38.4 GB (200M × 1536 bits ÷ 8)
- Scalar (int8) quantization: ~307.2 GB (200M × 1536 × 1 byte)
- Float32 (baseline): ~1.23 TB (200M × 1536 × 4 bytes)

=========== Retrieval Speed ===========
Binary: up to 45x faster (~96% performance retention).
Scalar: up to 4x faster (~99% performance retention).

Binary = extreme compression; scalar = better precision.

https://huggingface.co/blog/embedding-quantization
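The storage figures above are straight bits-per-dimension arithmetic. A minimal sketch of that back-of-envelope calculation, assuming 1 bit/dim for binary, 8 bits/dim for int8 scalar, and 32 bits/dim for float32 (the helper name `embedding_storage_gb` is made up for illustration):

```python
def embedding_storage_gb(rows: int, dims: int, bits_per_dim: int) -> float:
    """Total storage in GB (10^9 bytes) for a quantized embedding matrix."""
    total_bits = rows * dims * bits_per_dim
    return total_bits / 8 / 1e9  # bits -> bytes -> GB

ROWS, DIMS = 200_000_000, 1536

for name, bits in [("binary", 1), ("int8 scalar", 8), ("float32", 32)]:
    print(f"{name:12s} ~{embedding_storage_gb(ROWS, DIMS, bits):,.1f} GB")
# binary ~38.4 GB, int8 scalar ~307.2 GB, float32 ~1,228.8 GB (~1.23 TB)
```

Same math as the bullet list: each step from float32 to int8 is a 4x reduction, and int8 to binary is another 8x.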