Frank (d/acc) 🎩 💜
@gptgod
How's MLX's LLM serving performance for llama3-8b-4bit, roughly 20 tokens/s? Is there any concurrency scheduling mechanism, like continuous batching or PagedAttention-style optimization?
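For context on the terms in the question: continuous batching means the scheduler admits waiting requests into the running batch at every decode step, instead of waiting for the whole batch to finish. A minimal toy sketch of that scheduling loop, assuming a per-step token budget; the `Request`/`serve` names are illustrative, not MLX's API:

```python
from collections import deque

class Request:
    """One generation request; tracks how many tokens it has emitted."""
    def __init__(self, rid, max_new_tokens):
        self.rid = rid
        self.generated = 0
        self.max_new_tokens = max_new_tokens

def step_batch(running):
    # One decode step: every active request emits one token;
    # requests that reach their limit leave the batch immediately.
    finished = [r for r in running if (r.generated + 1) >= r.max_new_tokens]
    for r in running:
        r.generated += 1
    for r in finished:
        running.remove(r)
    return finished

def serve(requests, max_batch=4):
    waiting = deque(requests)
    running, done = [], []
    while waiting or running:
        # Continuous batching: refill freed slots before every step,
        # so new requests never wait for the current batch to drain.
        while waiting and len(running) < max_batch:
            running.append(waiting.popleft())
        done += step_batch(running)
    return [r.rid for r in done]

# Five requests with different output lengths; short ones finish first.
order = serve([Request(i, n) for i, n in enumerate([2, 5, 3, 1, 4])])
print(order)  # → [3, 0, 2, 1, 4]
```

The key property is visible in the output: request 3 (1 token) exits after the first step and request 4 is admitted into its slot mid-flight, which is what distinguishes continuous batching from static batching.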