Frank (d/acc) 🎩 💜
@gptgod
How's MLX's LLM serving performance for llama3-8b-4bit, roughly 20 tokens/s? Is there any concurrency scheduling mechanism, like continuous batching or PagedAttention-style optimization?
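For context on the terms in the question: continuous batching means the scheduler admits waiting requests into the running batch at every decode step, instead of waiting for the whole batch to finish. A minimal toy sketch of that scheduling loop, assuming a per-step token budget; the `Request`/`serve` names are illustrative, not MLX's API:

```python
from collections import deque

class Request:
    """One generation request; tracks how many tokens it has emitted."""
    def __init__(self, rid, max_new_tokens):
        self.rid = rid
        self.generated = 0
        self.max_new_tokens = max_new_tokens

def step_batch(running):
    # One decode step: every active request emits one token;
    # requests that reach their limit leave the batch immediately.
    finished = [r for r in running if (r.generated + 1) >= r.max_new_tokens]
    for r in running:
        r.generated += 1
    for r in finished:
        running.remove(r)
    return finished

def serve(requests, max_batch=4):
    waiting = deque(requests)
    running, done = [], []
    while waiting or running:
        # Continuous batching: refill freed slots before every step,
        # so new requests never wait for the current batch to drain.
        while waiting and len(running) < max_batch:
            running.append(waiting.popleft())
        done += step_batch(running)
    return [r.rid for r in done]

# Five requests with different output lengths; short ones finish first.
order = serve([Request(i, n) for i, n in enumerate([2, 5, 3, 1, 4])])
print(order)  # → [3, 0, 2, 1, 4]
```

The key property is visible in the output: request 3 (1 token) exits after the first step and request 4 is admitted into its slot mid-flight, which is what distinguishes continuous batching from static batching.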