oranKer on Warpcast

I did some mapping work on the Apple Intelligence architecture. The on-device small language model (SLM) has about 3 billion parameters. On the iPhone 15 Pro, it achieves a time-to-first-token latency of about 0.6 milliseconds per prompt token and a generation rate of 30 tokens per second.
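
To get a feel for what those two numbers mean in practice, here is a rough back-of-the-envelope sketch. It assumes latency scales linearly with prompt length and ignores any fixed overhead; the function name and the example token counts are made up for illustration, not taken from Apple.

```python
# Figures quoted in the post for the iPhone 15 Pro.
TTFT_MS_PER_PROMPT_TOKEN = 0.6        # milliseconds of time-to-first-token per prompt token
GENERATION_TOKENS_PER_SECOND = 30.0   # decode rate once generation starts


def estimate_latency_seconds(prompt_tokens: int, output_tokens: int) -> tuple[float, float]:
    """Return (time_to_first_token, total_time) in seconds.

    Hypothetical helper: assumes purely linear scaling and no fixed overhead.
    """
    # Prompt processing: 0.6 ms per prompt token, converted to seconds.
    ttft = prompt_tokens * TTFT_MS_PER_PROMPT_TOKEN / 1000.0
    # Decoding: output tokens divided by the sustained generation rate.
    generation = output_tokens / GENERATION_TOKENS_PER_SECOND
    return ttft, ttft + generation


if __name__ == "__main__":
    ttft, total = estimate_latency_seconds(prompt_tokens=1000, output_tokens=150)
    print(f"time to first token: {ttft:.2f} s, total: {total:.2f} s")
```

Under these assumptions, a 1,000-token prompt would take about 0.6 seconds before the first token appears, and a 150-token reply would finish in roughly 5.6 seconds overall.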
