oranKer
@oranker.eth
I did some mapping work on the Apple Intelligence architecture. Its on-device small language model (SLM) has about 3 billion parameters. On the iPhone 15 Pro, the SLM achieves a time-to-first-token latency of about 0.6 milliseconds per prompt token and a generation rate of 30 tokens per second.
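To put those two numbers in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes both rates stay constant regardless of prompt or output length, which is a simplification. The function name and the example request size (1,000 prompt tokens, 150 generated tokens) are hypothetical and chosen only for illustration.

```python
# Figures quoted in the post (iPhone 15 Pro, on-device model).
TTFT_MS_PER_PROMPT_TOKEN = 0.6
GENERATION_TOKENS_PER_SECOND = 30.0


def estimate_latency_seconds(prompt_tokens: int, output_tokens: int) -> tuple[float, float]:
    """Return (time_to_first_token, total_time) in seconds.

    Assumption: both rates are constant, so time to first token scales
    linearly with prompt length and generation time with output length.
    """
    time_to_first_token = prompt_tokens * TTFT_MS_PER_PROMPT_TOKEN / 1000.0
    generation_time = output_tokens / GENERATION_TOKENS_PER_SECOND
    return time_to_first_token, time_to_first_token + generation_time


# Hypothetical request: 1,000 prompt tokens, 150 generated tokens.
ttft, total = estimate_latency_seconds(1000, 150)
print(f"time to first token: {ttft:.2f} s, total: {total:.2f} s")
```

Under these assumptions, the wait before the first token is well under a second, and most of the total time goes to generating the output.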