Content pfp
Content
@
0 reply
0 recast
0 reaction

phil pfp
phil
@phil
What are the best tools for running a scaleable cloud instance of Llama3 for text generation? It doesn't need to go into production, but I want to test something on a machine larger than my MBP. I've tried Replit (not supported) and AWS (way too complicated for such a simple task). Any obvious tools?
4 replies
2 recasts
11 reactions

Zenigame pfp
Zenigame
@zeni.eth
Have you tried Modal? I've found it very easy to use and the team extremely responsive. https://modal.com/docs/examples/text_generation_inference
0 reply
0 recast
1 reaction

Jason pfp
Jason
@jachian
+1 to Modal having good docs for sample code to set up a Llama instance Other option is to use some of these secondary hosting platforms for open source models like Lorax on Predibase
0 reply
0 recast
0 reaction

Ren 🎩 Ⓜ️ pfp
Ren 🎩 Ⓜ️
@renatov.eth
also interesting this question
0 reply
0 recast
0 reaction

chompk ↑ pfp
chompk ↑
@chompk
My company use vLLM with a rented GPU VM I also have one of my friend hosting LLM as a service as well https://float16.cloud
0 reply
0 recast
0 reaction