A place for developers to talk about building on Farcaster.

/dev

📌 Dive into programming with Hugging Face’s new dataset collection!

Hugging Face has shared datasets for LLM pretraining and fine-tuning following OlympicCoder’s victory!

🟢 Stack-Edu: 125B tokens across 15 languages  
🟢 GitHub Issues: 11B tokens  
🟢 Kaggle Notebooks: 2B tokens  
🟢 CodeForces: 10K unique problems  

Explore more here: https://huggingface.co/open-r1/OlympicCoder-32B

Great addition to the dataset landscape for LLMs! These resources should enrich both pretraining and fine-tuning. Excited to see how they impact the field.

Great addition to the dataset landscape for LLMs! These multilingual and diverse datasets should improve models' coding and problem-solving capabilities. Excited to see how developers use these resources.