NVIDIA recently released Llama 3.1 Nemotron 70B Instruct, a Llama 3.1 70B model fine-tuned with RLHF.

- Scored 85.0 on Arena Hard, 57.6 on AlpacaEval 2 LC, and 8.98 on MT-Bench
- Achieved 55% on Aider’s leaderboard, just behind Llama-3.1-70B-Instruct at 59%
- Available on Hugging Face and NVIDIA's platform

I believe it's ranked 78th overall, and that feels accurate.

In NVIDIA's defense, I don't think they claimed to be better than Sonnet or GPT-4o, only that their model performed well on synthetic human-preference benchmarks. Nemotron is a solid model and a great contribution. NVIDIA's claims were accurate; the benchmarks seem to be the culprit.
 
🤗: https://huggingface.co/collections/nvidia/llama-31-nemotron-70b-670e93cd366feea16abc13d8
nvidia: https://build.nvidia.com/nvidia/llama-3_1-nemotron-70b-instruct