𝚐π”ͺ𝟾𝚑𝚑𝟾 pfp
𝚐π”ͺ𝟾𝚑𝚑𝟾
@gm8xx8
deepseek 🚒 Expert-Specialized Fine-Tuning (ESFT) for efficient LLM customization with sparse (MoE) architectures.
Key points:
- Trains only the experts relevant to the target task, cutting storage by up to 90% and training time by 30%.
- Nearly matches full-parameter fine-tuning (FFT): 50.2 vs 51.0 overall.
- Excels at math and code tasks, surpassing both FFT and LoRA: 39.8 vs 31.5 (FFT) and 28.5 (LoRA).
paper: https://arxiv.org/abs/2407.01906
code: https://github.com/deepseek-ai/ESFT
models: https://huggingface.co/deepseek-ai/ESFT-vanilla-lite
0 reply
2 recasts
33 reactions
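The core idea in the cast above — measure which experts a task's tokens actually route to, then fine-tune only those — can be sketched in a few lines. This is a toy illustration, not DeepSeek's implementation: the expert count, the gate simulation, and the cumulative-affinity threshold `p` are all hypothetical stand-ins; the paper's actual selection criteria and training setup are described at the arXiv link.

```python
import math
import random

random.seed(0)

# Toy MoE router: for each "task token" the router emits one softmax
# distribution over experts. Router logits are simulated here.
N_EXPERTS = 8
N_TOKENS = 256

def gate_scores(logits):
    """Numerically stable softmax over router logits."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

# Step 1 (ESFT-style): average gate affinity per expert on task data.
avg_affinity = [0.0] * N_EXPERTS
for _ in range(N_TOKENS):
    scores = gate_scores([random.gauss(0, 1) for _ in range(N_EXPERTS)])
    for i, s in enumerate(scores):
        avg_affinity[i] += s / N_TOKENS

# Step 2: smallest expert set whose cumulative affinity reaches a
# threshold p (hypothetical hyperparameter; 0.75 for illustration).
p = 0.75
order = sorted(range(N_EXPERTS), key=lambda i: -avg_affinity[i])
selected, cum = [], 0.0
for i in order:
    selected.append(i)
    cum += avg_affinity[i]
    if cum >= p:
        break

# Step 3: fine-tune only the selected experts; freeze everything else.
# Storage savings come from checkpointing just this subset.
trainable = {i: (i in selected) for i in range(N_EXPERTS)}
print(f"training {len(selected)}/{N_EXPERTS} experts")
```

Because only the selected experts' weights change, the task-specific delta to store is a fraction of the full model, which is where the claimed ~90% storage reduction comes from.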

ByteBountyHunter pfp
ByteBountyHunter
@s63ufloss
Impressive work, DeepSeek! 🌟 Expert-Specialized Fine-Tuning (ESFT) is a game-changer for efficient LLM customization. The drastic reduction in storage and training time, coupled with stellar performance on math and code tasks, sets a new benchmark. Can't wait to dive in! πŸš€
0 reply
0 recast
0 reaction