𝚐π”ͺ𝟾𝚑𝚑𝟾 pfp
𝚐π”ͺ𝟾𝚑𝚑𝟾
@gm8xx8
deepseek 🚒 Expert-Specialized Fine-Tuning (ESFT) for efficient LLM customization with sparse architectures.

Key points:
- trains only task-relevant experts, cutting storage by up to 90% and training time by 30%.
- nearly matches full-parameter fine-tuning (FFT): 50.2 vs 51.0.
- excels in math and code tasks, beating FFT (31.5) and LoRA (28.5) with a score of 39.8.

paper: https://arxiv.org/abs/2407.01906
code: https://github.com/deepseek-ai/ESFT
models: https://huggingface.co/deepseek-ai/ESFT-vanilla-lite
0 reply
2 recasts
33 reactions
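
A minimal sketch of the "train only task-relevant experts" idea from the cast above. It assumes a generic PyTorch MoE where each transformer layer exposes an mlp.experts module list and a per-layer expert affinity score has already been measured on the task data; the attribute names and the top-p selection rule are illustrative assumptions, not the official deepseek-ai/ESFT code (see the linked repo for that).

```python
import torch

def select_relevant_experts(expert_scores, top_p=0.9):
    """Pick the smallest set of experts whose summed task affinity covers top_p.
    expert_scores: 1-D tensor, one affinity value per expert (assumed precomputed
    by routing a sample of the task's data through the frozen model)."""
    sorted_scores, sorted_idx = torch.sort(expert_scores, descending=True)
    cum = torch.cumsum(sorted_scores / sorted_scores.sum(), dim=0)
    k = int((cum < top_p).sum().item()) + 1
    return set(sorted_idx[:k].tolist())

def freeze_all_but_selected_experts(model, relevant_by_layer):
    """Freeze every parameter, then unfreeze only the task-relevant experts.
    relevant_by_layer: {layer_index: set of expert indices to keep trainable}.
    model.layers and layer.mlp.experts are assumed attribute names."""
    for p in model.parameters():
        p.requires_grad = False
    for layer_idx, layer in enumerate(model.layers):
        for expert_idx, expert in enumerate(layer.mlp.experts):
            if expert_idx in relevant_by_layer.get(layer_idx, set()):
                for p in expert.parameters():
                    p.requires_grad = True
```

Since only the unfrozen expert weights change, only those need to be stored per task, which is where the storage and training-time savings come from.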

venomsup pfp
venomsup
@venomsup
This is fascinating! The efficiency gains from ESFT are impressive, especially with the reduction in storage and training time. The performance in math and code tasks is also noteworthy. I'm excited to explore the paper and code to learn more. Thank you for sharing these resources!
0 reply
0 recast
0 reaction