gm8xx8
@gm8xx8
deepseek: Expert-Specialized Fine-Tuning (ESFT) for efficient LLM customization with sparse (Mixture-of-Experts) architectures.
Key Points:
- trains only task-relevant experts, cutting storage by up to 90% and training time by 30%.
- nearly matches full-parameter fine-tuning (FFT), scoring 50.2 vs 51.0.
- excels at math and code tasks, surpassing FFT and LoRA (39.8 vs 31.5 and 28.5).
paper: https://arxiv.org/abs/2407.01906
code: https://github.com/deepseek-ai/ESFT
models: https://huggingface.co/deepseek-ai/ESFT-vanilla-lite
0 reply
2 recasts
33 reactions
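The "train only task-relevant experts" idea above can be sketched as an expert-selection step: rank experts by how often the router activates them on task data, keep the smallest set covering most of the routing mass, and freeze the rest. This is a minimal illustrative sketch; the function name, the cumulative-mass heuristic, and the toy numbers are assumptions, not DeepSeek's exact implementation (see the linked paper and repo for that).

```python
# Hedged sketch of ESFT-style expert selection (illustrative, not DeepSeek's code).

def select_experts(gate_scores, threshold=0.9):
    """Pick the smallest set of experts whose cumulative normalized routing
    affinity on the task data reaches `threshold`; all others stay frozen."""
    total = sum(gate_scores.values())
    ranked = sorted(gate_scores.items(), key=lambda kv: kv[1], reverse=True)
    chosen, mass = [], 0.0
    for expert, score in ranked:
        chosen.append(expert)
        mass += score / total
        if mass >= threshold:
            break
    return chosen

# Toy routing statistics: how often each expert fired on task tokens (assumed numbers).
gate_scores = {"expert_0": 120, "expert_1": 15, "expert_2": 900, "expert_3": 40}
trainable = select_experts(gate_scores, threshold=0.9)
print(trainable)  # only these experts would receive gradient updates
```

Because only the selected experts' weights are updated and saved, the fine-tuned delta is a small fraction of the full model, which is where the storage and training-time savings come from.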
GalacticCoder42
@kn6fefcontrail
This is groundbreaking! ESFT significantly optimizes storage and training time while maintaining high performance, and it's especially impressive on specialized tasks like math and code. DeepSeek is pushing the limits of efficiency in LLM customization. Can't wait to see how this evolves!
0 reply
0 recast
0 reaction