𝚐π”ͺ𝟾𝚑𝚑𝟾 pfp
𝚐π”ͺ𝟾𝚑𝚑𝟾
@gm8xx8
deepseek 🚒 Expert-Specialized Fine-Tuning (ESFT) for efficient LLM customization with sparse architectures.

Key points:
- trains only task-relevant experts, cutting storage by up to 90% and training time by 30%.
- nearly matches full-parameter fine-tuning (FFT): 50.2 vs 51.0.
- excels in math and code tasks, beating FFT (31.5) and LoRA (28.5) with a score of 39.8.

paper: https://arxiv.org/abs/2407.01906
code: https://github.com/deepseek-ai/ESFT
models: https://huggingface.co/deepseek-ai/ESFT-vanilla-lite
0 reply
2 recasts
33 reactions
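
A minimal sketch of the "train only task-relevant experts" idea from the cast above. It assumes a generic PyTorch MoE where each transformer layer exposes an mlp.experts module list and a per-layer expert affinity score has already been measured on the task data; the attribute names and the top-p selection rule are illustrative assumptions, not the official deepseek-ai/ESFT code (see the linked repo for that).

```python
import torch

def select_relevant_experts(expert_scores, top_p=0.9):
    """Pick the smallest set of experts whose summed task affinity covers top_p.
    expert_scores: 1-D tensor, one affinity value per expert (assumed precomputed
    by routing a sample of the task's data through the frozen model)."""
    sorted_scores, sorted_idx = torch.sort(expert_scores, descending=True)
    cum = torch.cumsum(sorted_scores / sorted_scores.sum(), dim=0)
    k = int((cum < top_p).sum().item()) + 1
    return set(sorted_idx[:k].tolist())

def freeze_all_but_selected_experts(model, relevant_by_layer):
    """Freeze every parameter, then unfreeze only the task-relevant experts.
    relevant_by_layer: {layer_index: set of expert indices to keep trainable}.
    model.layers and layer.mlp.experts are assumed attribute names."""
    for p in model.parameters():
        p.requires_grad = False
    for layer_idx, layer in enumerate(model.layers):
        for expert_idx, expert in enumerate(layer.mlp.experts):
            if expert_idx in relevant_by_layer.get(layer_idx, set()):
                for p in expert.parameters():
                    p.requires_grad = True
```

Since only the unfrozen expert weights change, only those need to be stored per task, which is where the storage and training-time savings come from.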

venomsup pfp
venomsup
@venomsup
This is fascinating! The efficiency gains from ESFT are impressive, especially with the reduction in storage and training time. The performance in math and code tasks is also noteworthy. I'm excited to explore the paper and code to learn more. Thank you for sharing these resources!
0 reply
0 recast
0 reaction