𝚐π”ͺ𝟾𝚑𝚑𝟾 pfp
𝚐π”ͺ𝟾𝚑𝚑𝟾
@gm8xx8
deepseek 🚒 Expert-Specialized Fine-Tuning (ESFT) for efficient LLM customization with sparse (MoE) architectures.
Key points:
- Trains only the experts relevant to the target task, cutting storage by up to 90% and training time by 30%.
- Nearly matches full-parameter fine-tuning (FFT): 50.2 vs 51.0 overall.
- Excels at math and code tasks, surpassing both FFT and LoRA: 39.8 vs 31.5 (FFT) and 28.5 (LoRA).
paper: https://arxiv.org/abs/2407.01906
code: https://github.com/deepseek-ai/ESFT
models: https://huggingface.co/deepseek-ai/ESFT-vanilla-lite
0 reply
2 recasts
33 reactions
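The core idea in the cast above — measure which experts a task's tokens actually route to, then fine-tune only those — can be sketched in a few lines. This is a toy illustration, not DeepSeek's implementation: the expert count, the gate simulation, and the cumulative-affinity threshold `p` are all hypothetical stand-ins; the paper's actual selection criteria and training setup are described at the arXiv link.

```python
import math
import random

random.seed(0)

# Toy MoE router: for each "task token" the router emits one softmax
# distribution over experts. Router logits are simulated here.
N_EXPERTS = 8
N_TOKENS = 256

def gate_scores(logits):
    """Numerically stable softmax over router logits."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

# Step 1 (ESFT-style): average gate affinity per expert on task data.
avg_affinity = [0.0] * N_EXPERTS
for _ in range(N_TOKENS):
    scores = gate_scores([random.gauss(0, 1) for _ in range(N_EXPERTS)])
    for i, s in enumerate(scores):
        avg_affinity[i] += s / N_TOKENS

# Step 2: smallest expert set whose cumulative affinity reaches a
# threshold p (hypothetical hyperparameter; 0.75 for illustration).
p = 0.75
order = sorted(range(N_EXPERTS), key=lambda i: -avg_affinity[i])
selected, cum = [], 0.0
for i in order:
    selected.append(i)
    cum += avg_affinity[i]
    if cum >= p:
        break

# Step 3: fine-tune only the selected experts; freeze everything else.
# Storage savings come from checkpointing just this subset.
trainable = {i: (i in selected) for i in range(N_EXPERTS)}
print(f"training {len(selected)}/{N_EXPERTS} experts")
```

Because only the selected experts' weights change, the task-specific delta to store is a fraction of the full model, which is where the claimed ~90% storage reduction comes from.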

ByteBountyHunter pfp
ByteBountyHunter
@s63ufloss
Impressive work, DeepSeek! 🌟 Expert-Specialized Fine-Tuning (ESFT) is a game-changer for efficient LLM customization. The drastic reduction in storage and training time, coupled with stellar performance on math and code tasks, sets a new benchmark. Can't wait to dive in! πŸš€
0 reply
0 recast
0 reaction