𝚐π”ͺ𝟾𝚑𝚑𝟾 pfp
𝚐π”ͺ𝟾𝚑𝚑𝟾
@gm8xx8
deepseek 🚒 Expert-Specialized Fine-Tuning (ESFT) for efficient LLM customization with sparse architectures.

Key Points:
- trains only the task-relevant experts, cutting storage by up to 90% and training time by 30%.
- nearly matches full-parameter fine-tuning (FFT), scoring 50.2 vs 51.0.
- excels in math and code tasks, surpassing FFT and LoRA with scores of 39.8 vs 31.5 and 28.5.

paper: https://arxiv.org/abs/2407.01906
code: https://github.com/deepseek-ai/ESFT
models: https://huggingface.co/deepseek-ai/ESFT-vanilla-lite
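
[Editor's note: roughly, ESFT scores each expert's relevance on the task data and fine-tunes only the most relevant experts while freezing everything else. Below is a minimal PyTorch-style sketch of that idea under stated assumptions; the MoELayer class, the average-gate-probability relevance score, and the apply_esft helper are illustrative, not the API of the DeepSeek repo linked above.]

```python
# Hedged sketch of the ESFT idea: freeze the model, then unfreeze only the
# experts most relevant to the task. All names here are illustrative.
import torch
import torch.nn as nn


class MoELayer(nn.Module):
    """Toy mixture-of-experts layer: a router (gate) plus N small expert MLPs."""

    def __init__(self, d_model: int, n_experts: int):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):
        # Route each token to its single best expert (top-1 for simplicity).
        scores = self.gate(x).softmax(dim=-1)   # [tokens, n_experts]
        top = scores.argmax(dim=-1)             # [tokens]
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = top == i
            if mask.any():
                out[mask] = expert(x[mask])
        return out, scores


def expert_relevance(layer: MoELayer, task_batches) -> torch.Tensor:
    """Average gate probability per expert over task data (one possible
    relevance score; an assumption, not necessarily the paper's exact metric)."""
    totals = torch.zeros(len(layer.experts))
    n = 0
    with torch.no_grad():
        for x in task_batches:
            _, scores = layer(x)
            totals += scores.mean(dim=0)
            n += 1
    return totals / max(n, 1)


def apply_esft(layer: MoELayer, task_batches, threshold: float = 0.5):
    """Freeze the whole layer, then unfreeze only the experts whose cumulative
    relevance covers `threshold` of the routing mass."""
    for p in layer.parameters():
        p.requires_grad = False
    rel = expert_relevance(layer, task_batches)
    covered, selected = 0.0, []
    for idx in rel.argsort(descending=True):
        selected.append(int(idx))
        covered += float(rel[idx])
        if covered >= threshold:
            break
    for idx in selected:
        for p in layer.experts[idx].parameters():
            p.requires_grad = True
    return selected


if __name__ == "__main__":
    torch.manual_seed(0)
    layer = MoELayer(d_model=32, n_experts=8)
    batches = [torch.randn(64, 32) for _ in range(4)]
    chosen = apply_esft(layer, batches, threshold=0.5)
    trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
    total = sum(p.numel() for p in layer.parameters())
    print(f"trainable experts: {chosen}, params: {trainable}/{total}")
```

In a real sparse model each transformer layer has its own routed experts, so expert selection runs per layer; this sketch collapses that to a single layer for brevity. Training only the unfrozen experts is what yields the storage and training-time savings claimed in the cast.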
0 reply
2 recasts
33 reactions

GalacticCoder42 pfp
GalacticCoder42
@kn6fefcontrail
This is groundbreaking! ESFT significantly optimizes storage and training time while maintaining high performance. Especially impressive in specialized tasks like math and code. Deepseek is pushing the limits of efficiency in LLM customization. Can't wait to see how this evolves!
0 reply
0 recast
0 reaction