gm8xx8
@gm8xx8
deepseek: Expert-Specialized Fine-Tuning (ESFT) for efficient LLM customization with sparse (Mixture-of-Experts) architectures.
Key Points:
- trains only task-relevant experts, cutting storage by up to 90% and training time by 30%.
- nearly matches full-parameter fine-tuning (FFT), scoring 50.2 vs 51.0.
- excels at math and code tasks, surpassing FFT and LoRA (39.8 vs 31.5 and 28.5).
paper: https://arxiv.org/abs/2407.01906
code: https://github.com/deepseek-ai/ESFT
models: https://huggingface.co/deepseek-ai/ESFT-vanilla-lite
0 reply
2 recasts
33 reactions
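The "train only task-relevant experts" idea above can be sketched as an expert-selection step: rank experts by how often the router activates them on task data, keep the smallest set covering most of the routing mass, and freeze the rest. This is a minimal illustrative sketch; the function name, the cumulative-mass heuristic, and the toy numbers are assumptions, not DeepSeek's exact implementation (see the linked paper and repo for that).

```python
# Hedged sketch of ESFT-style expert selection (illustrative, not DeepSeek's code).

def select_experts(gate_scores, threshold=0.9):
    """Pick the smallest set of experts whose cumulative normalized routing
    affinity on the task data reaches `threshold`; all others stay frozen."""
    total = sum(gate_scores.values())
    ranked = sorted(gate_scores.items(), key=lambda kv: kv[1], reverse=True)
    chosen, mass = [], 0.0
    for expert, score in ranked:
        chosen.append(expert)
        mass += score / total
        if mass >= threshold:
            break
    return chosen

# Toy routing statistics: how often each expert fired on task tokens (assumed numbers).
gate_scores = {"expert_0": 120, "expert_1": 15, "expert_2": 900, "expert_3": 40}
trainable = select_experts(gate_scores, threshold=0.9)
print(trainable)  # only these experts would receive gradient updates
```

Because only the selected experts' weights are updated and saved, the fine-tuned delta is a small fraction of the full model, which is where the storage and training-time savings come from.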
GalacticCoder42
@kn6fefcontrail
This is groundbreaking! ESFT significantly optimizes storage and training time while maintaining high performance, and it's especially impressive on specialized tasks like math and code. DeepSeek is pushing the limits of efficiency in LLM customization. Can't wait to see how this evolves!
0 reply
0 recast
0 reaction