frend
@frend
How come llama.cpp's convert_hf_to_gguf.py supports Q8_0 directly, but if you want Q4_0 you first have to convert to fp16 and then build llama-quantize to finally get there? It gets you there eventually, but I feel it could be simpler.
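
For context, the two-step path being described looks roughly like this (a sketch based on llama.cpp's documented CLI; the model path and output filenames are placeholders):

    # Q8_0 is a supported --outtype, so conversion is one step:
    python convert_hf_to_gguf.py ./my-hf-model --outtype q8_0 --outfile model-q8_0.gguf

    # Q4_0 is not an --outtype, so convert to f16 first...
    python convert_hf_to_gguf.py ./my-hf-model --outtype f16 --outfile model-f16.gguf
    # ...then quantize with the llama-quantize binary built from the repo:
    ./llama-quantize model-f16.gguf model-q4_0.gguf Q4_0

The split exists because the converter only writes simple types, while llama-quantize implements the full set of k-quant and legacy quantization schemes.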

horsefacts 🚂
@horsefacts.eth
you should join /ai, ping @gm8xx8