https://warpcast.com/~/channel/aichannel

Kasra Rahjerdi
@jc4p
my new default for at-home local llm: https://huggingface.co/google/gemma-3-12b-it-qat-q4_0-gguf -- Google-provided quantization/GGUF
3 replies
3 recasts
55 reactions
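One way to try this model locally -- a sketch assuming a recent llama.cpp build, whose `-hf` flag can pull a GGUF straight from Hugging Face (flag names are from recent llama.cpp; check your build's `--help`):

```shell
# Download the Google-provided QAT Q4_0 GGUF on first run and open an
# interactive chat session with it.
llama-cli -hf google/gemma-3-12b-it-qat-q4_0-gguf

# Or serve an already-downloaded file over an OpenAI-compatible API,
# offloading all layers to the GPU (-ngl 99) -- filename assumed here.
llama-server -m gemma-3-12b-it-q4_0.gguf -ngl 99 --port 8080
```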

christopher
@christopher
What kind of machine specs?
1 reply
0 recast
8 reactions

Kasra Rahjerdi
@jc4p
12GB RTX 4060 Ti -- that specific GGUF uses 7GB of my VRAM. the 4B quantizes down to 3GB and the 1B to 720MB. i like this one (the 12B quant) cause it answers more than just my direct questions; i use the 1B or 4B if i need a pipeline for classification or something, this one i talk to the most
1 reply
0 recast
5 reactions
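The VRAM figures above line up with Q4_0's storage cost: it packs 32 weights into 18 bytes (16 bytes of 4-bit values plus a 2-byte fp16 scale), i.e. 4.5 bits per weight. A rough sketch of the lower bound (function name and structure are mine, not from the thread; real files run larger because some tensors, like the embedding table, stay at higher precision, and the KV cache adds more at runtime):

```python
# Q4_0 block: 32 weights -> 18 bytes, so 4.5 bits per weight.
BITS_PER_WEIGHT_Q4_0 = 18 * 8 / 32  # = 4.5

def q4_0_weights_gb(params_billions: float) -> float:
    """Lower-bound size in GB for the quantized weight tensors alone."""
    return params_billions * BITS_PER_WEIGHT_Q4_0 / 8

for p in (1, 4, 12):
    print(f"{p}B -> ~{q4_0_weights_gb(p):.2f} GB")
# 1B -> ~0.56 GB, 4B -> ~2.25 GB, 12B -> ~6.75 GB
```

The gap between these estimates and the reported 720MB / 3GB / 7GB is the mixed-precision tensors plus runtime overhead.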

Leeward Bound
@leewardbound
alpha, ty, pretty close to aping in on a high end jetson and this informs the decision
0 reply
0 recast
2 reactions