https://warpcast.com/~/channel/aichannel

shoni.eth (@alexpaden)
This is a surface-level exploration of "advanced" fine-tuning and novel training methods on small models. It's more of a chat log than a structured essay, since I'm not yet certain what content the essay should include. Feel free to contribute additional insights by adding concise NOTES at the bottom.

My ideal essay probably covers: what leading models are preferred for (instruction following, classification), known faults (hallucination), context window limits, fine-tuning options, and training information (tokens used, compute used), without over-indexing on whichever model happens to be relevant at the time of writing. I'm hoping that essays framed as concise points on a topic enable more strategic use of reasoning models and better-organized summaries.

My main question here: is the new Llama 9B better than the old Llama 13B because of fine-tuning techniques, data preparation, or benchmark fitting?

https://github.com/alexpaden/identity-ai/commit/0c0a0055cc58d3bde6a2ead532675bf47a6ce086
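NOTE: as one concrete instance of the "fine-tuning options" mentioned above, here is a minimal sketch of the low-rank update at the heart of LoRA, a common parameter-efficient fine-tuning method for small models. This is not from the linked repo; the matrices and values are illustrative, written in plain Python for clarity.

```python
# LoRA keeps the base weight W (d x k) frozen and learns two small
# matrices B (d x r) and A (r x k) with rank r << min(d, k). The
# effective weight is W + (alpha / r) * (B @ A), so only
# d*r + r*k parameters are trained instead of d*k.

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    rows, inner, cols = len(X), len(Y), len(Y[0])
    return [[sum(X[i][t] * Y[t][j] for t in range(inner)) for j in range(cols)]
            for i in range(rows)]

def lora_update(W, A, B, alpha, r):
    """Return W + (alpha / r) * (B @ A), leaving W untouched."""
    delta = matmul(B, A)
    scale = alpha / r
    return [[W[i][j] + scale * delta[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

# Frozen base weight: a 4 x 4 identity matrix, for clarity.
d = k = 4
W = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(d)]

# Rank-1 adapters: only d*r + r*k = 8 trainable values vs. 16 in W.
r, alpha = 1, 2
B = [[1.0], [0.0], [0.0], [0.0]]   # d x r
A = [[0.0, 0.5, 0.0, 0.0]]         # r x k

W_adapted = lora_update(W, A, B, alpha, r)
print(W_adapted[0])  # -> [1.0, 1.0, 0.0, 0.0] (only row 0 is nudged)
```

In practice the same idea is applied per attention projection inside a transformer (e.g. via a library like Hugging Face PEFT), which is one reason a well-tuned small model can close the gap on a larger, older base model without retraining all weights.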