Michael Huang
@michaelhly
I just shipped a transformer-based SocialNLP toolkit for @farcaster. This library could help construct new ML-based feed algorithms based on text classification, instead of relying on manual curation, recasts/reactions, and chronological ordering. Check it out: https://github.com/michaelhly/FarGlot
5 replies
4 recasts
20 reactions
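The idea in the cast above, ranking a feed with a text classifier instead of recency or recast/reaction counts, can be sketched roughly as follows. The `Cast` type and `toy_score` function are illustrative stand-ins, not FarGlot's actual data structures or API; a real system would replace `toy_score` with a fine-tuned transformer returning something like P(high-quality | text).

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Cast:
    author: str
    text: str
    timestamp: int  # unix seconds

def rank_feed(casts: List[Cast], score: Callable[[str], float]) -> List[Cast]:
    """Order a feed by a per-cast model score instead of chronological order."""
    return sorted(casts, key=lambda c: score(c.text), reverse=True)

# Stand-in scorer for illustration only: penalizes a few spammy keywords.
# A real feed algorithm would call a trained text classifier here.
def toy_score(text: str) -> float:
    spammy = {"airdrop", "giveaway", "click"}
    words = text.lower().split()
    return 1.0 - sum(w in spammy for w in words) / max(len(words), 1)

feed = [
    Cast("a", "free airdrop click here", 3),
    Cast("b", "shipped a new NLP toolkit today", 2),
    Cast("c", "giveaway giveaway giveaway", 1),
]
ranked = rank_feed(feed, toy_score)
```

Note the feed is no longer in timestamp order: the newest cast ("a") no longer comes first.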
David
@promptrotator.eth
Nice. Do you suspect that there's a big difference between the text of a spam cast and a legitimate one?
1 reply
0 recast
0 reaction
Michael Huang
@michaelhly
- we'll first have to come up with heuristics for what counts as spam (i.e. create a test set based on what users report)
- then train a model to minimize loss against that set
- the nature of spam can change, so we'll have to re-tune over time

in short, it depends on how good we are at classifying ("labeling") spam
1 reply
0 recast
0 reaction
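The steps Michael lists (label from user reports, train to minimize loss, re-tune as spam drifts) could be sketched as a minimal training loop. The logistic regression over bag-of-words below is a deliberately simple stand-in for a transformer classifier; the example reports and labels are invented for illustration.

```python
import math
from typing import Dict, List, Tuple

def featurize(text: str) -> Dict[str, int]:
    # Bag-of-words counts; a transformer would replace this with learned
    # representations of the cast text.
    feats: Dict[str, int] = {}
    for w in text.lower().split():
        feats[w] = feats.get(w, 0) + 1
    return feats

def train(reports: List[Tuple[str, int]], epochs: int = 50, lr: float = 0.5) -> Dict[str, float]:
    """Logistic regression minimizing log loss over user-reported labels
    (1 = spam, 0 = legitimate)."""
    w: Dict[str, float] = {}
    for _ in range(epochs):
        for text, label in reports:
            x = featurize(text)
            z = sum(w.get(f, 0.0) * v for f, v in x.items())
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - label  # gradient of log loss w.r.t. the logit z
            for f, v in x.items():
                w[f] = w.get(f, 0.0) - lr * g * v
    return w

def predict(w: Dict[str, float], text: str) -> float:
    """Return P(spam | text) under the trained weights."""
    x = featurize(text)
    z = sum(w.get(f, 0.0) * v for f, v in x.items())
    return 1.0 / (1.0 + math.exp(-z))

# "Test set based on what users report" -- made-up examples for the sketch.
reports = [
    ("claim your free airdrop now", 1),
    ("free tokens click this link", 1),
    ("shipped a new toolkit for farcaster", 0),
    ("thoughts on feed ranking models", 0),
]
weights = train(reports)
```

Re-tuning for drift then amounts to appending newly reported casts to `reports` and calling `train` again on the updated set.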
Vespertilio
@vespertilio
This is really cool - given that spam changes and ML models can “drift” over time, do you have any thoughts on how to implement/architect this system to ingest user feedback to continually improve the spam classifier?
2 replies
0 recast
1 reaction
Michael Huang
@michaelhly
this is quite above my pay grade 😅 but perhaps the LLM can improve its performance on reasoning datasets by training on its own generated labels: https://arxiv.org/pdf/2210.11610.pdf
1 reply
0 recast
0 reaction
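The self-improvement idea in the linked paper (a model training on its own high-confidence outputs, no ground truth needed) resembles classic pseudo-labeling, which a feedback-ingesting spam classifier could borrow. A rough sketch, where `stub_model` is an invented placeholder for whatever classifier is currently deployed:

```python
from typing import Callable, List, Tuple

def pseudo_label(
    model: Callable[[str], float],
    unlabeled: List[str],
    threshold: float = 0.9,
) -> List[Tuple[str, int]]:
    """Turn the model's confident predictions on unlabeled casts into new
    training labels; uncertain casts are skipped rather than mislabeled."""
    new_labels: List[Tuple[str, int]] = []
    for text in unlabeled:
        p = model(text)  # P(spam | text) from the current model
        if p >= threshold:
            new_labels.append((text, 1))
        elif p <= 1.0 - threshold:
            new_labels.append((text, 0))
    return new_labels

# Placeholder model for illustration only.
def stub_model(text: str) -> float:
    return 0.95 if "airdrop" in text else 0.05 if "toolkit" in text else 0.5

labels = pseudo_label(stub_model, ["free airdrop", "nlp toolkit", "gm"])
```

The self-generated `labels` would then be merged with user reports before the next retraining pass, which is one way to address the drift question raised above.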
Vespertilio
@vespertilio
That’s an interesting study, thanks for sharing. It’s quite fascinating that LLMs can improve by training on labels they generated themselves, without any ground truth.
0 reply
0 recast
0 reaction