Varun Srinivasan
@v
We've been working on improving our spam detection. A big source of alpha has been taking algos used to rank content on the web and modifying them to work in Farcaster-space. @akshaan and @notawizard collaborated to add:
- PageRank
- Hyperlink-Induced Topic Search (HITS)
- Louvain Clustering
5 replies
27 recasts
81 reactions
Varun Srinivasan
@v
A quick primer on spam handling in Warpcast:
1. Accounts are categorized roughly as "definitely not spammy", "probably not spammy", "unknown", "maybe a little spammy" and "definitely spammy".
2. Roughly 5% of the network is manually labelled by the team, and this seed data is used to train an ML model.
3. The model looks at a lot of signals and gives the user a score. For example, if you like things 24 hours a day, you're likely not a human. Multiple "bad" signals like this move accounts closer to the "definitely spammy" label.
4. The model has gotten quite good and rarely misses. In the cases where it does, we manually override it and retrain it on misses periodically so it gets better.
5. The model also tries to re-evaluate users periodically, so as users get more active and there is more data it can update its opinion.
3 replies
1 recast
30 reactions
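To make the primer above concrete, here is a minimal Python sketch of how "bad" signals could move an account across the five labels, with a manual override taking priority. This is not Warpcast's model, which the thread describes as a trained ML model. The signal names, weights, neutral prior and bucket boundaries are all hypothetical, chosen only for illustration.

```python
from typing import Optional

# The five labels from the primer, ordered from least to most spammy.
LABELS = [
    "definitely not spammy",
    "probably not spammy",
    "unknown",
    "maybe a little spammy",
    "definitely spammy",
]

# Hypothetical signals and weights; a real model would learn these
# from the manually labelled seed data.
HYPOTHETICAL_WEIGHTS = {
    "likes_around_the_clock": 0.2,    # active 24 hours a day
    "no_trusted_followers": 0.2,      # assumed signal, not from the thread
}

def spam_score(fired: dict, weights: dict = HYPOTHETICAL_WEIGHTS,
               prior: float = 0.5) -> float:
    """Start at a neutral prior and add the weight of each signal that fired."""
    total = prior + sum(weights[name] for name, on in fired.items() if on)
    return max(0.0, min(1.0, total))

def label_for(score: float, manual_override: Optional[str] = None) -> str:
    """Map a score in [0, 1] to a label; a manual override always wins."""
    if manual_override is not None:
        return manual_override
    index = min(int(score * len(LABELS)), len(LABELS) - 1)
    return LABELS[index]

new_account = label_for(spam_score({}))
one_signal = label_for(spam_score({"likes_around_the_clock": True}))
two_signals = label_for(spam_score({"likes_around_the_clock": True,
                                    "no_trusted_followers": True}))
overridden = label_for(1.0, manual_override="probably not spammy")
```

An account with no data lands on "unknown", and each additional bad signal pushes it one step closer to "definitely spammy". Re-evaluating periodically amounts to calling the scoring step again as more signals become available.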
Varun Srinivasan
@v
So back to the new signals. PageRank is the famous Google algorithm used to rank webpages based on how many other pages link to them. We use a modified version of this, which looks at how many non-spammy users follow you to determine your score, which is then recursively applied to the people you follow. A surprising behavior is how many obviously spammy accounts end up being followed by some good accounts. People make mistakes and rarely fix them, so the algorithm has to be adaptive enough to account for that. https://en.wikipedia.org/wiki/PageRank
2 replies
0 recast
5 reactions
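As a rough illustration of the idea in the post above, here is a minimal Python sketch of a seed-weighted PageRank over a follow graph. It is an assumption about the general shape of the technique, not Warpcast's actual modification: the function name, the damping factor of 0.85, the fixed iteration count and the choice to restart only at trusted (non-spammy) accounts are all illustrative. Score flows from follower to followed, so an account earns score only when accounts that already hold score follow it.

```python
def trust_weighted_pagerank(follows, trusted, damping=0.85, iterations=50):
    """follows maps each follower to the set of accounts they follow."""
    users = set(follows) | {t for targets in follows.values() for t in targets}
    seeds = [u for u in users if u in trusted]
    if not seeds:
        raise ValueError("need at least one trusted seed account")

    # Restart mass goes only to trusted seeds, so untrusted accounts
    # start at zero and can only gain score through their followers.
    teleport = {u: (1.0 / len(seeds) if u in trusted else 0.0) for u in users}
    score = dict(teleport)

    for _ in range(iterations):
        nxt = {u: (1.0 - damping) * teleport[u] for u in users}
        for u in users:
            targets = follows.get(u, ())
            if targets:
                # Split this user's score evenly among the people they follow.
                share = damping * score[u] / len(targets)
                for t in targets:
                    nxt[t] += share
            else:
                # Accounts that follow nobody return their mass to the seeds.
                for s in seeds:
                    nxt[s] += damping * score[u] / len(seeds)
        score = nxt
    return score

# Hypothetical graph: bob mistakenly follows a spammy account ("mistake"),
# and two spam accounts follow each other plus a trusted user.
follows = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "mistake"},
    "carol": {"alice"},
    "mistake": {"alice"},
    "ring_a": {"ring_b", "alice"},
    "ring_b": {"ring_a", "alice"},
}
scores = trust_weighted_pagerank(follows, trusted={"alice", "bob", "carol"})
```

In this sketch the spam ring scores exactly zero because no scored account follows it, while the account bob followed by mistake picks up a nonzero score. That is the failure mode the post describes, and it is why a production system would need something more adaptive than this plain version.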
Thumbs Up
@thumbsup.eth
Is the data about how the team has labeled users publicly visible somewhere? If not, it ought to be.
1 reply
0 recast
1 reaction
patxol π· anser.social
@patxol.eth
Assuming non-human = spammy is going to be problematic when, at the same time, agentic bots are being promoted.
1 reply
0 recast
0 reaction