Varun Srinivasan
@v
We've been working on improving our spam detection. A big source of alpha has been taking algos used to rank content on the web and modifying them to work in Farcaster-space. @akshaan and @notawizard collaborated to add:
- PageRank
- Hyperlink-Induced Topic Search (HITS)
- Louvain Clustering
Varun Srinivasan
@v
A quick primer on spam handling in Warpcast:
1. Accounts are categorized roughly as "definitely not spammy", "probably not spammy", "unknown", "maybe a little spammy" and "definitely spammy".
2. Roughly 5% of the network is manually labelled by the team, and this seed data is used to train an ML model.
3. The model looks at a lot of signals and gives each user a score. For example, if you like things 24 hours a day, you're likely not a human. Multiple "bad" signals like this move accounts closer to the "definitely spammy" label.
4. The model has gotten quite good and rarely misses. In the cases where it does, we manually override it and periodically retrain it on the misses so it gets better.
5. The model also tries to re-evaluate users periodically, so as users get more active and there is more data, it can update its opinion.
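To make points 1, 3 and 4 concrete, here is a minimal sketch of the labelling step. It assumes the model emits one spam score in [0, 1] (higher = spammier) and treats the five labels as ordered buckets on that score; the cut points, `label_for`, and `active_hour_fraction` are all hypothetical, since the thread doesn't say how Warpcast maps scores to labels. The "likes 24 hours a day" signal is approximated as the fraction of UTC hours with any activity.

```python
from datetime import datetime, timedelta, timezone

# The five labels from the primer, ordered from least to most spammy.
LABELS = (
    "definitely not spammy",
    "probably not spammy",
    "unknown",
    "maybe a little spammy",
    "definitely spammy",
)

# Hypothetical upper cut points on a model score in [0, 1] (higher = spammier).
# Illustrative values only, not Warpcast's.
CUTS = (0.1, 0.3, 0.7, 0.9)


def active_hour_fraction(timestamps):
    """Fraction of the 24 UTC hours of the day in which an account was active.

    A value near 1.0 means activity (e.g. likes) in every hour of the day,
    the kind of "bad" signal described in the primer.
    """
    if not timestamps:
        return 0.0
    hours = {ts.astimezone(timezone.utc).hour for ts in timestamps}
    return len(hours) / 24


def label_for(score, override=None):
    """Map a model score to a label; a manual override always wins."""
    if override is not None:
        if override not in LABELS:
            raise ValueError(f"unknown label: {override!r}")
        return override
    # First bucket whose cut point the score falls below.
    for cut, label in zip(CUTS, LABELS):
        if score < cut:
            return label
    return LABELS[-1]


if __name__ == "__main__":
    start = datetime(2024, 1, 1, tzinfo=timezone.utc)
    likes = [start + timedelta(hours=h) for h in range(24)]  # one like every hour
    print(active_hour_fraction(likes))
    print(label_for(0.95), "|", label_for(0.95, override="probably not spammy"))
```

Keeping the override outside the scoring function mirrors point 4: a manual correction is applied on top of the model and can later be fed back in as training data.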
Varun Srinivasan
@v
So, back to the new signals. PageRank is the famous Google algorithm for ranking webpages based on how many other pages link to them (and how highly ranked those pages are). We use a modified version of it that looks at how many non-spammy users follow you to determine your score, which then flows recursively to the people you follow. One surprise is how many obviously spammy accounts end up being followed by some good accounts. People make mistakes and rarely fix them, so the algorithm has to be adaptive enough to account for that.
https://en.wikipedia.org/wiki/PageRank
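One common way to build a "seeded by non-spammy users" PageRank is personalized PageRank, where the random jump lands only on trusted accounts. The sketch below assumes that variant; the thread doesn't specify Warpcast's modification, and `seeded_pagerank`, its parameters, and the idea of seeding from the hand-labelled accounts are illustrative assumptions. Each follow `(follower, followee)` passes a share of the follower's rank to the followee.

```python
def seeded_pagerank(follows, trusted, damping=0.85, iterations=200, tol=1e-12):
    """Personalized PageRank over a follow graph, teleporting only to trusted seeds.

    follows: iterable of (follower, followee) pairs; rank flows follower -> followee.
    trusted: non-empty collection of accounts assumed non-spammy (hypothetically,
             the hand-labelled seed set).
    Returns {account: rank}; ranks sum to 1.
    """
    trusted = set(trusted)
    if not trusted:
        raise ValueError("need at least one trusted seed account")
    out = {}
    nodes = set(trusted)
    for src, dst in set(follows):  # de-duplicate repeated follow events
        if src == dst:
            continue  # ignore self-follows
        out.setdefault(src, []).append(dst)
        nodes.update((src, dst))
    # Teleport distribution: uniform over trusted accounts, zero elsewhere.
    seed = {n: (1.0 / len(trusted) if n in trusted else 0.0) for n in nodes}
    rank = dict(seed)
    for _ in range(iterations):
        # Accounts that follow nobody ("dangling") hand their rank back to the seeds.
        dangling = sum(rank[n] for n in nodes if n not in out)
        new = {n: (1.0 - damping + damping * dangling) * seed[n] for n in nodes}
        for src, dsts in out.items():
            share = damping * rank[src] / len(dsts)
            for dst in dsts:
                new[dst] += share
        delta = sum(abs(new[n] - rank[n]) for n in nodes)
        rank = new
        if delta < tol:
            break
    return rank


if __name__ == "__main__":
    follows = [("t1", "alice"), ("t2", "alice"), ("t1", "bob"),
               ("spam1", "spam2"), ("spam2", "spam1")]
    for account, score in sorted(seeded_pagerank(follows, {"t1", "t2"}).items()):
        print(f"{account:6s} {score:.4f}")
```

A closed ring of spam accounts that no trusted account reaches ends up with zero rank, however densely its members follow each other. The flip side is the problem mentioned above: if a trusted account follows a spammer, this plain version leaks rank to it, and it makes no attempt at the adaptivity the real algorithm needs.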
Varun Srinivasan
@v
Louvain Clustering is another graph-based approach, built around the idea that spammy accounts are much more likely to follow each other in rings. A greedy scoring system (Louvain greedily maximizes a measure called modularity) identifies parts of the network with tight follow loops; combined with the average score of the users in each group, this can determine whether an account is more or less likely to be spammy.
https://en.wikipedia.org/wiki/Louvain_method
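A rough sketch of the two pieces described here: the greedy local-moving phase of the Louvain method, followed by averaging an existing spam score over each detected group. It assumes follows are treated as undirected edges, and it implements only Louvain's first phase (full Louvain then collapses each community into a single node and repeats). `louvain_local_moving`, `community_mean_score`, and the 0.5 prior for unscored accounts are hypothetical, not the team's implementation.

```python
from collections import defaultdict


def louvain_local_moving(follows, max_passes=20):
    """First (local-moving) phase of the Louvain method on an undirected graph.

    Each follow (src, dst) adds weight 1 to the undirected edge src-dst, so a
    mutual follow has weight 2. Returns {node: community_id}, where each
    community id is the id of one of its original members.
    """
    adj = defaultdict(lambda: defaultdict(float))
    for src, dst in follows:
        if src == dst:
            continue  # ignore self-follows
        adj[src][dst] += 1.0
        adj[dst][src] += 1.0
    nodes = sorted(adj)
    two_m = sum(sum(nbrs.values()) for nbrs in adj.values())  # 2 * total edge weight
    degree = {u: sum(adj[u].values()) for u in nodes}
    community = {u: u for u in nodes}
    tot = {u: degree[u] for u in nodes}  # summed degree of each community

    for _ in range(max_passes):
        moved = False
        for u in nodes:
            current = community[u]
            # Edge weight from u into each neighbouring community.
            links = defaultdict(float)
            for v, w in adj[u].items():
                links[community[v]] += w
            tot[current] -= degree[u]  # take u out of its community
            # Modularity gain, up to a constant factor, of putting u in community c:
            #   k_in(u, c) - tot[c] * degree[u] / (2m)
            best = current
            best_gain = links.get(current, 0.0) - tot[current] * degree[u] / two_m
            for c, k_in in links.items():
                gain = k_in - tot[c] * degree[u] / two_m
                if gain > best_gain:
                    best, best_gain = c, gain
            tot[best] += degree[u]
            if best != current:
                community[u] = best
                moved = True
        if not moved:
            break
    return community


def community_mean_score(community, spam_score, prior=0.5):
    """Give every account the mean spam score of its community (prior if unscored)."""
    members = defaultdict(list)
    for node, c in community.items():
        members[c].append(spam_score.get(node, prior))
    means = {c: sum(vals) / len(vals) for c, vals in members.items()}
    return {node: means[c] for node, c in community.items()}


if __name__ == "__main__":
    # Two tight follow rings joined by a single follow.
    follows = [("a", "b"), ("b", "c"), ("a", "c"),
               ("x", "y"), ("y", "z"), ("x", "z"), ("c", "x")]
    groups = louvain_local_moving(follows)
    print(groups)
    print(community_mean_score(groups, {"a": 0.1, "x": 0.9, "y": 0.9}))
```

The group average is what lets an unscored or borderline member of a tight ring inherit the ring's reputation, which is the intuition in the cast above.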
Varun Srinivasan
@v
HITS is another PageRank-like iterative algo, where every user is given two scores:
1. An authority score, which is the sum of the hub scores of the users who follow them.
2. A hub score, which is the sum of the authority scores of the users they follow.
Both scores are normalized after each iteration. This is good for boosting certain non-spammy users who would not rank well in PageRank because they don't yet have many followers, but who rank well here because they get a few follows from high-signal accounts.
https://en.wikipedia.org/wiki/HITS_algorithm
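The two update rules above translate almost directly into code. This is a minimal sketch of textbook HITS over a follow graph, using L2 normalization each round; `hits` and its parameters are illustrative, and nothing here reflects how Warpcast weights or combines the resulting scores.

```python
import math
from collections import defaultdict


def _normalize(scores):
    """Scale scores in place to unit L2 norm (no-op if all are zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    if norm > 0:
        for key in scores:
            scores[key] /= norm


def hits(follows, iterations=100, tol=1e-10):
    """Textbook HITS on a follow graph.

    follows: iterable of (follower, followee) pairs.
    Returns (hub, authority) dicts keyed by account.
    """
    followers = defaultdict(list)  # account -> accounts that follow it
    following = defaultdict(list)  # account -> accounts it follows
    nodes = set()
    for src, dst in set(follows):
        following[src].append(dst)
        followers[dst].append(src)
        nodes.update((src, dst))
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority: sum of the hub scores of the users who follow you.
        new_auth = {n: sum(hub[f] for f in followers[n]) for n in nodes}
        _normalize(new_auth)
        # Hub: sum of the authority scores of the users you follow.
        new_hub = {n: sum(new_auth[t] for t in following[n]) for n in nodes}
        _normalize(new_hub)
        delta = sum(abs(new_auth[n] - auth[n]) + abs(new_hub[n] - hub[n]) for n in nodes)
        auth, hub = new_auth, new_hub
        if delta < tol:
            break
    return hub, auth


if __name__ == "__main__":
    hub, auth = hits([("h1", "a"), ("h2", "a"), ("h2", "b")])
    print({n: round(v, 3) for n, v in auth.items()})
    print({n: round(v, 3) for n, v in hub.items()})
```

Because authority is weighted by the hub score of each follower rather than by raw follower count, a new account with only a couple of follows from strong hubs can still end up with a solid authority score.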
Mkkstacks
@mkkstacks
Super interesting. I don't have a follow strategy, but I only follow accounts that make an effort to share decent, fair content and engage with other accounts. I could be a bit more intentional about making sure I have enough good followers, and review my follows to make sure their quality hasn't changed. At worst, some of them cast mostly frames; I don't judge frame farming as long as their engagement otherwise seems genuine. But the algo may punish me for that leniency. I don't have a lot of followers and don't try to farm OG or large accounts, so I appreciate the different algos accounting for different factors.