Varun Srinivasan
@v
Some thoughts on spam on Farcaster and how we tackle it. First question - what is spam? The naive answer is "automated activity" but this isn't right. Over 75% of spam we find comes from real humans who have phones, wallets and X accounts. The best definition is "inauthentic activity". It's that feeling you get when you realize that someone who is following, liking or replying to you is doing it to benefit themselves and not because they're interested in you.
31 replies
117 recasts
418 reactions
Varun Srinivasan
@v
Spam is driven by people who want to get airdrops. How much can you earn if you set up a fake account on Twitter? Probably not a whole lot, and not in directly measurable dollars. If you do the same on Farcaster, you might earn $10 or even $100 in airdrops. Spammers on Farcaster are very, very motivated. We see patterns like LLM spamming before they become commonplace on larger networks like X.
1 reply
2 recasts
129 reactions
Varun Srinivasan
@v
Spam also needs to be classified very, very quickly. If we don't, a spammer will interact with a lot of users right after signing up, making them unhappy. We often have little more than a profile and a few casts when we need to make a decision. Getting this decision wrong is costly either way - a spammer who isn't labelled will drive away existing users, and a new user who is incorrectly labelled will get frustrated and never come back.
2 replies
0 recasts
53 reactions
Varun Srinivasan
@v
Our spam model puts accounts into one of four categories:

Level 0 - not enough information to make a decision
Level 1 - an authentic user that other users will like
Level 2 - a slightly inauthentic user that some users won't like
Level 3 - a very inauthentic user that almost all people will dislike

If we're certain that someone is spammy, their account goes into Level 3 and their activity is usually hidden under the "Show more" in conversations. In most cases, it's less clear. An account may be good for a while and suddenly turn spammy when a new airdrop launches. In this case Level 2 might be applied, which does something lighter like disqualifying you from boosts, but still letting your replies appear. Accounts are also re-evaluated by our model very often so that new information can be used to make a more accurate decision. We rank and re-rank roughly 4-5 accounts every minute.
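The four levels and their consequences can be sketched as a simple lookup. This is a minimal illustration, not Farcaster's actual code - the level names and action strings are invented, and the real system would attach these decisions to ranking and conversation rendering:

```python
from enum import IntEnum

class SpamLevel(IntEnum):
    """Hypothetical names for the four levels described in the cast."""
    UNKNOWN = 0     # not enough information to make a decision
    AUTHENTIC = 1   # an authentic user that others will like
    BORDERLINE = 2  # slightly inauthentic; some users won't like them
    SPAMMY = 3      # very inauthentic; almost all people will dislike them

def action_for(level: SpamLevel) -> str:
    # Illustrative mapping from level to moderation consequence.
    if level == SpamLevel.SPAMMY:
        return "hide replies under 'Show more'"
    if level == SpamLevel.BORDERLINE:
        return "disqualify from boosts, replies still visible"
    if level == SpamLevel.AUTHENTIC:
        return "no restriction"
    return "no action until more data arrives"
```

Because accounts are re-evaluated often, a level is a current estimate rather than a permanent label - the same account can move between levels as new casts arrive.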
3 replies
1 recast
49 reactions
Varun Srinivasan
@v
There are three parts to building a spam detection model:

1. Define signals, which can be calculated for each account. Ideally they have some correlation with spammy behavior (e.g. frequency of posting).
2. Label data, either through manual review, user reports or heuristics. The dataset must be large enough that the patterns are significant.
3. Train the model, by letting it process labelled data and figure out which combinations of signals are the best predictors.

@akshaan chose a type of model called a random forest, which is a collection of decision trees. Here's a good lecture on the basics of how a decision tree works: https://www.youtube.com/watch?v=a3ioGSwfVpE&list=PLl8OlHZGYOQ7bkVbuRthEsaLr7bONzbXS&index=29
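The first and third steps can be sketched in a few lines. This is a toy, not the real pipeline: the account fields, the signal names, and the hard-coded thresholds are all invented for illustration. In a real random forest the thresholds are learned from the labelled data rather than written by hand, and there would be far more trees and signals:

```python
from dataclasses import dataclass

@dataclass
class Account:
    # Minimal stand-in for an account; field names are illustrative.
    cast_timestamps: list  # unix seconds of each cast
    followers: int
    following: int
    age_days: float

def signals(acct: Account) -> dict:
    """Step 1: turn an account into numeric signals a model can consume."""
    n = len(acct.cast_timestamps)
    span_hours = (
        (max(acct.cast_timestamps) - min(acct.cast_timestamps)) / 3600
        if n > 1 else 1.0
    )
    return {
        "casts_per_hour": n / max(span_hours, 1.0),
        "follow_ratio": acct.following / max(acct.followers, 1),
        "age_days": acct.age_days,
    }

def stump_vote(sig: dict) -> int:
    """Step 3 intuition: each 'tree' here is a crude one-rule stump and the
    forest takes a majority vote. Thresholds are made up; a trained forest
    would learn them (and deeper combinations) from labelled examples."""
    votes = [
        sig["casts_per_hour"] > 20,  # posts implausibly often
        sig["follow_ratio"] > 50,    # follows far more accounts than follow back
        sig["age_days"] < 1,         # brand-new account
    ]
    return int(sum(votes) >= 2)      # majority vote: 1 = looks spammy
```

The value of the forest over any single rule is exactly this voting: no one signal has to be decisive, so an account that trips only one heuristic isn't flagged.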
1 reply
0 recasts
43 reactions
Varun Srinivasan
@v
Random forests can identify very subtle patterns in data. For example, we once had a spam ring in country X that would fire up all their bots at the same time. Because we fed in country and time of posts as signals, it quickly learned that accounts that posted frequently at 10pm in that country were spammy. But what's very interesting is that it otherwise ignored country as a predictor. If you posted from that same country but had a more human-like pattern of posting around the clock it didn't rank you as likely to be a spammer. Forests can get very sophisticated and layer dozens of signals to find such patterns. They can be retrained periodically to adapt as spammers change their behavior.
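The interaction the forest learned can be mirrored by a tiny hand-written decision tree. This is only a sketch of the anecdote above - the country label, the 10pm hour, and the scores are illustrative stand-ins, whereas the real model learned this split from the data rather than having it coded:

```python
def spam_score(country: str, post_hours: list) -> float:
    """Toy decision tree: country matters only in combination with a burst
    of posting at one hour, never on its own (values are invented)."""
    # Fraction of posts landing at 10pm local time (hour 22).
    burst_at_10pm = post_hours.count(22) / max(len(post_hours), 1) > 0.5
    if country == "X" and burst_at_10pm:
        return 0.9  # the flagged country AND the synchronized-bot pattern
    return 0.1      # country alone, or the burst alone, is not predictive
```

The key property is that the `country == "X"` branch only fires beneath the burst test, so a human in country X who posts around the clock takes the low-score path - which is the behavior described above.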
3 replies
0 recasts
35 reactions