Varun Srinivasan
@v
Some thoughts on spam on Farcaster and how we tackle it. First question: what is spam? The naive answer is "automated activity", but this isn't right. Over 75% of the spam we find comes from real humans who have phones, wallets, and X accounts. The best definition is "inauthentic activity". It's that feeling you get when you realize that someone who is following, liking, or replying to you is doing it to benefit themselves, not because they're interested in you.
30 replies
91 recasts
233 reactions

Varun Srinivasan
@v
Spam is driven by people who want to get airdrops. How much can you earn by setting up a fake account on Twitter? Probably not a whole lot, and not in directly measurable dollars. If you do the same on Farcaster, you might earn $10 or even $100 in airdrops. Spammers on Farcaster are very, very motivated. We see patterns like LLM spamming before they become commonplace on larger networks like X.
1 reply
2 recasts
70 reactions

Varun Srinivasan
@v
Spam also needs to be classified very, very quickly. If we don't, a spammer will interact with a lot of users right after signing up, making them unhappy. We often have little more than a profile and a few casts when we need to make a decision. If we get this decision wrong, people get really unhappy: a spammer who isn't labelled will annoy existing users, and a new user who is incorrectly labelled will get frustrated and never come back.
2 replies
0 recasts
48 reactions
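The asymmetry described in the cast above is a classic cost-sensitive thresholding problem. Here is a minimal sketch (not Farcaster's actual pipeline) of picking a decision threshold when a false positive (flagging a real new user) hurts more than a false negative (letting a spammer through briefly). The data, model, and cost weights are all illustrative assumptions.

```python
# Sketch: choose a spam-score threshold under asymmetric costs.
# All data is synthetic; COST_FP and COST_FN are hypothetical weights.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Two synthetic signals per account; label 1 = spammer.
X = rng.normal(size=(2000, 2))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=1.0, size=2000) > 1).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = LogisticRegression().fit(X_train, y_train)
probs = clf.predict_proba(X_test)[:, 1]

# Assumed costs: losing a legitimate new user is weighted 5x worse
# than briefly letting one spammer through.
COST_FP, COST_FN = 5.0, 1.0

best_cost, best_t = min(
    (COST_FP * np.sum((probs >= t) & (y_test == 0))
     + COST_FN * np.sum((probs < t) & (y_test == 1)), t)
    for t in np.linspace(0.05, 0.95, 19)
)
print(f"lowest expected cost {best_cost:.0f} at threshold {best_t:.2f}")
```

With false positives weighted heavily, the chosen threshold shifts upward, which matches the cast's point: when in doubt about a new account, the cheaper error is to wait for more evidence.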

Varun Srinivasan
@v
Our spam model puts accounts into one of four categories:

Level 0 - not enough information to make a decision
Level 1 - an authentic user that other users will like
Level 2 - a slightly inauthentic user that some users won't like
Level 3 - a very inauthentic user that almost all people will dislike

If we're certain that someone is spammy, their account goes into Level 3 and their activity is usually hidden under the "Show more" in conversations. In most cases, it's less clear. An account may be good for a while and suddenly turn spammy when a new airdrop launches. In that case Level 2 might be applied, which does something lighter like disqualifying you from boosts while still letting your replies appear. Accounts are also re-evaluated by our model very often so that new information can be used to make a more accurate decision. We rank and re-rank roughly 4-5 accounts every minute.
3 replies
1 recast
46 reactions
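A minimal sketch of how the four levels above might map to product actions. The level definitions come straight from the cast; the enforcement mapping and all names here are illustrative assumptions, not Warpcast's actual policy code.

```python
# Sketch: map spam levels to hypothetical moderation actions.
from enum import IntEnum

class SpamLevel(IntEnum):
    UNKNOWN = 0     # not enough information to make a decision
    AUTHENTIC = 1   # an authentic user that other users will like
    BORDERLINE = 2  # slightly inauthentic; some users won't like them
    SPAMMY = 3      # very inauthentic; almost all people will dislike them

def apply_policy(level: SpamLevel) -> dict:
    """Translate a level into moderation actions (hypothetical policy)."""
    return {
        "hide_under_show_more": level == SpamLevel.SPAMMY,
        "eligible_for_boosts": level <= SpamLevel.AUTHENTIC,
        "replies_visible": level != SpamLevel.SPAMMY,
    }

print(apply_policy(SpamLevel.BORDERLINE))
# {'hide_under_show_more': False, 'eligible_for_boosts': False,
#  'replies_visible': True}
```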

Varun Srinivasan
@v
There are three parts to building a spam detection model:

1. Define signals, which can be calculated for each account. Ideally they have some correlation to spammy behavior (e.g. frequency of posting).
2. Label data, either through manual review, user reports, or heuristics. The dataset must be large enough that there is significance to the patterns.
3. Train the model, by letting it process labelled data and figure out which combinations of signals are the best predictors.

@akshaan chose a type of model called a random forest, which is a collection of decision trees. Here's a good lecture on the basics of how a decision tree works: https://www.youtube.com/watch?v=a3ioGSwfVpE&list=PLl8OlHZGYOQ7bkVbuRthEsaLr7bONzbXS&index=29
1 reply
0 recasts
40 reactions
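Here is a minimal sketch of those three steps using scikit-learn. The cast confirms only that the real model is a random forest trained on labelled accounts; the signal names and the synthetic labels below are assumptions made for illustration.

```python
# Sketch: signals -> labels -> random forest, per the three steps above.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
n = 5000

# Step 1: signals computed per account (names are hypothetical).
signals = pd.DataFrame({
    "casts_per_day": rng.exponential(3.0, n),
    "followers_per_following": rng.exponential(1.0, n),
    "account_age_days": rng.integers(0, 365, n),
})

# Step 2: labels, in reality from manual review / user reports /
# heuristics. Here they are synthesized: fast posters on young
# accounts skew spammy.
labels = (
    (signals["casts_per_day"] > 8) & (signals["account_age_days"] < 30)
).astype(int)

# Step 3: train a random forest and check its predictive power.
model = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(model, signals, labels, cv=5, scoring="roc_auc")
print(f"cross-validated AUC: {scores.mean():.3f}")
```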

Varun Srinivasan
@v
Random forests can identify very subtle patterns in data. For example, we once had a spam ring in country X that would fire up all their bots at the same time. Because we fed in the country and time of posts as signals, the model quickly learned that accounts posting frequently at 10pm in that country were spammy. What's very interesting is that it otherwise ignored country as a predictor: if you posted from that same country but had a more human-like pattern of posting around the clock, it didn't rank you as likely to be a spammer. Forests can get very sophisticated, layering dozens of signals to find such patterns, and they can be retrained periodically to adapt as spammers change their behavior.
3 replies
0 recasts
34 reactions
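The interaction effect in the anecdote above is easy to reproduce. In this sketch, spam is defined as (country == X) AND (posting hour == 10pm), and a forest learns the conjunction while leaving country alone uninformative. The data is entirely synthetic; only the shape of the anecdote comes from the cast.

```python
# Sketch: a forest learning a country x posting-hour interaction.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
n = 10000

country_x = rng.integers(0, 2, n)  # 1 = account is in country X
hour = rng.integers(0, 24, n)      # typical posting hour (0-23)

# The spam ring: country-X accounts that all fire at 10pm (hour 22).
# Country-X accounts posting around the clock are legitimate.
spam = ((country_x == 1) & (hour == 22)).astype(int)

X = np.column_stack([country_x, hour])
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, spam)

print(model.predict_proba([[1, 22]])[0, 1])  # country X at 10pm -> near 1.0
print(model.predict_proba([[1, 9]])[0, 1])   # country X, human hour -> near 0.0
```

Because each tree splits on one feature at a time but chains splits down a path, the forest can isolate the country-AND-hour conjunction without penalizing the country on its own.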

Varun Srinivasan
@v
It's not always intuitive what the best signals are. When I worked on fraud at Coinbase, which is a similar problem, one of our best signals was screen resolution. It turned out that fraudsters used a virtual machine with a very odd screen resolution that most normal computers would never have. We've found this to be true in Farcaster data as well. I'm going to be cagey about what the actual signals are, because revealing them would cause spammers to change their behavior, making them harder to detect.
5 replies
0 recasts
56 reactions
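One generic way to turn an unintuitive attribute like screen resolution into a model input is a rarity score: how unusual is this value among known-good accounts? This is a standard encoding trick sketched here with made-up counts, not a confirmed Coinbase or Farcaster method.

```python
# Sketch: rarity of a categorical attribute among known-good users.
from collections import Counter

# Hypothetical counts of screen resolutions seen on good accounts.
good_resolutions = Counter({
    "1920x1080": 5200, "2560x1440": 1800, "1366x768": 900, "800x600": 3,
})
total = sum(good_resolutions.values())

def rarity(resolution: str) -> float:
    """Higher means rarer among known-good users (Laplace-smoothed)."""
    return 1.0 - (good_resolutions.get(resolution, 0) + 1) / (total + 1)

print(f"{rarity('1920x1080'):.4f}")  # common laptop screen -> low rarity
print(f"{rarity('800x600'):.4f}")    # odd VM resolution -> high rarity
```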

Varun Srinivasan
@v
Commonly suggested signals like onchain data don't work very well. It turns out that there are a lot of users with little or no blockchain activity who are quite interesting on social networks. The opposite also tends to be true: there are people with ENS names and other onchain activity who are aggressive spammers and airdrop farmers. We recently tested some onchain signals and found a near-zero improvement in predictive power. This may change over time as more activity moves onchain, but as of today it's not very useful.
3 replies
2 recasts
44 reactions
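"Near-zero improvement in predictive power" suggests an ablation test: train the same model with and without a feature group and compare scores. A minimal sketch follows; the "onchain" features here are random noise by construction, so the AUC delta comes out near zero, mirroring the finding described above.

```python
# Sketch: ablation of a feature group via cross-validated AUC.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(7)
n = 5000

behavioral = rng.normal(size=(n, 3))   # informative stand-in signals
onchain = rng.normal(size=(n, 2))      # uninformative by design
labels = (behavioral[:, 0] + behavioral[:, 1] > 1).astype(int)

def auc(features):
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    return cross_val_score(model, features, labels,
                           cv=5, scoring="roc_auc").mean()

base = auc(behavioral)
combined = auc(np.hstack([behavioral, onchain]))
print(f"without onchain: {base:.3f}, with onchain: {combined:.3f}")
```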

Varun Srinivasan
@v
The signals that tend to do very well fall into one of three categories:

1. Graph-based: spammers often share similar patterns of activity, which can be used to catch them.
2. Behavioral: they also tend to do things a certain way, because they're being repetitive in their actions (e.g. posting at fixed intervals).
3. Textual: the content of their casts is often very predictive of their quality.
7 replies
1 recast
46 reactions
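As a concrete example of the behavioral category above, here is one plausible signal: the regularity of inter-post gaps. Bots posting at fixed intervals produce gaps with near-zero variance, while humans are irregular. The timestamps and metric choice are illustrative assumptions, not a disclosed Farcaster signal.

```python
# Sketch: a behavioral signal measuring posting-interval regularity.
import numpy as np

def interval_regularity(timestamps: list[float]) -> float:
    """Coefficient of variation of inter-post gaps; near 0 looks robotic."""
    gaps = np.diff(sorted(timestamps))
    return float(np.std(gaps) / np.mean(gaps))

bot = [t * 3600.0 for t in range(10)]  # posts exactly once an hour
human = list(np.cumsum(np.random.default_rng(3).exponential(3600.0, 10)))

print(f"bot:   {interval_regularity(bot):.3f}")    # ~0.000
print(f"human: {interval_regularity(human):.3f}")  # noticeably higher
```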

OMGiDRAWEDit 🦎
@omgidrawedit
Super interesting read. Had no clue it was being looked at like this. Cheers V
0 replies
0 recasts
0 reactions