Lauren McDonagh-Pereira on Farcaster

Varun Srinivasan

Some thoughts on spam on Farcaster and how we tackle it. First question - what is spam? The naive answer is "automated activity", but this isn't right. Over 75% of the spam we find comes from real humans who have phones, wallets and X accounts. The best definition is "inauthentic activity". It's that feeling you get when you realize that someone who is following, liking or replying to you is doing it to benefit themselves and not because they're interested in you.

34 replies

66 recasts

192 reactions

Varun Srinivasan

Spam is driven by people who want to get airdrops. How much can you earn if you set up a fake account on Twitter? Probably not a whole lot, and not in directly measurable dollars. If you do the same on Farcaster, you might earn 10 or even 100 dollars in airdrops. Spammers on Farcaster are very, very motivated. We see patterns like LLM spamming before they become commonplace on larger networks like X.

1 reply

3 recasts

83 reactions

Varun Srinivasan

Spam also needs to be classified very, very quickly. If we don't, a spammer will interact with a lot of users right after signing up, making them unhappy. We often have little more than a profile and a few casts when we need to make a decision. If we get this decision wrong, people get really unhappy - a spammer who isn't labelled will make existing users unhappy, and a new user who is incorrectly labelled will get frustrated and never come back.

2 replies

0 recast

30 reactions

Varun Srinivasan

Our spam model puts accounts into one of four categories:

Level 0 - not enough information to make a decision
Level 1 - an authentic user that other users will like
Level 2 - a slightly inauthentic user that some users won't like
Level 3 - a very inauthentic user that almost all people will dislike

If we're certain that someone is spammy, their account goes into Level 3 and their activity is usually hidden under "Show more" in conversations. In most cases, it's less clear. An account may be good for a while and suddenly turn spammy when a new airdrop launches. In this case Level 2 might be applied, which does something lighter, like disqualifying you from boosts while still letting your replies appear. Accounts are also re-evaluated by our model very often so that new information can be used to make a more accurate decision. We rank and re-rank roughly 4-5 accounts every minute.
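To make the four levels concrete, here is a minimal Python sketch of how a model score might be mapped to a level and a treatment. The thread does not say how the model's output becomes a level, so the names `assign_level` and `treatment_for`, the numeric cut-offs and the minimum-signal rule are all assumptions made for illustration, not Farcaster's actual logic.

```python
from dataclasses import dataclass

# Hypothetical cut-offs, chosen only for illustration; the thread does not
# describe how the model's output is turned into a level.
MIN_SIGNALS = 3        # below this, there is too little information to decide
SLIGHTLY_SPAMMY = 0.5  # assumed score at which Level 2 starts
VERY_SPAMMY = 0.9      # assumed score at which Level 3 starts


@dataclass
class Treatment:
    level: int
    replies_hidden_under_show_more: bool
    eligible_for_boosts: bool


def assign_level(spam_score: float, signals_available: int) -> int:
    """Map a spam score in [0, 1] to one of the four levels."""
    if signals_available < MIN_SIGNALS:
        return 0  # not enough information to make a decision
    if spam_score >= VERY_SPAMMY:
        return 3  # very inauthentic
    if spam_score >= SLIGHTLY_SPAMMY:
        return 2  # slightly inauthentic
    return 1      # authentic


def treatment_for(level: int) -> Treatment:
    """Simplified consequences: Level 3 is hidden, Level 2 only loses boosts.

    How Level 0 accounts are treated is not stated in the thread; treating
    them like Level 1 here is an assumption.
    """
    return Treatment(
        level=level,
        replies_hidden_under_show_more=(level == 3),
        eligible_for_boosts=(level < 2),
    )


for score, signals in [(0.95, 10), (0.60, 10), (0.10, 10), (0.95, 1)]:
    print(score, signals, treatment_for(assign_level(score, signals)))
```

Because accounts are re-evaluated often, a function like this would simply be called again whenever new activity changes the score, which is how an account could drift from Level 1 to Level 2 when an airdrop launches.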

3 replies

0 recast

26 reactions

Varun Srinivasan

There are three parts to building a spam detection model:

1. Define signals, which can be calculated for each account. Ideally they have some correlation to spammy behavior (e.g. frequency of posting).
2. Label data, either through manual review, user reports or heuristics. The dataset must be large enough that there is significance to the patterns.
3. Train the model, by letting it process labelled data and figure out which combinations of signals are the best predictors.

@akshaan chose a type of model called a random forest, which is a collection of decision trees. Here's a good lecture on the basics of how a decision tree works: https://www.youtube.com/watch?v=a3ioGSwfVpE&list=PLl8OlHZGYOQ7bkVbuRthEsaLr7bONzbXS&index=29
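The three steps can be sketched end to end with a tiny random forest written in plain Python. This is a toy under stated assumptions, not the production model: the three signals (casts per hour, follows per day, account age in days), the twelve hand-labelled rows, and every function name are invented for illustration. Only "frequency of posting" is mentioned in the thread as an example signal.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a non-empty list of 0/1 labels (0 = human, 1 = spam)."""
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def best_split(rows, labels, feature_ids):
    """Return (impurity, feature, threshold) of the best split, or None."""
    best = None
    for f in feature_ids:
        for t in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            if not left or not right:
                continue
            impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or impurity < best[0]:
                best = (impurity, f, t)
    return best


def build_tree(rows, labels, rng, depth=0, max_depth=3):
    """A leaf is a label; an inner node is (feature, threshold, left, right)."""
    if depth == max_depth or len(set(labels)) == 1:
        return majority(labels)
    n_features = len(rows[0])
    k = max(1, int(n_features ** 0.5))  # each split sees a random subset of signals
    split = best_split(rows, labels, rng.sample(range(n_features), k))
    if split is None:
        return majority(labels)
    _, f, t = split
    left = [i for i, row in enumerate(rows) if row[f] <= t]
    right = [i for i, row in enumerate(rows) if row[f] > t]
    return (
        f,
        t,
        build_tree([rows[i] for i in left], [labels[i] for i in left], rng, depth + 1, max_depth),
        build_tree([rows[i] for i in right], [labels[i] for i in right], rng, depth + 1, max_depth),
    )


def train_forest(rows, labels, n_trees=25, seed=0):
    """Step 3: train each tree on a bootstrap sample of the labelled data."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx], rng))
    return forest


def predict_tree(tree, row):
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


def spam_score(forest, row):
    """Fraction of trees that vote 'spam'."""
    return sum(predict_tree(tree, row) for tree in forest) / len(forest)


# Step 1: hypothetical signals -> [casts_per_hour, follows_per_day, account_age_days]
humans = [[1, 5, 400], [2, 8, 250], [0.5, 3, 900], [3, 10, 120], [1.5, 6, 600], [2.5, 4, 300]]
spammers = [[30, 200, 2], [45, 150, 5], [25, 300, 1], [60, 180, 3], [40, 250, 7], [35, 220, 4]]

# Step 2: labels (made up here; in practice from review, reports or heuristics)
rows = humans + spammers
labels = [0] * len(humans) + [1] * len(spammers)

forest = train_forest(rows, labels)
print("new burst account:", spam_score(forest, [50, 400, 0]))
print("new quiet account:", spam_score(forest, [0.2, 1, 1000]))
```

A real dataset would be far larger and noisier than this, which is why the labelling step stresses dataset size. In practice a library implementation would be the usual choice over hand-written trees.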

1 reply

0 recast

24 reactions

Varun Srinivasan

Random forests can identify very subtle patterns in data. For example, we once had a spam ring in country X that would fire up all their bots at the same time. Because we fed in country and time of posts as signals, the model quickly learned that accounts that posted frequently at 10pm in that country were spammy. But what's very interesting is that it otherwise ignored country as a predictor. If you posted from that same country but had a more human-like pattern of posting around the clock, it didn't rank you as likely to be a spammer. Forests can get very sophisticated and layer dozens of signals to find such patterns. They can be retrained periodically to adapt as spammers change their behavior.
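One way to see why a tree would use country only in combination with posting time is to compare how cleanly each rule separates spam from non-spam. The sketch below uses Gini impurity, a common split criterion for decision trees, on sixteen invented accounts. The field name `share_at_10pm`, the 0.8 cut-off and the data are assumptions for illustration only.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels; 0.0 means the group is pure."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def weighted_gini(accounts, rule):
    """Impurity left over after splitting accounts by a yes/no rule."""
    inside = [a["spam"] for a in accounts if rule(a)]
    outside = [a["spam"] for a in accounts if not rule(a)]
    return (len(inside) * gini(inside) + len(outside) * gini(outside)) / len(accounts)


def account(country, share_at_10pm, spam):
    return {"country": country, "share_at_10pm": share_at_10pm, "spam": spam}


# Invented data: a ring in country X that posts almost only at 10pm,
# ordinary users in X, and users in Y (one of whom also posts mostly at 10pm).
accounts = (
    [account("X", s, 1) for s in (0.90, 0.95, 0.85, 1.00)]
    + [account("X", s, 0) for s in (0.10, 0.05, 0.20, 0.15)]
    + [account("Y", s, 0) for s in (0.10, 0.05, 0.20, 0.15, 0.10, 0.00, 0.25, 0.90)]
)

country_only = weighted_gini(accounts, lambda a: a["country"] == "X")
hour_only = weighted_gini(accounts, lambda a: a["share_at_10pm"] > 0.8)
both = weighted_gini(
    accounts, lambda a: a["country"] == "X" and a["share_at_10pm"] > 0.8
)

print("country only:", country_only)
print("10pm only:   ", hour_only)
print("both:        ", both)
```

On this toy data, country alone leaves the most impurity because half of the accounts in X are ordinary users, and the 10pm rule alone still catches one real user in Y. Only the combination isolates the ring, which matches the behavior described above: the country signal matters only on the branch where the posting pattern is also suspicious.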

3 replies

0 recast

19 reactions

Varun Srinivasan

It's not always intuitive what the best signals are. When I worked on fraud at Coinbase - which is a similar problem - one of our best signals was screen resolution. It turned out that fraudsters used a virtual machine that had a very odd screen resolution that most normal computers would never have. We've found this to be true in Farcaster data as well. I'm going to be more cagey about what the actual signals are, because revealing them will cause spammers to change their behavior, making them harder to detect.
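A quick way to spot a non-obvious signal like this is to tabulate how often each value of an attribute shows up in labelled bad sessions. The following is a minimal sketch on invented data: the resolution strings, the counts and the function name `fraud_rate_by_value` are all hypothetical, and "1027x731" merely stands in for an unusual virtual-machine resolution.

```python
from collections import Counter


def fraud_rate_by_value(sessions):
    """Map each attribute value to (fraud rate, number of sessions seen)."""
    total = Counter(value for value, _ in sessions)
    fraud = Counter(value for value, is_fraud in sessions if is_fraud)
    return {value: (fraud[value] / n, n) for value, n in total.items()}


# Invented (screen_resolution, is_fraud) pairs, for illustration only.
sessions = (
    [("1920x1080", False)] * 40 + [("1920x1080", True)] * 2
    + [("1366x768", False)] * 30 + [("1366x768", True)] * 1
    + [("1027x731", False)] * 1 + [("1027x731", True)] * 9
)

rates = fraud_rate_by_value(sessions)
for value, (rate, n) in sorted(rates.items(), key=lambda kv: -kv[1][0]):
    print(f"{value:>10}  fraud rate {rate:.2f}  over {n} sessions")
```

A value that is rare overall but almost always associated with bad sessions is a candidate signal. The session count is kept next to the rate because a high rate over a handful of sessions may just be noise.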

5 replies

0 recast

34 reactions

Lauren McDonagh-Pereira

@lampphotography

Hmmm, that makes sense. If you are just copying and pasting random replies, you don't need a good screen, because the content you are engaging with doesn't really matter...

0 reply

0 recast

0 reaction