Darius Kazemi

@tinysubversions

I've been talking a lot about the bad study about Twitter bots and COVID-19 that has been shared widely over the weekend. I'd like to highlight a GOOD study on the same topic, mostly as an example of what I would like to see more of from researchers, & what media should look for.

5/24/2020, 11:14:44 PM

Favs: 407

Retweets: 185

Darius Kazemi

@tinysubversions

The paper is "COVID-19 on Twitter: Bots, Conspiracies, and Social Media Activism", by Emilio Ferrara (@emilio__ferrara) of USC. It's a preprint, which means it's not yet peer reviewed, but is available for the public and experts (like me) to evaluate.

https://arxiv.org/abs/2004.09531

5/24/2020, 11:14:45 PM

Favs: 72

Retweets: 17

Darius Kazemi

@tinysubversions

The paper analyzes ~100M tweets from Jan 21 to Mar 12. It gives the exact keywords used to find the tweets, the algorithm used to judge how bot-like an account is (Botometer in this case), and the statistical methods used to analyze the bot scores.

5/24/2020, 11:14:45 PM

Favs: 44

Retweets: 2

Darius Kazemi

@tinysubversions

Even more importantly, the entire data set is available here: https://github.com/echen102/COVID-19-TweetIDs (though it's just the IDs so there will be some deleted/suspended tweet content missing)

So first off, the researcher is clear about what he's doing, and the work is reproducible.

5/24/2020, 11:14:45 PM

Favs: 44

Retweets: 6

Darius Kazemi

@tinysubversions

Ferrara makes reasonable assumptions. He says that accts w/ a 10% or less "bot" score are likely human, and accts w/ a 90%+ "bot" score are likely bots. Anything in between is too uncertain so he throws it out bc we can't know if the behavior is from bots or from people.

5/24/2020, 11:14:45 PM

Favs: 35

Retweets: 1

Darius Kazemi

@tinysubversions

Of course, some of those <10% scoring accounts might be bots and some of the >90% scoring accounts might be human. But these are pretty high thresholds. Compare this to the CMU lab's previous studies, which use a 60% likelihood threshold as a cutoff for "we consider this a bot".

5/24/2020, 11:14:45 PM

Favs: 43

Retweets: 3

Darius Kazemi

@tinysubversions

The paper is also forthcoming about what is guesswork and what is measured fact, prefacing speculation with phrases like "We can only speculate that..."

5/24/2020, 11:14:46 PM

Favs: 35

Retweets: 2

Darius Kazemi

@tinysubversions

The paper lays bare its natural language processing analysis too. Check out this figure where he filters for the top 10 distinctive 3-word phrases in likely bot vs human accounts. The words alone tell a qualitative story, and we're given a time series for even more context.

5/24/2020, 11:14:46 PM

Favs: 43

Retweets: 3

Darius Kazemi

@tinysubversions

The paper also asks: if there are bots that try to interfere with discourse, are there bots that try to help inform the public? He looks at the data and finds that yes, these exist too. I've attached the conclusion here, which is a headline-repelling "it's complicated".

5/24/2020, 11:14:46 PM

Favs: 76

Retweets: 12

Darius Kazemi

@tinysubversions

And finally, the author expends significant effort explaining the limitations of the data and of the study.

All in all, a great paper. Interesting enough that I might go ahead and try and reproduce the results and play with the data myself.

5/24/2020, 11:14:46 PM

Favs: 48

Retweets: 2

Darius Kazemi

@tinysubversions

Researchers: please take note of these practices and try to emulate them.

Journalists: if an academic comes to you with exciting results of a study, ask to see the paper, and make sure it looks like this. (If it's about bots, send it to me! I'll happily opine on its legitimacy.)

5/24/2020, 11:14:47 PM

Favs: 89

Retweets: 9

Darius Kazemi

@tinysubversions

Also @emilio__ferrara is guest editing an upcoming COVID-19 issue of the Journal of Computational Social Science. Here's the call for papers. I expect it will be good reading when it's out.

https://www.springer.com/journal/42001/updates/17993070

5/24/2020, 11:14:47 PM

Favs: 49

Retweets: 6

Darius Kazemi

@tinysubversions

Made a mistake earlier. The paper's thresholds are not Botometer scores lower than 10% and higher than 90%; they are scores below the 10th *percentile* and above the 90th *percentile*. This translates to Botometer scores of < 0.04 and > 0.44 (am auditing the data today, more soon)

5/26/2020, 9:02:31 AM

Favs: 43

Retweets: 1
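
The difference between a fixed score cutoff and a percentile cutoff can be sketched in a few lines. This is only an illustration, not the paper's code: the function names are hypothetical, scores are assumed to be floats in [0, 1], and treating the boundaries as inclusive is an assumption. The < 0.04 and > 0.44 figures above come from the paper's data and are not reproduced here.

```python
from statistics import quantiles


def percentile_cutoffs(scores):
    """Return the (10th percentile, 90th percentile) of a list of scores."""
    # quantiles(..., n=10) returns the 9 decile cut points of the data.
    deciles = quantiles(scores, n=10, method="inclusive")
    return deciles[0], deciles[-1]


def label_accounts(scores_by_account):
    """Label accounts by where their score falls in the observed distribution."""
    low, high = percentile_cutoffs(list(scores_by_account.values()))
    labels = {}
    for account, score in scores_by_account.items():
        if score <= low:
            labels[account] = "likely human"   # bottom decile (assumed inclusive)
        elif score >= high:
            labels[account] = "likely bot"     # top decile (assumed inclusive)
        else:
            labels[account] = "uncertain"      # everything in between is set aside
    return labels
```

Note that the cutoffs depend entirely on the distribution of scores in the data set: if most accounts score low, the 90th percentile can sit well below a raw score of 0.9.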

Darius Kazemi

@tinysubversions

Folks, Botometer is not looking accurate when I manually review accounts flagged w/ a 90th percentile bot score. Gonna have to write up something substantial and it's probably gonna take longer than a day.

Still: this paper is good science bc I can check its work like this!

5/26/2020, 9:30:44 AM

Favs: 93

Retweets: 8

Darius Kazemi

@tinysubversions

I almost wonder if Botometer should be renamed to Normiemeter because it seems to flag a lot of activity that is just normal interaction w/ twitter by people who are not Terminally Online (people who exclusively use the "share this article" intent feature on news sites to tweet)

5/26/2020, 10:03:31 AM

Favs: 132

Retweets: 35

Darius Kazemi

@tinysubversions

Can you imagine the headlines? "50% of accounts tweeting about COVID are normies"

5/26/2020, 10:05:38 AM

Favs: 108

Retweets: 21

Darius Kazemi

@tinysubversions

As an ethnographer friend of mine pointed out: Botometer's false positives might just be another case of tech people assuming that everyone uses technology exactly like they do

5/26/2020, 10:31:38 AM

Favs: 159

Retweets: 27

Darius Kazemi

@tinysubversions

What's amusing/depressing is the higher the bot score, the less likely the account is to be a bot and the more likely they are to be just a person who doesn't understand Twitter very well and mostly crossposts from their very real Facebook account (or similar)

5/26/2020, 11:15:35 AM

Favs: 96

Retweets: 19

Darius Kazemi

@tinysubversions

0.5 bot score: account created a month ago that only RTs accounts created a month ago

0.95 bot score: grandpa sharing from facebook who probably forgot he even has a twitter account hooked up to his facebook account

5/26/2020, 11:16:44 AM

Favs: 84

Retweets: 10

Darius Kazemi

@tinysubversions

so far I have not found a *single* account scoring higher than 0.9 via botometer that does not appear to just be........... an older person using tech in ways that are considered uncool or gauche by techies

5/26/2020, 11:17:57 AM

Favs: 182

Retweets: 44

Darius Kazemi

@tinysubversions

This has nothing to do with botometer but I remember a researcher saying "we consider an 8 digit number in an account name to be a sign that it's a bot because that's what twitter offers by default when you sign up" ... the implicit assumption being people would change it (1/4)

5/26/2020, 11:31:00 AM

Favs: 68

Retweets: 14

Darius Kazemi

@tinysubversions

But if you sign up for a Twitter account in 2020, you literally don't have the option to choose a username! They give you a username that is something like name12345678, and then you have to go into your settings and manually change it, and they don't prompt you to do this! (2/4)

5/26/2020, 11:31:00 AM

Favs: 107

Retweets: 20

Darius Kazemi

@tinysubversions

If you are not very technical & don't like to poke around in your application settings, and you're not social media savvy and therefore don't understand that the Twitter equivalent of dkazemi83@aol.com is a corny username, you're not gonna change it. Doesn't make you a bot. (3/4)

5/26/2020, 11:31:00 AM

Favs: 63

Retweets: 5

Darius Kazemi

@tinysubversions

In fact if I were building a botnet one of the first things I would do is make non-default usernames so as not to appear fishy. I wouldn't be surprised if name12345678 accounts are more likely to be simply less-computer-literate humans rather than bots. (4/4)

5/26/2020, 11:31:01 AM

Favs: 81

Retweets: 5
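
For reference, the username heuristic described in this sub-thread amounts to something like the check below. It is a hypothetical sketch: the pattern (letters followed by exactly 8 digits) is an assumption based on the "name12345678" example, not a documented Twitter format. All it can tell you is that a handle looks like an unchanged signup default, which says nothing about whether the account is automated.

```python
import re

# Assumed shape of a default handle: letters/underscores, then exactly 8 digits.
DEFAULT_HANDLE = re.compile(r"[A-Za-z_]+\d{8}")


def looks_like_default_handle(handle: str) -> bool:
    """True if the handle matches the assumed default-username shape."""
    return DEFAULT_HANDLE.fullmatch(handle.lstrip("@")) is not None
```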

Darius Kazemi

@tinysubversions

I hate promoting this but since I am an independent researcher I should probably note that I have a Patreon and every dollar there helps me do work like this https://www.patreon.com/tinysubversions

5/26/2020, 11:53:06 AM

Favs: 57

Retweets: 12

Darius Kazemi

@tinysubversions

I've spoken now with a researcher on Botometer and it seems one issue is that the tool is trained to find ANY automation and tag it as bot, including "I'm a person with a facebook account, I do not have time to run a FB and a Twitter, so I've hooked up FB to crosspost to Twitter"

5/28/2020, 4:18:29 PM

Favs: 29

Retweets: 7

Darius Kazemi

@tinysubversions

That is a narrowly technically correct detection of a bot, but there is a mismatch between that and what people think when they see "bot" in a headline. This has real ramifications on political discourse.

5/28/2020, 4:18:30 PM

Favs: 19

Retweets: 1

Darius Kazemi

@tinysubversions

And while I don't particularly blame individual members of the public for misusing Botometer, it is a damn shame when other social science researchers or journalists and investigators take the Botometer scores and twist their meaning to get interesting results.

5/28/2020, 4:20:08 PM

Favs: 17

Retweets: 2

Darius Kazemi

@tinysubversions

also there is definitely a false positive problem with like... kpop fans which I'm going to continue to dig into, heh

5/28/2020, 4:31:48 PM

Favs: 19

Retweets: 1

Darius Kazemi

@tinysubversions

another category of false positive is people who use the "share intent" on Twitter -- that is, you're reading an article and you press the "share to Twitter" button, which usually results in a pre-written tweet that is the title of the article and then "/via @.thesource"

5/28/2020, 4:43:02 PM

Favs: 15

Retweets: 0

Darius Kazemi

@tinysubversions

There are people who exclusively tweet with the share intent, and that's actually valid, manual curation behavior, but some of these get tagged as bots in the training data set

5/28/2020, 4:43:02 PM

Favs: 11

Retweets: 0

Darius Kazemi

@tinysubversions

Continuing to audit the training data. Another thing that comes up a lot is accounts that are technically doing bot activity, so they are correctly tagged, but they're not in the spirit of what people mean when they talk about "bots" in social media.

5/29/2020, 10:44:33 PM

Favs: 13

Retweets: 1

Darius Kazemi

@tinysubversions

For example: the One Piece Treasure Cruise mobile game was incredibly popular when the training accounts were collected. A full 5% of the bots (again, correctly) tagged by the human trainers were players of this game, which would automatically post to their linked twitter account

5/29/2020, 10:44:33 PM

Favs: 21

Retweets: 6

Darius Kazemi

@tinysubversions

Does signing up for a game with my twitter which posts my in-game activities count as bot behavior? It causes lots of automated activity on my account. But is this how we want to train our bot-detection algorithms that are supposed to tell us things about political discourse?

5/29/2020, 10:44:33 PM

Favs: 28

Retweets: 4

Darius Kazemi

@tinysubversions

I see other activities like this too -- the person who doesn't have time for twitter and so they hook up their Facebook to crosspost to it... that's technically a bot but it's ultimately a real person doing real things. Again: is this how we want to train our algorithms?

5/29/2020, 10:46:05 PM

Favs: 14

Retweets: 0

Darius Kazemi

@tinysubversions

Sorry, 5% of the accounts used for training I've audited *so far* that were tagged by human volunteers as bots were players of that One Piece game. I'm still going through, I've only audited 15% of the accounts in the bot-positive training data set so far. Plenty more to go

5/29/2020, 10:47:55 PM

Favs: 17

Retweets: 0
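
Since the 5% figure comes from a partial audit, it helps to see how much uncertainty a sample of that kind carries. The sketch below computes a Wilson score interval for a proportion. The counts in the example are hypothetical and are NOT from the thread (which gives percentages, not raw numbers); the function name is also just illustrative.

```python
from math import sqrt


def wilson_interval(hits, audited, z=1.96):
    """Wilson score interval for a proportion (z=1.96 is roughly 95%)."""
    if audited <= 0:
        raise ValueError("audited must be positive")
    p = hits / audited
    denom = 1 + z * z / audited
    centre = (p + z * z / (2 * audited)) / denom
    half = z * sqrt(p * (1 - p) / audited + z * z / (4 * audited * audited)) / denom
    return centre - half, centre + half


# Hypothetical counts for illustration only: 15 of 300 audited accounts (5%).
low, high = wilson_interval(15, 300)
print(f"plausible range: {low:.1%} to {high:.1%}")
```

Auditing more accounts at the same observed rate narrows the interval, which is why it matters whether a figure like this holds up as the audit set grows.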

Darius Kazemi

@tinysubversions

Some of these make me laugh: I'll be looking at an account and thinking "Hmmm, this seems like it might be SEO spam. [clicks profile] 'Twitter account of [person], SEO expert. Message me on LinkedIn for consulting!'"

lol ok that clinches it

5/29/2020, 10:55:19 PM

Favs: 30

Retweets: 0

Darius Kazemi

@tinysubversions

Something else is becoming clear to me: I'm doing this auditing of an entire ML training set. I'm an expert in the field. I estimate it'll take me 40 hours for a 1st pass audit. Any well funded lab could pay me or someone like me for a week of consulting & clean up their dataset

5/29/2020, 10:57:36 PM

Favs: 43

Retweets: 1

Darius Kazemi

@tinysubversions

Lol just found an actual city councilwoman who is just not very good at twitter and isn't verified who was tagged as a bot in the training set (a google shows it's her real account)

5/29/2020, 11:13:14 PM

Favs: 42

Retweets: 6

Darius Kazemi

@tinysubversions

(deleted some tweets that I should probably not speculate on until I gather more data)

5/29/2020, 11:36:40 PM

Favs: 2

Retweets: 0

Darius Kazemi

@tinysubversions

Something else I've noticed is that horny people get tagged as bots in the training set. Just like, people being thirsty and posting porn they like. @'ing sex workers. I wonder if these were tagged as bots because of bias that anything to do with porn must be spam?

5/29/2020, 11:38:01 PM

Favs: 32

Retweets: 7

Darius Kazemi

@tinysubversions

These are burner accounts for people to engage with porn. They are going to have weird follow/follower networks by default that won't make sense and are going to seem botlike (following lots of accounts, no followers). This is because people like to jerk off in privacy.

5/29/2020, 11:39:55 PM

Favs: 20

Retweets: 0

Darius Kazemi

@tinysubversions

Did you expect this thread was going to go full ~night twitter~? It's about midnight here...

5/29/2020, 11:41:14 PM

Favs: 14

Retweets: 0

Darius Kazemi

@tinysubversions

A category of true false positive that is showing up over and over is that many accounts posting in not-English were tagged (by English-speaking university students) as bots. The researchers do state in their own papers that Botometer's quality drops for non-English accounts

5/29/2020, 11:50:56 PM

Favs: 28

Retweets: 6

Darius Kazemi

@tinysubversions

Here's an even more depressing category of false positive for me: non-NA/EU people who post in English who get tagged as bots. They're just posting in a different cultural context. But they get tagged by the human as spam, probably because it seems nonsensical to the human.

5/30/2020, 12:13:13 AM

Favs: 42

Retweets: 10

Darius Kazemi

@tinysubversions

Like I'm looking at an account that is a very nice Black South African lady who just responds with lots of emoji to cheer on other Black women who are doing well in their careers. Tagged as a bot in the training set

5/30/2020, 12:15:11 AM

Favs: 28

Retweets: 2

Darius Kazemi

@tinysubversions

Any account that posts ANYTHING in some kind of creole or patois, forget about it, that gets tagged as a bot by college students. I think it's because they see a kind of English hybrid they've never seen before and assume that it's gibberish, like a language algorithm gone wrong.

5/30/2020, 12:24:07 AM

Favs: 36

Retweets: 6

Darius Kazemi

@tinysubversions

btw I've doubled the size of my audit set and the 5% figure for One Piece Treasure Cruise crossposters continues to hold. We may live in a world where a One Piece mobile game affected the results of hundreds of social science papers about the bot menace

https://twitter.com/tinysubversions/status/1266606403088601090

5/30/2020, 12:44:38 AM

Favs: 36

Retweets: 14

Darius Kazemi

@tinysubversions

Here's an example of a "cyborg" type account. Part automated, part "natural", all real person. This is a very sincere mom with a very real facebook account who crossposts to twitter but occasionally logs in to twitter to cheer on her favorite sports teams

5/30/2020, 1:05:18 AM

Favs: 17

Retweets: 6

Darius Kazemi

@tinysubversions

Like, is this a bot? I mean... even by the very technical definition of "an account that is automated" the answer is "only sometimes"

5/30/2020, 1:05:52 AM

Favs: 19

Retweets: 1