
Darius Kazemi

@tinysubversions

I've been talking a lot about the bad study about Twitter bots and COVID-19 that has been shared widely over the weekend. I'd like to highlight a GOOD study on the same topic, mostly as an example of what I would like to see more of from researchers, & what media should look for.

5/24/2020, 11:14:44 PM

Favs: 407

Retweets: 185

Darius Kazemi

@tinysubversions

The paper is "COVID-19 on Twitter: Bots, Conspiracies, and Social Media Activism", by Emilio Ferrara (@emilio__ferrara) of USC. It's a preprint, which means it's not yet peer reviewed, but is available for the public and experts (like me) to evaluate.

https://arxiv.org/abs/2004.09531

5/24/2020, 11:14:45 PM

Favs: 72

Retweets: 17

Darius Kazemi

@tinysubversions

The paper analyzes ~100M tweets from Jan 21 to Mar 12. It gives the exact keywords used to find the tweets, the algorithm used to judge how bot-like an account is (Botometer in this case), and the statistical methods used to analyze the bot scores.

5/24/2020, 11:14:45 PM

Favs: 44

Retweets: 2

Darius Kazemi

@tinysubversions

Even more importantly, the entire data set is available here: https://github.com/echen102/COVID-19-TweetIDs (though it's just the IDs so there will be some deleted/suspended tweet content missing)

So first off, the researcher is clear about what he's doing, and the work is reproducible.

5/24/2020, 11:14:45 PM

Favs: 44

Retweets: 6

Darius Kazemi

@tinysubversions

Ferrara makes reasonable assumptions. He says that accts w/ a 10% or less "bot" score are likely human, and accts w/ a 90%+ "bot" score are likely bots. Anything in between is too uncertain so he throws it out bc we can't know if the behavior is from bots or from people.

5/24/2020, 11:14:45 PM

Favs: 35

Retweets: 1

Darius Kazemi

@tinysubversions

Of course, some of those <10% scoring accounts might be bots and some of the >90% scoring accounts might be human. But these are pretty high thresholds. Compare this to the CMU lab's previous studies, which use a 60% likelihood threshold as a cutoff for "we consider this a bot".

5/24/2020, 11:14:45 PM

Favs: 43

Retweets: 3

Darius Kazemi

@tinysubversions

The paper is also forthcoming about what is guesswork and what is measured fact, prefacing speculation with phrases like "We can only speculate that..."

5/24/2020, 11:14:46 PM

Favs: 35

Retweets: 2

Darius Kazemi

@tinysubversions

The paper lays bare its natural language processing analysis too. Check out this figure where he filters for the top 10 distinctive 3-word phrases in likely bot vs human accounts. The words alone tell a qualitative story, and we're given a time series for even more context.

5/24/2020, 11:14:46 PM

Favs: 43

Retweets: 3
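
The "distinctive 3-word phrases" idea in the tweet above can be sketched roughly as follows. The thread does not say how the paper scores distinctiveness, so the smoothed frequency ratio here is a stand-in assumption, and the function names `trigrams` and `distinctive_trigrams` are hypothetical, not taken from the paper.

```python
from collections import Counter


def trigrams(text):
    """Split text on whitespace and return its 3-word phrases, lowercased."""
    words = text.lower().split()
    return [" ".join(words[i:i + 3]) for i in range(len(words) - 2)]


def distinctive_trigrams(group_texts, other_texts, top_n=10):
    """Rank trigrams that are relatively more common in group_texts than in other_texts.

    Assumption: distinctiveness is an add-one-smoothed ratio of relative
    frequencies. The paper may use a different measure.
    """
    group = Counter(t for text in group_texts for t in trigrams(text))
    other = Counter(t for text in other_texts for t in trigrams(text))
    group_total = sum(group.values()) or 1
    other_total = sum(other.values()) or 1

    def score(phrase):
        group_rate = (group[phrase] + 1) / group_total
        other_rate = (other[phrase] + 1) / other_total
        return group_rate / other_rate

    # Sort by descending score, breaking ties alphabetically for stable output.
    ranked = sorted(group, key=lambda p: (-score(p), p))
    return ranked[:top_n]
```

Calling `distinctive_trigrams(bot_texts, human_texts)` and then swapping the arguments would give one ranked list per group, which is the kind of side-by-side comparison the figure is described as showing.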

Darius Kazemi

@tinysubversions

The paper also asks: if there are bots that try to interfere with discourse, are there bots that try to help inform the public? He looks at the data and finds that yes, these exist too. I've attached the conclusion here, which is a headline-repelling "it's complicated".

5/24/2020, 11:14:46 PM

Favs: 76

Retweets: 12

Darius Kazemi

@tinysubversions

And finally, the author expends significant effort explaining the limitations of the data and of the study.

All in all, a great paper. Interesting enough that I might go ahead and try and reproduce the results and play with the data myself.

5/24/2020, 11:14:46 PM

Favs: 48

Retweets: 2

Darius Kazemi

@tinysubversions

Researchers: please take note of these practices and try to emulate them.

Journalists: if an academic comes to you with exciting results of a study, ask to see the paper, and make sure it looks like this. (If it's about bots, send it to me! I'll happily opine on its legitimacy.)

5/24/2020, 11:14:47 PM

Favs: 89

Retweets: 9

Darius Kazemi

@tinysubversions

Also @emilio__ferrara is guest editing an upcoming COVID-19 issue of the Journal of Computational Social Science. Here's the call for papers. I expect it will be good reading when it's out.

https://www.springer.com/journal/42001/updates/17993070

5/24/2020, 11:14:47 PM

Favs: 49

Retweets: 6

Darius Kazemi

@tinysubversions

Made a mistake earlier. The paper's thresholds are not Botometer scores lower than 10% and higher than 90%; they are scores below the 10th *percentile* and above the 90th *percentile*. This translates to Botometer scores of < 0.04 and > 0.44 (am auditing the data today, more soon)

5/26/2020, 9:02:31 AM

Favs: 43

Retweets: 1
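
To make the percentile correction above concrete, here is a minimal sketch of labeling accounts by where their score falls in the observed distribution instead of by a fixed score. It assumes scores are already available as plain numbers keyed by account. The function names are hypothetical, it does not call Botometer, and treating the cutoffs as inclusive is an assumption the thread does not settle.

```python
import statistics


def percentile_cutoffs(scores, low_pct=10, high_pct=90):
    """Return the score values at the low and high percentiles of the data."""
    # With n=100, quantiles() returns 99 cut points; index p - 1 is the p-th percentile.
    cuts = statistics.quantiles(scores, n=100, method="inclusive")
    return cuts[low_pct - 1], cuts[high_pct - 1]


def label_accounts(scores_by_account, low_pct=10, high_pct=90):
    """Label the extremes of the score distribution and drop everything in between."""
    low, high = percentile_cutoffs(list(scores_by_account.values()), low_pct, high_pct)
    labels = {}
    for account, score in scores_by_account.items():
        if score <= low:
            labels[account] = "likely human"
        elif score >= high:
            labels[account] = "likely bot"
        # Scores between the two cutoffs are treated as too uncertain and left out.
    return labels
```

The point of the correction is that the cutoffs come from the data: on a distribution skewed toward low scores, the 90th percentile can sit at a score far below 0.9, as with the > 0.44 figure quoted in the tweet.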

Darius Kazemi

@tinysubversions

Folks, Botometer is not looking accurate when I manually review accounts flagged w/ a 90th percentile bot score. Gonna have to write up something substantial and it's probably gonna take longer than a day.

Still: this paper is good science bc I can check its work like this!

5/26/2020, 9:30:44 AM

Favs: 93

Retweets: 8

Darius Kazemi

@tinysubversions

I almost wonder if Botometer should be renamed to Normiemeter because it seems to flag a lot of activity that is just normal interaction w/ twitter by people who are not Terminally Online (e.g. people who exclusively use the "share this article" intent feature on news sites to tweet)

5/26/2020, 10:03:31 AM

Favs: 132

Retweets: 35

Darius Kazemi

@tinysubversions

Can you imagine the headlines? "50% of accounts tweeting about COVID are normies"

5/26/2020, 10:05:38 AM

Favs: 108

Retweets: 21

Darius Kazemi

@tinysubversions

As an ethnographer friend of mine pointed out: Botometer's false positives might just be another case of tech people assuming that everyone uses technology exactly like they do

5/26/2020, 10:31:38 AM

Favs: 159

Retweets: 27

Darius Kazemi

@tinysubversions

(ok I think ethnographer friend and I are going to coauthor a paper because if I'm gonna put this much work into something I might as well at least TRY and get my first peer reviewed academic publication out of it)

5/26/2020, 11:48:20 AM

Favs: 23

Retweets: 0