May 15, 2013
The Tweet Sound Of Racism
Racism is a dirty word. Even dirtier is engaging in such practices, which we (and I mean all humans) have been doing since we first noticed differences between the races. It is one of our oldest sins. Yet no matter how evolved we become, we seem to drag this albatross around with us, teaching hate to the next generation.
One of our newest vices is airing our dirty laundry on social media outlets, specifically Twitter. There seems to be something cathartic in shooting out that 140-character Tweet about our lives, our frustrations and, apparently, our racism.
Gawker reports that the Humboldt students behind the map used only 150,000 Tweets. That seems like a REALLY small sample, since according to Washington Post technology writer Hayley Tsukayama, we were sending over 400 million Tweets a day as of March 2013. The upside of such a small sample is that the Tweets were vetted by people, not by a computer algorithm. Hate words were pre-selected, then students read through every Tweet from June 2012 through April 2013 that contained one of those words, coding each usage as positive, negative or neutral. Only "negative" Tweets made it onto their map; in other words, only Tweets in which the word was unequivocally used as hate speech.
Results, as interpreted by the Humboldt students:
- The South has a slightly more diverse take on bigotry than the North.
- The N-word shows up anywhere there is significant population density, except Southern California.
- "Wetback" beat out "YOLO" in Texas.
- There’s a nasty word for Koreans that Georgians prefer.
- Virginia’s racial slur of choice is one for Asians that rhymes with “clink.”
- Apparently the nasty word for Hispanics that rhymes with “tick” doesn’t get used much anymore, at least by people who can actually spell it.
- And unsurprisingly, homophobia runs rampant across the country, except in Los Angeles.
The Guardian's Data Blog has a few issues with this study, starting with the accuracy of semantic analysis, or the study of what words mean in context. The blog did like that the students read the Tweets themselves, so that a Tweet saying "the word homo is offensive" wouldn't be classed as hate speech, as it would be by a computer algorithm. It also applauded the effort to normalize the data, scaling by total Twitter traffic so the map shows the relative frequency of hateful words rather than raw counts.
The blog does, however, question whether mining Twitter can represent the views of the entire nation at all. So did redOrbit writer Michael Harper in a recent post about a Pew Research Center survey. As Harper says, "The Pew Research Center confirmed that Twitter is a place for snark, sarcasm and general negativity. I say 'confirmed,' of course, because any loyal Twitter user already knows this."
I've had concerns about Twitter mining ever since the University of Vermont's "happiest places in the country" study and the F-bomb heat map created by Vertaline. Why, you ask? Well, it's simple. When you use a computer to sort words, you get only the words themselves, not the content or context behind them. The happiest-places study, for instance, found that Beaumont, Texas, is the unhappiest place in the US because we cuss a lot. (I say "we" because I grew up just 25 miles north of Beaumont.) But it didn't consider why people were cussing, and, being Southerners, we cuss for every occasion.
I think the human reading gives the Humboldt State University study a leg up over other Twitter-mining studies, but this method of data collection still has a long way to go.
Image Credit: Photos.com