How tweets reveal where you're from

Danny Moloshok / Reuters

Do your Twitter updates betray where you're tweeting from? Scientists say they do.

On the Internet, nobody knows you're a dog, but on Twitter, your tweets likely reveal where you are. Computer scientists report that the microblogging service reflects regional dialects and slang.

In northern California, for example, when something is cool, it's tweeted as "koo," while in southern California, it's "coo," post-doctoral fellow Jacob Eisenstein and his colleagues at Carnegie Mellon University found. The word "something" is tweeted as "sumthin" in most parts of the country, but New Yorkers favor the term "suttin" instead. LOL, the acronym for "laughing out loud," is common on Twitter almost everywhere but Washington, D.C., where the cruder "LLS" takes precedence.

How they did it
For the study, Eisenstein and his co-authors collected a week's worth of Twitter messages in March 2010 and selected geotagged messages from users who wrote at least 20 tweets. That gave them a database of 9,500 users and 380,000 messages.
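The selection step described above can be sketched in a few lines of Python. This is only an illustration of the filter, not the researchers' code: the record fields (`user`, `text`, `geo`) and the function name `select_users` are hypothetical, and the sketch assumes the 20-tweet threshold is counted over geotagged messages, which the article does not spell out.

```python
from collections import defaultdict

MIN_TWEETS = 20  # threshold quoted in the article


def select_users(tweets, min_tweets=MIN_TWEETS):
    """Keep geotagged tweets from users who have at least `min_tweets` of them.

    `tweets` is an iterable of dicts with hypothetical keys:
    'user' (str), 'text' (str), 'geo' ((lat, lon) tuple or None).
    Returns a dict mapping each retained user to a list of their tweets.
    """
    by_user = defaultdict(list)
    for tweet in tweets:
        if tweet.get("geo") is not None:  # drop messages without a geotag
            by_user[tweet["user"]].append(tweet)
    # Assumption: the threshold applies to the geotagged messages only.
    return {user: msgs for user, msgs in by_user.items() if len(msgs) >= min_tweets}
```

Applied to a week of raw messages, a filter of this kind would leave a smaller set of users who each have enough located text to analyze.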

They then analyzed the raw text in those messages with a model trained to pick out regional differences such as favored Twitter slang terms ("hella" in Northern California, "wasssup" in New York) as well as sport-team preferences (for example, the Celtics in Boston, the Knicks in New York, the Cavs in Cleveland).

The researchers found that Twitter postings also reflect well-known regionalisms from spoken language, such as Southerners' "y'all" vs. Pittsburghers' "yinz," and the regionally based references to soda vs. pop vs. Coke.

The model, verified with the geotag information, could predict the location of a microblogger in the U.S. to within 300 miles.
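One way to check a predicted location against a geotag is to measure the great-circle distance between the two points. The sketch below uses the standard haversine formula. It is an assumption about how such a check could be done, not a description of the study's evaluation, and the names `haversine_miles` and `within_radius` are made up for the example.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def within_radius(predicted, actual, radius_miles=300.0):
    """True if a predicted (lat, lon) lies within `radius_miles` of the geotag."""
    return haversine_miles(*predicted, *actual) <= radius_miles
```

With this check, a New York tweeter placed in Boston would count as a hit at the 300-mile radius, while one placed in Los Angeles would not.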

Eisenstein et al. / CMU

Researchers clustered Twitter users based on the regional terms they included in their tweets. This map shows how tweets were clustered to reflect different characteristic regions, including Northern and Southern California, Chicago, the Lake Erie region, Boston, New York, Washington, Northern vs. Southern states, and Florida.

Evolving language
"The study shows that people continue to develop new ways of using language, regardless of whether they're talking over lunch or exchanging messages on Twitter," Eisenstein told me via e-mail today.

"But we don't know whether the geographical specificity of these new forms are simply the result of random variation propagating through social networks that are geographically local, or whether it represents an inherent need to express our regional and community affiliations using language."

Written language is traditionally more homogenized than spoken language, but Eisenstein theorizes that Twitter is more reflective of regional dialects because tweets are more informal and conversational. "It will be interesting to see what happens. Will 'suttin' remain a word we see primarily in New York City, or will it spread?" Eisenstein mused in a news release sent out today.

Eisenstein is presenting the study Saturday at the Linguistic Society of America annual meeting in Pittsburgh. A copy of the paper is available here.

In addition to Eisenstein, the authors of "A Latent Variable Model for Geographic Lexical Variation" include Brendan O'Connor, Noah A. Smith and Eric P. Xing, all from Carnegie Mellon University. The research was supported in part by funding from Google, the Air Force Office of Scientific Research, the Office of Naval Research, the National Science Foundation and the Alfred P. Sloan Foundation.

John Roach is a contributing writer for msnbc.com. Connect with the Cosmic Log community by hitting the "like" button on the Cosmic Log Facebook page or following msnbc.com's science editor, Alan Boyle, on Twitter (@b0yle).

Discuss this post

 Gotta have a pretty lousy life to need to twitter people.

  • 2 votes
Reply#1 - Fri Jan 7, 2011 5:54 PM EST

Dear Twitter Kettle,

You are black.

Love, Newsvine Pot.

  • 1 vote
#1.1 - Fri Jan 7, 2011 8:08 PM EST

This from someone who goes online to insult people he doesn't know anything about? Hilarious.

  • 2 votes
#1.2 - Sat Jan 8, 2011 2:32 AM EST
Reply

Wow kinda scary when you think about it. Wow.

    Reply#2 - Fri Jan 7, 2011 6:08 PM EST

    I don't understand why people can't just type words as they are supposed to be spelled.(By the way, it took me less than half a minute to type both of these sentences with all the correct spellings.)

    • 2 votes
    Reply#4 - Fri Jan 7, 2011 8:07 PM EST

    But you forgot to put a space between the period and the bracket...

      #4.1 - Sat Jan 8, 2011 11:25 AM EST

      @btcoates, Thank you for helping to preserve the dying English language. Accolades to you :)

      @punk chemist, those are parentheses (), not brackets [], {}

        #4.2 - Sun Jan 9, 2011 10:29 AM EST

        @observer, very true, I had a bit of a brain fart and couldn't think of the correct word. However, I think you missed the irony there; if you're going to praise yourself on a perfectly correct sentence, you should at least get it right.

          #4.3 - Mon Jan 10, 2011 10:30 AM EST

          and technically, parentheses are a type of bracket.

            #4.4 - Mon Jan 10, 2011 11:02 AM EST

            @punk chemist, no, you are just being an argumentative muppet. He has a good point. His leaving the space out is a mistake, rather than a lack of knowledge. The time saving gained from the use of shortcuts is negligible. The reason it is a common occurrence is because many people just can't spell properly, often due to not reading enough, and assume the generalities as they see them more often via electronic communication.

            As per usual for the average internet dweller you would rather divert attention from the fact at hand and confuse it with some parallel, but unrelated point. Almost as pathetic as the general population's spelling ability but there it is...

              #4.5 - Tue Jan 11, 2011 7:42 AM EST
              Reply

              So I guess if I tweet about "something" I reveal that I'm just old?

              • 1 vote
              Reply#5 - Fri Jan 7, 2011 8:29 PM EST

              I wish here has spell check and it is just so terrible. But, thanks for dictionary.com...

              I love the First amendment; and I will take Fifth for the worse part.

                Reply#7 - Sat Jan 8, 2011 3:54 AM EST

                And for the rest of us who don't write in slang?

                Asides from local references and explicit talk about my location, where I live may not be obvious from what I type. And typing like I have a character limit is probably a consequence of texting on a clamshell phone or having a twitter account, rather than giving obvious clues as to your location.

                  Reply#8 - Sat Jan 8, 2011 10:32 AM EST

                  I think it would be a more telling study if it included education, socio-economic and ethnic heritage in the demographic matrix. As a New Yorker I personally do not know anyone or of anyone that says or writes 'wasssup' or 'suttin'.

                  • 3 votes
                  Reply#10 - Sat Jan 8, 2011 11:52 AM EST

                  A tweet is from all walks of life. We have the freedom to express however we choose. It is no matter where we r from.

                    Reply#11 - Sat Jan 8, 2011 6:54 PM EST

                    Spend time finding a cure for cancer or improve our society please.

                      Reply#12 - Sat Jan 8, 2011 6:55 PM EST

                      Unless you are involved in cancer research why don't you quit life and spare the world of the ecological foot print you are causing? Would be more pleasant than reading the rubbish you post.

                        #12.1 - Tue Jan 11, 2011 7:45 AM EST
                        Reply

                        My favorite tweet is pwain M&Ms. My favorite healthy tweet is yogurt covered waisins.

                          Reply#13 - Sat Jan 8, 2011 8:33 PM EST

                          Most who text use abbreviations. The limited length of tweets would encourage this. As a parent I have to work hard to keep up with my son's abbreviations.

                          Anong

                          Why, YOU are doing such a great job!

                            Reply#14 - Sun Jan 9, 2011 1:53 AM EST