Monthly Archives: October 2013

Social Media Tracking and Analysis System

In the summer of 2011, a group of scientists, research associates and graduate students at the Mississippi State University (MSU) Social Science Research Center (SSRC) began to develop a suite of software applications intended to create a capacity to track and analyze a wide array of social media platforms. The goal was to assist researchers in using social media as a source of scientific data for exploration and investigation. The result of this endeavor was the Social Media Tracking and Analysis System, or SMTAS.

The system was tested with Superstorm Sandy in fall 2012, and it was effective, collecting more than 4.5 million Tweets and an estimated 400,000 images of the storm and its aftermath. This resulted in a significant gain in understanding of the role and implication of social media in natural disaster events.

Arthur G. Cosby, Ph.D. and other researchers at MSU now have an appreciation of the volume of social media communications and their resilience in the face of storm events and power failures. Researchers understand how social media was used as a recovery mechanism, especially in the formation of “organic responses” to the storm event. The software also tracked public sentiment concerning major relief agencies and political leaders in the impacted areas.


“Because of SMTAS, I can access half a billion cases of human behavior a day.”

– Dr. Arthur G. Cosby, SSRC Director and William L. Giles Distinguished Professor


SMTAS was also used to collect Spanish-language tweets from around the United States (US). The spatial distribution of that data was then compared with US Census data. “That type of Census proxy research could not have been done without our system,” said Willie Brown, SSRC Research Associate. Through this project, researchers could overlay geo-located tweets on Census data and thus see how the Spanish-speaking population is distributed and moves over time.

Megan Stubbs-Richardson, a graduate research assistant, used SMTAS to collect and analyze rape myths on Twitter. The content of the tweets was studied and coded. “I collected information on the number of re-tweets (to measure spread of information), and the number of times users tweeted directly to another user because I was interested in how pro-victim versus victim-blame tweets were debated in the news feed,” said Stubbs-Richardson. SMTAS allowed researchers to see who the target audience was and helped code the tweets into different categories of pro-victim or victim-blame.

Additional analysis is addressing a myriad of other researchable questions. SMTAS is organized as a series of modules that provide unparalleled capacity for research applications with social media. The access module enables researchers to use more than 20 different social media platforms, such as Twitter and Facebook. For example, researchers at MSU have access to more than 400 million Tweets worldwide. The tracking/scheduling module allows researchers to track social media by word choice and phrases, location, social media influence, complex time designs, volume of tweets and other features included in social media data. The system has the capacity for primary, secondary, panel and specialized tracking.

For example, in the case of primary tracking, researchers can acquire tweets mentioning “organic food” or “#organic” and also restrict collection to tweets with geo-coordinates to see the locations where users are actively talking about organic food. In secondary tracking, the system tracks the users who tweet from an event location and continues tracking them to determine their mobility before and after the event. Panel tracking can be used for content analysis of tweets relative to a topic before and after an event. The system also has capabilities for specialized tracking, where tweets are dynamically collected according to geo-coordinates, for example tweets along a hurricane path.
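As a rough illustration of primary tracking, the sketch below filters a handful of made-up tweets by the terms “organic food” and “#organic” and optionally keeps only those that carry geo-coordinates. The `Tweet` class, its fields, and the `matches_rule` and `primary_track` functions are hypothetical names invented for this example; they are not SMTAS's actual data model or API.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Tweet:
    """Hypothetical, simplified tweet record (not the SMTAS schema)."""
    text: str
    coordinates: Optional[Tuple[float, float]] = None  # (latitude, longitude)


def matches_rule(tweet: Tweet, terms: List[str]) -> bool:
    """Return True if any tracking term occurs in the tweet text (case-insensitive)."""
    text = tweet.text.lower()
    return any(term.lower() in text for term in terms)


def primary_track(tweets: List[Tweet], terms: List[str], require_geo: bool = False) -> List[Tweet]:
    """Keep tweets matching any term; optionally keep only geo-tagged ones."""
    return [
        t for t in tweets
        if matches_rule(t, terms) and (not require_geo or t.coordinates is not None)
    ]


# Made-up sample data for illustration only.
sample = [
    Tweet("Love the organic food at this market", (33.45, -88.82)),
    Tweet("Trying a new recipe tonight #organic"),
    Tweet("Traffic is terrible today", (40.71, -74.01)),
]

tracked = primary_track(sample, ["organic food", "#organic"], require_geo=True)
for t in tracked:
    print(t.coordinates, t.text)
```

Passing `require_geo=False` would also return the second tweet, which matches a term but has no location attached.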

Researchers create tracking/scheduling modules called “studies,” or collections of tracking “rules” (where a rule can be #organic), along with collection parameters such as a time period and the number of tweets they want to collect. SMTAS helps simplify the process by providing users of the system with an easy-to-create interface for studies (See Figure 1).
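One way to picture a study is as a small record that bundles the tracking rules with the collection parameters. The sketch below is an assumption made for illustration: the `Study` class, its field names, and the `is_active` check are hypothetical and do not reflect how SMTAS stores studies internally.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class Study:
    """Hypothetical study: tracking rules plus a time window and a tweet cap."""
    name: str
    start: datetime
    end: datetime
    max_tweets: int
    rules: List[str] = field(default_factory=list)
    collected: int = 0  # number of tweets gathered so far

    def is_active(self, now: datetime) -> bool:
        """Collection runs only inside the time window and below the tweet cap."""
        return self.start <= now <= self.end and self.collected < self.max_tweets


study = Study(
    name="Organic food",
    start=datetime(2013, 10, 1),
    end=datetime(2013, 10, 31),
    max_tweets=10_000,
    rules=["#organic", "organic food"],
)

print(study.is_active(datetime(2013, 10, 15)))  # inside the window, under the cap
print(study.is_active(datetime(2013, 11, 1)))   # after the window has closed
```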


Figure 1: Creation of Study Interface


L to R: “Modify your study’s information,” “Create and Edit Search Terms” and “Modify Collection Schedule”


“The best thing about the system is the speed with which one can collect tweets on any topic anywhere in the world. Within a few minutes, we can get the full live feeds. It’s really fascinating!”

– Willie Brown, SSRC Research Associate


Currently, SMTAS is focused on the social network Twitter, where researchers have access to approximately 500 million tweets per day. Tweets are public postings made by Twitter users worldwide. In addition to real-time access, SMTAS has access to historical tweets posted since 2006. Twitter data is a rich source of information on human behavior in a social network (170 million active users), and because messages propagate faster on Twitter than on other social networks, it also gives researchers instantaneous information from its user base.

SMTAS is based on cloud servers, which work as the backbone of the entire system. The backend database is a cluster of PostgreSQL servers, and the web application uses Django, Celery, Redis, JavaScript and Bootstrap. Map generation and tweet mapping are provided by Google Maps. SMTAS also uses a large number of web services for data enrichment and a wide variety of software libraries for analysis.


“SMTAS benefited my research by making things much more simplistic (using the search term query) as compared to searching twitter and randomly selecting tweets that are not likely to be as relevant to my research topic.”

– Megan Stubbs-Richardson, SSRC Graduate Research Assistant


A researcher can analyze the filtered/tracked data of a study by using a myriad of analytical modules such as:

1. Traffic Statistics: Each tweet has its own creation time-stamp. Using this information, researchers can analyze the traffic statistics for a study. (See Figure 2)

Figure 2: Analyze Traffic
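The idea behind traffic statistics can be sketched as counting tweets per time interval. The example below groups made-up creation timestamps by hour; the `hourly_traffic` function and the sample data are assumptions for illustration, not the SMTAS implementation.

```python
from collections import Counter
from datetime import datetime

# Made-up creation timestamps standing in for a study's tweets.
created_at = [
    datetime(2012, 10, 29, 18, 5),
    datetime(2012, 10, 29, 18, 40),
    datetime(2012, 10, 29, 19, 10),
    datetime(2012, 10, 30, 7, 55),
]


def hourly_traffic(timestamps):
    """Count tweets per hour by truncating each timestamp to the hour."""
    return Counter(ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps)


traffic = hourly_traffic(created_at)
for hour in sorted(traffic):
    print(hour.strftime("%Y-%m-%d %H:00"), traffic[hour])
```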


2. Geo-Mapping: About 2-3% of tweets have geo-coordinates associated with them. These geo-coordinates are accurate to within 2-5 feet of the exact location of the user when they tweeted. A researcher can analyze any such tweets overlaid on a Google map. (See Figure 3)

Figure 3: Map Tweets with Geolocation


3. Sentiment Analysis: The tweets collected with a study can also be analyzed for sentiment, which relates to the “mood” of the tweet. For example, “It feels good to be home :)” is a positive sentiment tweet, whereas “Work was boring today :(” is a negative one. (See Figures 4 & 5)

Figure 4: Analyze View Sample



Figure 5: Sentiment Over Time
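The article does not say which sentiment method SMTAS uses. Purely to illustrate the idea of labeling a tweet's “mood,” the sketch below applies a toy word-list approach to the two example tweets from the sentiment item. The word lists and the `sentiment` function are assumptions made for this example and are far simpler than any production classifier.

```python
import re

# Tiny illustrative word lists; a real lexicon would be far larger.
POSITIVE = {"good", "great", "happy", "love"}
NEGATIVE = {"boring", "bad", "sad", "hate"}


def sentiment(text: str) -> str:
    """Label text by comparing counts of positive and negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(sentiment("It feels good to be home"))
print(sentiment("Work was boring today"))
```

A word-list approach like this ignores negation and sarcasm, which is one reason real systems tend to rely on trained models or much richer lexicons.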

4. Trend Analysis: Researchers can also analyze the trend of keywords present in the tweets of a study. For example, the following figure shows the trend of the keywords hurricane, power and flooding in the Hurricane Sandy study, which collected tweets from the New York and Boston areas. (See Figure 6)


Figure 6: Trend Analysis
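A keyword trend of this kind can be sketched as a per-day count of tweets that mention each keyword. The tweets below are invented, and `keyword_trend` is a hypothetical helper rather than part of SMTAS.

```python
from collections import defaultdict
from datetime import date

# Made-up (date, text) pairs standing in for tweets collected by a study.
tweets = [
    (date(2012, 10, 29), "Hurricane is here, power just went out"),
    (date(2012, 10, 29), "Flooding on our street already"),
    (date(2012, 10, 30), "Still no power this morning"),
    (date(2012, 10, 30), "Power is back but flooding everywhere"),
]


def keyword_trend(tweets, keywords):
    """Return {keyword: {day: number of tweets mentioning the keyword}}."""
    trend = {kw: defaultdict(int) for kw in keywords}
    for day, text in tweets:
        lowered = text.lower()
        for kw in keywords:
            if kw in lowered:
                trend[kw][day] += 1
    return trend


trend = keyword_trend(tweets, ["hurricane", "power", "flooding"])
for kw, per_day in trend.items():
    print(kw, {d.isoformat(): n for d, n in sorted(per_day.items())})
```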

5. Content Analysis: Researchers can also do content analysis on the collected tweets by using the dynamic keyword and hashtag clouds. The module is also capable of drilling down on specific dates and relative terms. (See Figure 7)

“I used SMTAS to collect data using the search term query, which allowed me to narrow my search topics to a specific interest.”

– Megan Stubbs-Richardson


Figure 7: Most Used Keyword and Most Frequently Used Hashtags
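The counts behind a keyword or hashtag cloud can be approximated with simple frequency tallies. The sketch below is a minimal example under stated assumptions: the sample texts, the stop-word list, and the `top_hashtags` and `top_keywords` functions are all invented for illustration.

```python
import re
from collections import Counter

# Made-up tweet texts for illustration only.
texts = [
    "Stay safe everyone #Sandy #NYC",
    "No power in Hoboken #sandy",
    "Flooding near the subway #NYC #flood",
]

# Minimal stop-word list; a real one would be much longer.
STOPWORDS = {"the", "in", "no", "near"}


def top_hashtags(texts, n=3):
    """Return the n most frequent hashtags, compared case-insensitively."""
    hashtags = Counter()
    for text in texts:
        hashtags.update(tag.lower() for tag in re.findall(r"#\w+", text))
    return hashtags.most_common(n)


def top_keywords(texts, n=3):
    """Return the n most frequent non-stop-words, ignoring hashtags."""
    words = Counter()
    for text in texts:
        # Remove hashtags first so they are not counted as keywords too.
        plain = re.sub(r"#\w+", " ", text.lower())
        words.update(w for w in re.findall(r"[a-z]+", plain) if w not in STOPWORDS)
    return words.most_common(n)


print(top_hashtags(texts))
print(top_keywords(texts))
```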

6. Klout Scores/Topic: SMTAS also tracks users' Klout scores (a measure of social influence) and their topics of interest. A researcher can analyze the most influential tweeters and the topics of interest collected by the study. (See Figure 8)

Figure 8: Klout Scores and Topics



7. Real-time Monitoring: With the “Twitter Firehose” connection, researchers can obtain real-time information on tweets, which are collected by SMTAS within seconds of the user tweeting. SMTAS has a real-time mapping module along with a streamer for a researcher to analyze the tweets. In the following example, the tweet was geo-mapped to its location using Google Street View, and the view was compared with the picture in the tweet showing the flooding in the area during Hurricane Sandy. (See Figure 9)

Figure 9: Google Street View and Tweeted Image




“If the ‘tweeters’ have their geo-locator turned on, we can see where groups are tweeting from instantly, and using Google Street View, we can see the actual spot a tweet comes from.”

– Willie Brown


8. Buckets: A researcher can use the search module (which uses a complex content-matching algorithm) to filter through the tweets a study has collected. Once the search is complete, the researcher can create a “Bucket” containing the filtered tweets. Any such bucket of data can be re-analyzed through the analytical modules of SMTAS. This gives researchers the ability to extract the relevant content from a larger dataset.
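The bucket workflow can be pictured as saving the result of a search as a named subset that can itself be searched or analyzed again. In the sketch below, a plain substring test stands in for the content-matching algorithm, which the article does not describe. The `Bucket` class and `create_bucket` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Bucket:
    """Hypothetical named subset of a study's tweets."""
    name: str
    tweets: List[str] = field(default_factory=list)


def create_bucket(name: str, tweets: List[str], predicate: Callable[[str], bool]) -> Bucket:
    """Build a bucket from the tweets for which the predicate returns True."""
    return Bucket(name, [t for t in tweets if predicate(t)])


# Made-up tweets standing in for everything a study has collected.
study_tweets = [
    "Power is out across the neighborhood",
    "Power is back on our block",
    "Subway stations are flooded",
]

outages = create_bucket("power", study_tweets, lambda t: "power" in t.lower())

# A bucket is just another collection of tweets, so it can be filtered again.
restored = create_bucket("power restored", outages.tweets, lambda t: "back" in t.lower())

print(outages.name, len(outages.tweets))
print(restored.name, restored.tweets)
```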





The Social Science Research Center (SSRC) and Mississippi Agricultural and Forestry Experiment Station (MAFES) currently fund the SMTAS project. The project was initiated by Dr. Arthur Cosby (Director of SSRC) and Dr. John Edwards (Director of Survey Research Labs). The team comprises Dr. Somya Mohanty (Systems Architect and Lead Developer), Jake Gaylor (Lead Programmer) and Josh Richardson (System Administrator).