
Network Analysis of Twitter During Hurricane Sandy

The Social Media Tracking and Analysis System (SMTAS) has contributed to the acquisition of a recent grant funded by the National Oceanic and Atmospheric Administration through the Coastal Storms Awareness Program, managed by the New York, New Jersey and Connecticut Sea Grant Consortium. A team of social scientists and computer programmers at Mississippi State University received funding for this research through the Connecticut Sea Grant program. The title of the project is: Assessment of Social Media Usage during Severe Weather Events and the Development of a Twitter-based Model for Improved Communication of Storm-related Information. One of the main goals of the project is to identify key impact factors affecting the dissemination of storm-related information.

To that end, SMTAS was used to collect approximately 12 million tweets during Hurricane Sandy. To understand the networks of key users and the topics discussed during the event, researchers used metadata attributes of the raw Twitter data, such as user mentions, re-tweet data, and tweet content, to analyze and visualize the interconnectivity of tweets and their users in a graph-based model.

The first graph (shown below) visualizes connected users during Hurricane Sandy, that is, users who either mentioned or re-tweeted another user. In the graph, each node is an individual Twitter user. The size of a node represents how frequently that user was mentioned or re-tweeted. An edge (line) between two nodes exists if one user re-tweeted or mentioned the other. The colors of the edges are based on the modularity index of a node cluster (i.e., the connectivity of a cluster of nodes). The analysis identifies prominent users who were mentioned during the event, along with the Twitter users most closely associated with them. The presence of news agencies (cnnbrk, NewYorkPost, mashable, HuffingtonPost), politicians (MikeBloomberg, BarackObama, GovChristie, CoryBooker), federal agencies (fema, NOAA), and others, together with their connectivity to other users, can be identified.

screenshot_users

Connectivity of Twitter Users During Hurricane Sandy

For dynamic graph please use the following link to download pdf: Link
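As a rough illustration of how such a user graph can be assembled from tweet metadata, here is a minimal sketch in Python. The record layout (`user`, `mentions`, `retweet_of`) and the sample tweets are assumptions made up for this example; they are not the SMTAS schema or real data. The sketch only builds weighted edges and node sizes. It does not attempt the modularity-based clustering used to color the published graph.

```python
from collections import Counter

# Hypothetical, simplified tweet records (not the real Twitter or SMTAS schema).
tweets = [
    {"user": "alice", "mentions": ["fema"], "retweet_of": None},
    {"user": "bob", "mentions": [], "retweet_of": "cnnbrk"},
    {"user": "carol", "mentions": ["fema", "NOAA"], "retweet_of": "cnnbrk"},
    {"user": "dave", "mentions": [], "retweet_of": None},  # no connection, so no edge
]


def build_edges(tweets):
    """Count directed (source, target) pairs for every mention or re-tweet."""
    edges = Counter()
    for tweet in tweets:
        targets = list(tweet["mentions"])
        if tweet["retweet_of"]:
            targets.append(tweet["retweet_of"])
        for target in targets:
            edges[(tweet["user"], target)] += 1
    return edges


def node_sizes(edges):
    """Node size: how often each user was mentioned or re-tweeted."""
    sizes = Counter()
    for (_source, target), weight in edges.items():
        sizes[target] += weight
    return sizes


edges = build_edges(tweets)
sizes = node_sizes(edges)
for user, size in sizes.most_common():
    print(user, size)
```

Users who neither mention nor re-tweet anyone, and are never mentioned themselves, produce no edges and therefore never appear in the graph, which matches the "connected users" restriction described above.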

 

The second graph, shown below, is the network of word co-occurrence in Twitter messages collected during Hurricane Sandy. Researchers are able to identify clusters of words, such as Sandy, Hurricane, and Storm, that tend to appear together. Words such as Everyone, Stay, and Safe present another distinct topic. Power, still, and out suggest tweets about the power outages. The network visualization is also able to separate out noisy tweets (with words such as you, never, love, LOL, etc.) and tweets in other languages (the Spanish-speaking community cluster in the bottom left).

screenshot_words

Word Collation During Hurricane Sandy

For dynamic graph of word collation please use the following link to download pdf: Link
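The word network described above rests on counting which words appear together in the same tweet. The following Python sketch shows one simple way to do that counting. The tokenizer, the tiny stop-word list, and the sample texts are all assumptions for illustration, and the actual SMTAS processing may differ.

```python
import re
from collections import Counter
from itertools import combinations

# Illustrative stop-word list only; a real analysis would use a fuller one.
STOPWORDS = {"the", "a", "is", "and", "to"}


def cooccurrence(texts):
    """Count unordered word pairs that appear together in the same text."""
    pairs = Counter()
    for text in texts:
        words = set(re.findall(r"[a-z']+", text.lower())) - STOPWORDS
        # Sorting makes each pair canonical: ("hurricane", "sandy"), never the reverse.
        pairs.update(combinations(sorted(words), 2))
    return pairs


# Made-up example texts, not collected tweets.
texts = [
    "Hurricane Sandy is a huge storm",
    "Stay safe everyone, the storm is coming",
    "Power still out after hurricane Sandy",
]

pairs = cooccurrence(texts)
print(pairs[("hurricane", "sandy")])
```

Each counted pair becomes a weighted edge between two word nodes. Clusters such as "stay / safe / everyone" then emerge when a graph layout or community-detection step is applied to those edges.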


Social Media Tracking and Analysis System

In the summer of 2011, a group of scientists, research associates and graduate students at the Mississippi State University (MSU) Social Science Research Center (SSRC) began to develop a suite of software applications intended to create a capacity to track and analyze a wide array of social media platforms. The goal was to assist researchers in using social media as a source of scientific data for exploration and investigation. The result of this endeavor was the Social Media Tracking and Analysis System, or SMTAS.

The system was tested with Superstorm Sandy in fall 2012, and it was effective, collecting more than 4.5 million Tweets and an estimated 400,000 images of the storm and its aftermath. This resulted in a significant gain in understanding of the role and implication of social media in natural disaster events.

Arthur G. Cosby, Ph.D. and other researchers at MSU now have an appreciation of the volume of social media communications and their resilience in the face of storm events and power failures. Researchers understand how social media was used as a recovery mechanism, especially in the formation of “organic responses” to the storm event. The software also tracked public sentiment concerning major relief agencies and political leaders in the impacted areas.

 

“Because of SMTAS, I can access half a billion cases of human behavior a day.”

– Dr. Arthur G. Cosby, SSRC Director and William L. Giles Distinguished Professor

 

SMTAS was also used to collect Spanish-language tweets from around the United States (US). Their spatial distribution was then compared with US Census data. “That type of Census proxy research could not have been done without our system,” said Willie Brown, SSRC Research Associate. Through this project, researchers could overlay geo-located tweets on Census data and thus see how the Spanish-speaking population is distributed and moves over time.

Megan Stubbs-Richardson, a graduate research assistant, used SMTAS to collect and analyze rape myths on Twitter. The content of the tweets was studied and coded. “I collected information on the number of re-tweets (to measure spread of information), and then number of times users tweeted directly to another user because I was interested in how pro-victim versus victim-blame tweets were debated in the news feed,” said Stubbs-Richardson. SMTAS allowed researchers to see who the target audience was and helped code the tweets into categories of pro-victim or victim-blame.

Additional analysis is addressing a myriad of other research questions. SMTAS is organized as a series of modules that provide extensive capacity for research applications with social media. The access module enables researchers to use more than 20 different social media platforms, such as Twitter and Facebook. For example, researchers at MSU have access to more than 400 million Tweets worldwide. The tracking/scheduling module allows researchers to track social media by word choice and phrases, location, social media influence, complex time designs, volume of tweets and other features included in social media data. The system has the capacity for primary, secondary, panel and specialized tracking.

For example, in the case of primary tracking, researchers can acquire tweets mentioning “organic food” or “#organic” and also restrict tweets with geo-coordinates to see the locations where users are actively talking about organic food. Secondary tracking for users can be seen during an event, where the system tracks the tweeters/users from an event location and continues tracking them to determine the mobility of users before and after the event. Panel tracking can be used for content analysis of tweets relative to a topic before and after an event. The system also has capabilities for specialized tracking where the tweets are dynamically collected according to geo-coordinates, for example tweets along a hurricane path.

Researchers create tracking/scheduling modules called “studies,” or collections of tracking “rules” (where a rule can be #organic), along with collection parameters such as a time period and the number of tweets they want to collect. SMTAS helps simplify the process by providing users of the system with an easy-to-create interface for studies (See Figure 1).

 

Figure 1: Creation of Study Interface

 create1 create2 create3

L to R: “Modify your study’s information,” “Create and Edit Search Terms” and “Modify Collection Schedule”
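To make the idea of a "study" concrete, the sketch below models one as a set of tracking rules plus collection parameters (a time period, a tweet limit, and an optional geo-coordinate restriction). The `Study` class, its field names, and the tweet dictionary layout are hypothetical and invented for this example. This is not the SMTAS data model, and plain substring matching stands in for whatever matching the system actually performs.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Study:
    """Hypothetical model of a study: tracking rules plus collection parameters."""
    name: str
    rules: list                 # e.g. ["organic food", "#organic"]
    start: datetime
    end: datetime
    max_tweets: int
    require_geo: bool = False   # keep only tweets that carry geo-coordinates
    collected: list = field(default_factory=list)

    def matches(self, tweet):
        """True if the tweet falls in the time window and satisfies any rule."""
        if not (self.start <= tweet["created_at"] <= self.end):
            return False
        if self.require_geo and tweet.get("coordinates") is None:
            return False
        text = tweet["text"].lower()
        return any(rule.lower() in text for rule in self.rules)

    def offer(self, tweet):
        """Collect the tweet if it matches and the limit has not been reached."""
        if len(self.collected) < self.max_tweets and self.matches(tweet):
            self.collected.append(tweet)
            return True
        return False


study = Study(
    name="Organic food",
    rules=["organic food", "#organic"],
    start=datetime(2013, 1, 1),
    end=datetime(2013, 1, 31),
    max_tweets=2,
    require_geo=True,
)

# Made-up tweets for illustration.
incoming = [
    {"text": "Love #organic apples", "created_at": datetime(2013, 1, 5),
     "coordinates": (33.45, -88.79)},
    {"text": "Organic food is pricey", "created_at": datetime(2013, 1, 6),
     "coordinates": None},                      # rejected: no geo-coordinates
    {"text": "Buying organic food today", "created_at": datetime(2013, 2, 2),
     "coordinates": (40.71, -74.00)},           # rejected: outside the time period
]
accepted = [study.offer(tweet) for tweet in incoming]
print(accepted)
```

Keeping the rules and the collection parameters in one object mirrors the three screens in Figure 1: study information, search terms, and collection schedule.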

 

“The best thing about the system is the speed with which one can collect tweets on any topic anywhere in the world. Within a few minutes, we can get the full live feeds. It’s really fascinating!”

- Willie Brown, SSRC Research Associate

 

Currently, SMTAS is focused on the social network Twitter, where researchers have access to approximately 500 million tweets per day. Tweets are public postings made by Twitter users worldwide. In addition to real-time access, SMTAS has access to historical tweets posted since 2006. Besides being a rich source of information on human behavior in a social network (170 million active users), Twitter gives researchers near-instantaneous information from its user base, because messages propagate faster than on other social networks.

SMTAS is based on cloud servers, which serve as the backbone of the entire system. The backend database is a cluster of PostgreSQL servers, and the web application uses Django, Celery, Redis, JavaScript and Bootstrap. Map generation and tweet mapping are provided by Google Maps. SMTAS also uses a large number of web services for data enrichment and a wide variety of software libraries for analysis.

 

“SMTAS benefited my research by making things much more simplistic (using the search term query) as compared to searching twitter and randomly selecting tweets that are not likely to be as relevant to my research topic.”

– Megan Stubbs-Richardson, SSRC Graduate Research Assistant

 

A researcher can analyze the filtered/tracked data of a study by using a myriad of analytical modules such as:

1. Traffic Statistics: Each tweet carries its own creation time-stamp. Using this information, researchers can analyze the traffic statistics for a study. (See Figure 2)

Figure 2: Analyze Traffic

 traffic

2. Geo-Mapping: About 2-3% of the tweets have geo-coordinates associated with them. These geo-coordinates are accurate to within 2-5 feet of the exact location of the user when they tweeted. A researcher can view any such tweets overlaid on a Google map. (See Figure 3)

Figure 3: Map Tweets with Geolocation

 geo

3. Sentiment Analysis: The tweets collected with a study can also be analyzed for sentiment, which relates to the “mood” of the tweet. For example, “It feels good to be home :)” is a positive sentiment tweet, whereas “Work was boring today :(” is a negative one. (See Figures 4 & 5)

Figure 4: Analyze View Sample

sample

 sentiment

Figure 5: Sentiment Over Time

4. Trend Analysis: Researchers can also analyze the trend of keywords that are present in the tweets of a study. For example, the following figure shows the trend of the keywords hurricane, power and flooding in the Hurricane Sandy study, which collected tweets from the New York and Boston areas. (See Figure 6)

trends

Figure 6: Trend Analysis

5. Content Analysis: Researchers can also do content analysis on the collected tweets by using the dynamic keyword and hashtag clouds. The module is also capable of drilling down on specific dates and related terms. (See Figure 7)

“I used SMTAS to collect data using the search term query, which allowed me to narrow my search topics to a specific interest.”

– Megan Stubbs-Richardson

keyword1 keyword2

Figure 7: Most Used Keyword and Most Frequently Used Hashtags

6. Klout Scores/Topic: SMTAS also tracks Klout scores (social influence score of users) and their topic of interest. A researcher can analyze the most influential tweeters and the topics of interest collected by the study. (See Figure 8)

Figure 8: Klout Scores and Topics

klout1 klout2

 

7. Real-time Monitoring: The “Twitter Firehose” connection lets researchers obtain tweets in real time, with SMTAS collecting them within seconds of the user tweeting. SMTAS has a real-time mapping module along with a streamer that a researcher can use to analyze the tweets. In the following example, the tweet was geo-mapped to its location using Google Street View, and the result was compared with the picture in the tweet showing the flooding in the area during Hurricane Sandy. (See Figure 9)

Figure 9: Google Street View and Tweeted Image

 google1 Google2

 

 

“If the ‘tweeters’ have their geo-locator turned on, we can see where groups are tweeting from instantly, and using Google Street View, we can see the actual spot a tweet comes from.”

– Willie Brown

 

8. Buckets: A researcher can use the search module (which uses a complex content-matching algorithm) to filter through the tweets a study has collected. Once the search is complete, the researcher can then create a “Bucket” containing the filtered tweets. Any such bucket of data can be re-analyzed through the analytical modules of SMTAS. This gives researchers the ability to filter the appropriate content out of a larger dataset.
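Several of the modules listed above come down to simple operations on tweet text and time-stamps. The Python sketch below shows a hypothetical version of three of them: filtering a study into a bucket, counting traffic per hour, and counting a keyword per day. The function names, the tweet dictionary layout, and the sample tweets are assumptions for this example. Plain substring matching is used here as a stand-in and is far simpler than the content matching the search module is described as using.

```python
from collections import Counter
from datetime import datetime

# Made-up tweets for illustration; only text and creation time are modeled.
tweets = [
    {"created_at": datetime(2012, 10, 29, 18, 5), "text": "Power is out on our block"},
    {"created_at": datetime(2012, 10, 29, 18, 40), "text": "Flooding on our street #sandy"},
    {"created_at": datetime(2012, 10, 29, 19, 10), "text": "Still no power, hurricane is here"},
    {"created_at": datetime(2012, 10, 30, 9, 0), "text": "Love this song LOL"},
]


def make_bucket(tweets, terms):
    """Keep tweets whose text contains any search term (case-insensitive)."""
    terms = [term.lower() for term in terms]
    return [t for t in tweets if any(term in t["text"].lower() for term in terms)]


def hourly_traffic(tweets):
    """Traffic statistics: number of tweets per clock hour."""
    return Counter(
        t["created_at"].replace(minute=0, second=0, microsecond=0) for t in tweets
    )


def keyword_trend(tweets, keyword):
    """Trend analysis: number of tweets per day that contain the keyword."""
    keyword = keyword.lower()
    return Counter(
        t["created_at"].date() for t in tweets if keyword in t["text"].lower()
    )


bucket = make_bucket(tweets, ["power", "flooding", "hurricane"])
traffic = hourly_traffic(bucket)
power_trend = keyword_trend(bucket, "power")

for hour, count in sorted(traffic.items()):
    print(hour, count)
```

Because a bucket is just another list of tweets, the same traffic and trend functions run on it unchanged, which is the sense in which a bucket can be re-analyzed by the other modules.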

 

 

 

team

The Social Science Research Center (SSRC) and Mississippi Agricultural and Forestry Experiment Station (MAFES) currently fund the SMTAS project. The project was initiated by Dr. Arthur Cosby (Director of SSRC) and Dr. John Edwards (Director of Survey Research Labs). The team comprises Dr. Somya Mohanty (Systems Architect and Lead Developer), Jake Gaylor (Lead Programmer) and Josh Richardson (System Administrator).