Laura Drummer, Director of Software & Engineering, Novetta Solutions

Natural Language Processing and Topic Modeling for Social Network Analysis

Traditional social network analysis is performed on a series of nodes and edges, generally gleaned from metadata about interactions between several actors. In the intelligence and law enforcement communities, this metadata can frequently be paired with data and communications content.

Our analytic, SocialBee, takes advantage of this largely untapped data source not only to perform more in-depth social network analysis based on actor behavior, but also to enrich that analysis with topic modelling, sentiment analysis, and trend analysis over time.

Through extraction and analysis of topic-enriched links, SocialBee has also been able to successfully predict “hidden relationships”: relationships that do not appear in the original dataset but do exist in an external dataset, carried over a different means of communication.

The clustering of communities based on behavior over time can be done by looking purely at metadata, but SocialBee also analyzes the content of communications, which allows for a richer analysis of the tone, topic, and sentiment of each interaction. Traditional topic modelling is usually done using natural language processing to build clusters of similar words and phrases.

By incorporating these topics into a communications network stored in Neo4j, we are able to ask much more meaningful questions about the nature of individuals, relationships, and entire communities.

Using its topic modelling features, SocialBee can identify behavior-based communities within these networks. These communities are based on relationships where a significant percentage of the communications are about a specific topic. In these smaller networks, it is much easier to identify influential nodes for a specific topic and to find disconnected nodes in a community.
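The idea of a topic-based community can be illustrated with a small, self-contained Python sketch. It is not SocialBee's implementation: the sample messages, the 50% threshold, the function names, and the use of simple degree as a stand-in for influence are all assumptions made for illustration. "Disconnected" is read here as an actor in the full network who has no qualifying relationship in the topic network.

```python
from collections import defaultdict

# Hypothetical sample data: one (sender, recipient, topic) tuple per message.
MESSAGES = [
    ("alice", "bob", "finance"),
    ("alice", "bob", "finance"),
    ("alice", "bob", "travel"),
    ("bob", "carol", "finance"),
    ("bob", "dave", "finance"),
    ("carol", "dave", "travel"),
    ("dave", "erin", "travel"),
    ("erin", "dave", "travel"),
]


def topic_edges(messages, topic, min_share=0.5):
    """Keep undirected relationships where at least `min_share` of the
    messages exchanged between the two actors are about `topic`."""
    totals = defaultdict(int)
    on_topic = defaultdict(int)
    for sender, recipient, msg_topic in messages:
        if sender == recipient:
            continue  # ignore self-addressed messages
        pair = frozenset((sender, recipient))
        totals[pair] += 1
        if msg_topic == topic:
            on_topic[pair] += 1
    return {pair for pair, n in totals.items() if on_topic[pair] / n >= min_share}


def degrees(edges):
    """Count how many topic relationships each actor takes part in."""
    counts = defaultdict(int)
    for pair in edges:
        for node in pair:
            counts[node] += 1
    return dict(counts)


def outside_topic_network(messages, edges):
    """Actors present in the full network but absent from the topic network."""
    everyone = {actor for s, r, _ in messages for actor in (s, r)}
    connected = set().union(*edges)
    return everyone - connected


finance = topic_edges(MESSAGES, "finance")
finance_degrees = degrees(finance)
hub = max(finance_degrees, key=finance_degrees.get)
outsiders = outside_topic_network(MESSAGES, finance)
```

With this sample data, the finance network keeps three relationships (alice-bob, bob-carol, bob-dave), `hub` is `"bob"`, and `outsiders` is `{"erin"}`. A real analysis would likely use a richer centrality measure and a threshold tuned to the data.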

This talk explores the schema designed to store this data in Neo4j, which is loosely based on the concept of the “Author-Recipient-Topic” model, as well as several advanced queries exploring the nature of relationships, characterizing sub-graphs, and exploring the words that make up the topics themselves.
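The abstract does not give the schema itself, so the following Python sketch only shows one way an Author-Recipient-Topic style layout could look in Cypher. Every label, relationship type, and property name here (Actor, Message, Topic, Word, SENT, TO, ABOUT, HAS_WORD, weight, and so on) is hypothetical, as is the helper `message_params`; none of it is taken from SocialBee.

```python
# Hypothetical schema: all labels, relationship types, and properties below
# are illustrative assumptions, not the schema presented in the talk.

# Store one message, linking its author, its recipient, and its topic.
LOAD_MESSAGE = """
MERGE (a:Actor {id: $sender})
MERGE (r:Actor {id: $recipient})
MERGE (t:Topic {id: $topic})
CREATE (m:Message {id: $message_id, sent_at: $sent_at})
CREATE (a)-[:SENT]->(m)
CREATE (m)-[:TO]->(r)
CREATE (m)-[:ABOUT {weight: $weight}]->(t)
"""

# Characterize relationships: how many messages per author, recipient, topic.
TOPIC_COUNT_BY_PAIR = """
MATCH (a:Actor)-[:SENT]->(m:Message)-[:TO]->(r:Actor)
MATCH (m)-[:ABOUT]->(t:Topic)
RETURN a.id AS author, r.id AS recipient, t.id AS topic, count(m) AS messages
ORDER BY messages DESC
"""

# Explore the words that make up a topic, strongest first.
TOPIC_WORDS = """
MATCH (t:Topic {id: $topic})-[h:HAS_WORD]->(w:Word)
RETURN w.text AS word, h.weight AS weight
ORDER BY weight DESC
LIMIT $limit
"""


def message_params(message_id, sender, recipient, topic, weight, sent_at):
    """Build the parameter map expected by LOAD_MESSAGE."""
    return {
        "message_id": message_id,
        "sender": sender,
        "recipient": recipient,
        "topic": topic,
        "weight": weight,
        "sent_at": sent_at,
    }


params = message_params("m1", "alice", "bob", "finance", 0.8, "2017-06-01T09:00:00")
```

The query strings and parameter map would be handed to a Neo4j client of your choice. Modelling each message as its own node is only one option; aggregating topic weights directly onto an actor-to-actor relationship would make the graph smaller at the cost of per-message detail.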

About

Laura Drummer has over 14 years of experience in intelligence analysis, data analytics, and software development, and serves as the Director of Software and Engineering in Novetta's Cyber Operations Division. She holds an MS in Information Systems and a BA in Mandarin Chinese. Laura lives in Maryland with her husband and two adorable dogs and is expecting her first baby in November of this year.