The project was developed with Pouria Faraji and Chiara Spampinato for the Social Network Analysis course at the University of Pisa.
The goal of the project is to use graph theory to study the structure of the relations between semantically close words, starting from the top Google Trends keywords of 2019.
Unlike a typical social network, in which nodes are people, each node in our graph is a word, so the graph forms a semantic network. It is a directed, unweighted graph with 18,002 nodes and 70,639 edges. An edge links two words when they are semantically related according to Google Trends suggestions.
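A minimal sketch of how such a graph could be assembled with NetworkX, assuming the crawler returns (keyword, suggested term) pairs and that each edge points from the queried keyword to the suggested term. The pair list, the lowercasing step and the helper `build_semantic_graph` are hypothetical illustrations, not the project's actual crawler or data.

```python
import networkx as nx

# Hypothetical crawler output: (keyword, term suggested by Google Trends for
# that keyword). These pairs are illustrative, not data from the project.
suggestion_pairs = [
    ("weather", "forecast"),
    ("weather", "rain"),
    ("forecast", "weather"),
    ("rain", "umbrella"),
    ("football", "score"),
]


def build_semantic_graph(pairs):
    """Return a directed, unweighted graph with one node per word.

    Assumption: each edge points from the queried keyword to the suggested term.
    """
    graph = nx.DiGraph()
    for keyword, suggestion in pairs:
        # Normalize case and whitespace so "Rain" and "rain" are the same node
        keyword, suggestion = keyword.strip().lower(), suggestion.strip().lower()
        if keyword != suggestion:  # ignore a word suggested for itself
            # add_edge creates missing nodes; a repeated pair adds no new edge
            graph.add_edge(keyword, suggestion)
    return graph


graph = build_semantic_graph(suggestion_pairs)
print(graph.number_of_nodes(), graph.number_of_edges())  # 6 5
```

Because the graph is unweighted, a pair seen twice still yields a single edge, while reciprocal suggestions (such as "weather" and "forecast" above) become two edges, one in each direction.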
We carried out the project mainly in Python, working in Google Colab and Jupyter notebooks with the libraries NetworkX, CDlib and NDlib, and we used Gephi to visualize the graph.
We started by crawling the data, then performed basic network analyses (degree distribution, connected components, paths, clustering coefficient, density and centrality), which we also ran on artificial graphs. The subsequent work covers community discovery, supervised link prediction, graphlet count estimation and an improved version of link prediction based on word meaning.
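The basic analyses listed above map onto standard NetworkX functions. The sketch below computes several of them on a small hypothetical toy graph and compares it with a random directed graph G(n, m) that has the same number of nodes and edges. The helper `basic_stats`, the toy edges and the choice of an Erdős-Rényi G(n, m) graph as the artificial baseline are assumptions for illustration, not necessarily the models used in the project.

```python
import networkx as nx


def basic_stats(g):
    """Basic statistics for a directed graph, in the spirit of the analyses above."""
    und = g.to_undirected()
    # Largest connected component of the undirected view (= largest weakly
    # connected component); paths are measured there so every pair is reachable.
    giant = und.subgraph(max(nx.connected_components(und), key=len))
    in_centrality = nx.in_degree_centrality(g)
    return {
        "nodes": g.number_of_nodes(),
        "edges": g.number_of_edges(),
        "density": nx.density(g),  # m / (n * (n - 1)) for a directed graph
        "weak_components": nx.number_weakly_connected_components(g),
        "strong_components": nx.number_strongly_connected_components(g),
        # Clustering computed on the undirected view of the graph
        "avg_clustering": nx.average_clustering(und),
        "avg_shortest_path_giant": nx.average_shortest_path_length(giant),
        "top_in_degree": max(in_centrality, key=in_centrality.get),
    }


# Hypothetical toy word graph standing in for the real 18,002-node network.
toy = nx.DiGraph([("a", "b"), ("b", "c"), ("c", "a"),
                  ("c", "d"), ("d", "e"), ("e", "c")])
observed = basic_stats(toy)

# Artificial baseline: random directed graph with the same n and m.
baseline = nx.gnm_random_graph(toy.number_of_nodes(), toy.number_of_edges(),
                               seed=42, directed=True)
random_stats = basic_stats(baseline)

for key in ("density", "avg_clustering", "avg_shortest_path_giant"):
    print(f"{key}: observed={observed[key]:.3f} random={random_stats[key]:.3f}")
```

Since both graphs share n and m, their densities coincide, so clustering and path length are the informative part of the comparison. On the full graph, the all-pairs average shortest path is costly and could instead be estimated from a sample of source nodes.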
The course is part of my Master's degree in Digital Humanities at the University of Pisa.
Paper (English): PDF
Project: GitHub repository