The project was developed with Pouria Faraji, Selam Mulatu and Thizirie Ould Amer.
The project analyzes two TED Talks datasets to identify the characteristics of the most popular talks and build a recommendation system. To do this, we created a model capable of predicting a video's popularity before its publication, which helps us understand which factors contribute to its success. The main dataset contains general details about the talks and the speakers, while the second consists of the talk transcripts.
It was developed in Jupyter Notebook and described in a paper.
In the paper we outlined the motivation behind the project and explored the state of the art. As a first step we studied the available data in detail through a data understanding process, followed by feature engineering in which we drew on an additional data source, Google Trends, to measure the popularity of both speaker and topic. We then studied variable distributions, correlations and outliers. An important challenge of our prediction task is that there are far more low-popularity videos than high-popularity ones, which makes the data imbalanced. Our solution was to transform the problem from regression to classification. We then compared different classification models and opted for the best trade-off between performance and interpretability, which turned out to be a Decision Tree. After that, we analyzed some features and how they relate to the classification results. We found that the funniness of a speech, the speaker's occupation and the duration are very important features to consider when organizing a talk. Finally, we outlined the recommendation system and explored a few cases to explain the results.
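The step from regression to classification can be illustrated with a minimal sketch. It assumes that popularity is measured by view counts and that the classes are obtained by quantile binning; neither detail is stated above, so both are assumptions made for illustration only, and the function name and sample numbers are hypothetical.

```python
from statistics import quantiles


def popularity_classes(views, n_classes=2):
    """Map continuous view counts to class labels 0..n_classes-1.

    Assumption: quantile cut points are used, so each class receives
    roughly the same number of talks. This is an illustrative choice,
    not necessarily the discretization used in the paper.
    """
    if n_classes < 2:
        raise ValueError("n_classes must be at least 2")
    # quantiles() returns n_classes - 1 cut points.
    cuts = quantiles(views, n=n_classes)
    # A talk's label is the number of cut points its view count exceeds.
    return [sum(v > c for c in cuts) for v in views]


# Hypothetical, heavily skewed view counts: few talks are very popular.
views = [120, 340, 560, 800, 1500, 2300, 9000, 45000]
labels = popularity_classes(views, n_classes=2)
print(labels)
```

With quantile cut points the long tail of very popular talks no longer dominates the target: the two classes end up with the same number of talks, which is the kind of balance a classifier such as a decision tree benefits from.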
The work was carried out over three months, with periodic submissions. The paper was written step by step, according to the various deadlines. At each midterm we had to present our results, which also allowed us to develop public speaking skills.
The project was developed for the course “Big Data Analytics” as part of my Master's degree in Digital Humanities at the University of Pisa.
Paper (English): pdf
Project: GitHub repository