The project, developed with Valerio Bonsignori and Federica Currao, is the product of the Data Mining course at the University of Pisa.
The data under investigation come from Carvana’s car resale process and describe the characteristics of each vehicle the dealer bought at auction. The goal is to predict whether a purchased car is a bad buy, in order to reduce Carvana’s losses. This outcome is given directly as a binary feature in the training dataset, so part of the process is to evaluate and understand which features are the most discriminating in the choice.
The project required several tasks: data understanding, detection of missing values and outliers, clustering analysis (K-Means, DBSCAN, Hierarchical), extraction of frequent patterns and association rules, and classification with a Decision Tree.
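To make the idea of a "discriminating" feature concrete, here is a minimal sketch of information gain, one common criterion a decision tree can use to choose its splits. It is an illustration under stated assumptions, not the method used in the paper: the records are toy values invented for the example, and the names `high_mileage`, `online_sale` and `is_bad_buy` are hypothetical, not columns taken from the Carvana dataset.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def information_gain(rows, feature, target):
    """Entropy of the target minus its weighted entropy after splitting on feature."""
    base = entropy([row[target] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[feature], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


# Toy records with hypothetical feature names (not real Carvana data).
rows = [
    {"high_mileage": 1, "online_sale": 0, "is_bad_buy": 1},
    {"high_mileage": 1, "online_sale": 1, "is_bad_buy": 1},
    {"high_mileage": 0, "online_sale": 0, "is_bad_buy": 0},
    {"high_mileage": 0, "online_sale": 1, "is_bad_buy": 0},
]

features = ("high_mileage", "online_sale")
gains = {f: information_gain(rows, f, "is_bad_buy") for f in features}

# Features sorted from most to least discriminating.
ranking = sorted(gains, key=gains.get, reverse=True)
print(ranking, gains)
```

In this toy sample the first feature separates the two classes perfectly while the second carries no information about the label, so the ranking places the first feature on top. On real data the gains fall between these extremes.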
The paper was written with Valerio Bonsignori and Federica Currao for the course “Data Mining 1” in my Digital Humanities Master’s programme at the University of Pisa.
View paper ENGLISH: pdf