Will the song be a toot or a boot?

I have a very different taste in music than most people. If I have music playing at all, it is more than likely LoFi, which usually has no lyrics, a relatively slow tempo, and an overall mellow vibe.

In general, LoFi and more mellow genres of music are not the most popular. Songs that have ‘more going on’, perhaps, tend to be much more popular.

For this project I was interested in one question: can you predict whether a song will be popular based on different musical aspects of the song? If so, what aspects of musical makeup do popular songs have in common? Is it the speed of the song, the amount of the song filled with lyrics, maybe even the time signature?

I used a dataset found on Kaggle.com called “19,000 Spotify Songs”. This CSV file contained 19,000 songs from Spotify and included 15 different features for each song:

  • song_name
  • song_popularity
  • song_duration_ms
  • acousticness
  • danceability
  • energy
  • instrumentalness
  • key
  • liveness
  • loudness
  • audio_mode
  • speechiness
  • tempo
  • time_signature
  • audio_valence

Data Engineering

Fortunately, the data came fairly clean. There were no null values in the dataset.

Addressing repeat rows

When I first explored the data with Pandas Profiling, I noticed a number of duplicate rows. After removing them, I looked at the distribution of the ‘song_popularity’ feature.
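The deduplication step is a one-liner in pandas. A minimal sketch, using a toy frame in place of the real CSV (which would be loaded with `pd.read_csv`; the file name is not shown here because it depends on the Kaggle download):

```python
import pandas as pd

# Toy stand-in for the 19,000-song frame, with one exact duplicate row.
df = pd.DataFrame({
    "song_name": ["A", "A", "B"],
    "song_popularity": [70, 70, 40],
})

# Exact duplicate rows add no information, so keep only the first occurrence.
df = df.drop_duplicates().reset_index(drop=True)
```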

This chart shows the distribution of the song popularity feature. It is skewed to the left.

The chart above actually makes a lot of real-world sense. Most songs on the Spotify platform probably have either very low or average popularity ratings — it is out of the ordinary for a song to be a hit.

Addressing the Tempo feature

I then noticed that the ‘tempo’ column had some values of 0. A song with a tempo of 0 does not make sense, so I replaced all 0 values in the tempo column with NaN values.
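That replacement can be sketched like this (toy values standing in for the real column):

```python
import numpy as np
import pandas as pd

# Toy stand-in for the tempo column; two rows carry the impossible 0 BPM.
df = pd.DataFrame({"tempo": [120.0, 0.0, 98.5, 0.0]})

# A tempo of 0 is meaningless, so treat those entries as missing.
df["tempo"] = df["tempo"].replace(0, np.nan)
```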

I decided to create a new feature, ‘popular’, from the song popularity column in order to turn this into a binary classification problem. If a song is rated ≥60, it is considered ‘popular’ and assigned a value of True; if it is <60, it is considered ‘not popular’ and assigned a value of False.
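The binarization itself is a single comparison (toy ratings shown in place of the real column):

```python
import pandas as pd

# Toy stand-in for the cleaned frame's popularity ratings.
df = pd.DataFrame({"song_popularity": [70, 40, 60]})

# Ratings of 60 or above count as "popular" (True); everything else is False.
df["popular"] = df["song_popularity"] >= 60
```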

After engineering my target feature and cleaning my data, I had to establish a baseline. I chose accuracy as my scoring metric. My baseline, the majority-class accuracy, was approximately 73%. This means that if you guess that every song will not be popular, you will be right about 73% of the time.
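The majority-class baseline falls out of the class balance directly. A sketch with a toy frame whose balance mimics the real one:

```python
import pandas as pd

# Toy stand-in: three of four songs are not popular, roughly mirroring
# the real ~73/27 split.
df = pd.DataFrame({"popular": [False, False, True, False]})

# Accuracy of always predicting the majority class ("not popular").
baseline = (~df["popular"]).mean()
```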

After creating this column, I created a heatmap to show the correlation between the features.

You can see in this image that loudness and energy are highly correlated, but the target column ‘popular’ does not correlate strongly with any of the features, so there is no sign of a single strong relationship.
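The underlying numbers come from a pairwise correlation matrix; the heatmap is just a rendering of it. A minimal sketch on a toy frame (the seaborn call in the comment is how the chart itself would be drawn, assuming seaborn is installed):

```python
import pandas as pd

# Toy frame: louder songs tend to be more energetic, as in the real data.
df = pd.DataFrame({
    "loudness": [-5.0, -7.2, -3.1, -9.8],
    "energy":   [0.9, 0.7, 0.95, 0.4],
    "popular":  [1, 0, 1, 0],
})

# Pairwise Pearson correlations between all numeric columns.
corr = df.corr(numeric_only=True)

# To draw the heatmap:
#   import seaborn as sns
#   sns.heatmap(corr, annot=True, cmap="coolwarm")
```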

Creating Models

I created a Logistic Regression model, a Random Forest model, and an XGBClassifier model to see if I could predict a song’s chance of popularity from these audio features. I split my data into three sets: training, validation, and test. I made sure to drop the columns that would cause data leakage: the engineered target ‘popular’ and the original ‘song_popularity’ column. After training on my train set, surprisingly, my Random Forest model outperformed both the Logistic Regression AND the XGBClassifier with a validation accuracy of 72%.
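The split-and-compare workflow can be sketched on synthetic data standing in for the audio features. The split sizes and model settings below are assumptions, not the ones from my notebook, and XGBoost is omitted since it requires a separate install:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the audio features and the binary "popular" target.
rng = np.random.default_rng(0)
X = rng.random((500, 5))
y = (X[:, 0] + 0.5 * X[:, 1]) > 0.75

# Three-way split: 60% train, 20% validation, 20% test.
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=42)

# Fit each candidate model on the train set, then compare on validation only;
# the test set stays untouched until the very end.
models = {
    "logistic": LogisticRegression(max_iter=1000),
    "forest": RandomForestClassifier(n_estimators=100, random_state=42),
}
val_scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    val_scores[name] = accuracy_score(y_val, model.predict(X_val))
```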

I then tuned the hyperparameters of my Random Forest model to see if I could improve the score. I used RandomizedSearchCV, which performs a random search over a given parameter space, sampling each candidate setting from the supplied parameter values.

I created a parameter dictionary for the model to use and apply while searching for the best parameters for the model.
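A sketch of that search, with a hypothetical parameter dictionary (the ranges I actually searched differ):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in data.
rng = np.random.default_rng(0)
X = rng.random((300, 5))
y = (X[:, 0] > 0.5).astype(int)

# Hypothetical parameter space for the forest.
param_dist = {
    "n_estimators": [100, 200, 300],
    "max_depth": [None, 5, 10, 20],
    "min_samples_leaf": [1, 2, 4],
}

# Try n_iter random combinations, 3-fold cross-validated, scored by accuracy.
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions=param_dist,
    n_iter=5,
    cv=3,
    scoring="accuracy",
    random_state=42,
)
search.fit(X, y)
```

`search.best_params_` then holds the winning combination, and `search.best_estimator_` is already refit on the full data passed to `fit`.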

After running RandomizedSearchCV and refitting with the best parameters it found, I was able to increase my validation score to 74% — slightly better than the untuned model, and slightly better than the baseline.


After creating a model that I was mostly satisfied with, I looked at the feature importances to see which features carried more weight in the model than others.

This graph shows the Gini importance for each feature. We can see that instrumentalness has the biggest weight for predicting song popularity, followed by song duration and loudness, while audio_mode has little to no impact.
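Gini importances come straight off a fitted forest. A minimal sketch on synthetic data where only the first feature matters:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic data: feature 0 fully determines the target.
rng = np.random.default_rng(0)
X = rng.random((400, 3))
y = (X[:, 0] > 0.5).astype(int)

model = RandomForestClassifier(random_state=0).fit(X, y)

# One Gini (impurity-based) importance per feature; they sum to 1,
# so they act as relative weights.
importances = model.feature_importances_
```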

To further see how each feature affected my model’s predictive ability, I looked at the permutation importances for each feature.

This graph shows the drop in accuracy that would be seen after permuting the features.
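Permutation importance is available in scikit-learn’s inspection module. A sketch on the same kind of synthetic data, where only the first feature carries signal:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

# Synthetic data: only feature 0 matters.
rng = np.random.default_rng(0)
X = rng.random((400, 3))
y = (X[:, 0] > 0.5).astype(int)

model = RandomForestClassifier(random_state=0).fit(X, y)

# Shuffle one column at a time and record how much accuracy drops on average.
result = permutation_importance(model, X, y, n_repeats=5, random_state=0)
drops = result.importances_mean
```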

I then created a Partial Dependence interaction plot to show the dependence between the target, ‘popular’, and the features ‘instrumentalness’ and ‘speechiness’.


I can see that a song’s length and how instrumental it is carry significant weight in predicting whether or not a song will be popular. The model I created barely surpassed the baseline score, but overall the predictions were fairly consistent. This could be due to the fact that the majority of my features are numeric. My scores hovered around the baseline.

In the future I would love to have more data containing information like year and month of release, social events occurring at the time, and possibly geographical locations of where the song is most played. I would love to perform more research and find better ways to predict whether or not a song would be popular on Spotify.

I think the biggest concern I have after doing this project is that artists will realize what works in terms of popularity and abandon unique creations. I do not want the music creators to only create in terms of song popularity! This would make it easier to predict which songs will be deemed ‘popular’ but I am not sure that is a desirable result.

Link to my notebook here.

Hello! I am an aspiring data scientist learning more each day. Above all else, I love the outdoors and doing anything active.