This post is a collaboration between trendy, very stylish Millennial marketing analyst Nick Patterson and grumpy old GenXer Christopher Penn.
NP: Who remembers the good ole days of taking a trip to the local F.Y.E., snagging the latest and greatest version of the Now That’s What I Call Music! CD, and popping that puppy in your Walkman?
CP: You actually had a Walkman?
NP: Sadly we had to say R.I.P. to the Walkman and CDs, but luckily Now! is still kicking and dishing out volumes. I have to admit I haven’t listened to a Now! CD since my middle school days, but I thought it would be fun to be nostalgic and revisit a childhood favorite by analyzing every Now! volume to date.
CP: I’m amazed you even know what a CD is. That said, studying data and seeing what’s inside is fun.
NP: We found this data set on Kaggle that includes every Now! song to date along with song title, volume number, Spotify ID, as well as scores given to each song based on certain metrics like danceability, loudness, acousticness, energy, etc.
CP: Interestingly, what’s missing is chart position and/or sales. There’s no way of telling which album did best. Also, when you look at the dataset, one of the things that jumps out right away is how much extra junk is in there. A key part of data science is solid feature engineering – removing stuff that will mess up analysis later on.
NP: Like what?
CP: Do we really need the track ID? It doesn’t lend itself to being useful. Also, look at the Volume number.
NP: What about it?
CP: It’s a number.
NP: Yes… and?
CP: It’s a number, but it’s not a measurement. It’s a dimension. We need to treat it like a dimension, which R calls factors, so that we don’t attempt to use it as a measurement. It isn’t even valuable as an ordinal.
NP: What’s an ordinal?
CP: A way of ranking things. Is album 1 better than album 7? Is album 6 half as good as album 12?
NP: I get it. No, they’re just descriptive.
CP: Right, so we will have to recode these as factors.
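As a rough illustration of that recoding step, here is a minimal Python sketch (the analysis in this post was done in R, so this is a stand-in, not the code we ran). The column names and the two sample rows are assumptions made up for the example, not the Kaggle file's actual headers or data.

```python
import csv
import io

# Hypothetical sample data; column names and values are illustrative only.
SAMPLE_CSV = """volume_number,key,mode,time_signature,energy
1,5,1,4,0.81
2,11,0,4,0.64
"""

# Columns that are labels stored as integers, not measurements.
CATEGORICAL = {"volume_number", "key", "mode", "time_signature"}


def load_rows(text):
    """Parse CSV text, keeping label columns as strings and converting the rest to floats."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        row = {}
        for name, value in raw.items():
            # Leaving labels as strings prevents accidental arithmetic on them.
            row[name] = value if name in CATEGORICAL else float(value)
        rows.append(row)
    return rows


rows = load_rows(SAMPLE_CSV)
```

Keeping the labels as strings plays roughly the role that a factor plays in R: nobody can average a volume number by mistake.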
CP: Let’s get the dataset loaded up. What are we aiming to find?
NP: I want to find which Now! volume is the happiest, saddest, and best to dance to.
CP: That sounds terrible. I tell you what, I’ll calculate, you dance.
NP: We also linked each song to its Spotify ID so you can instantly listen. Now that’s what I call data! Can’t control your excitement?
CP: I can.
NP: Yeah, we know.
The Correlation Matrix
CP: Let’s start with some basic feature engineering. We agree that track ID isn’t helpful, and that album numbers are factors. Do you think when the album came out has any bearing on the music on it?
NP: Not really. There are probably happy, sad, and danceworthy songs on every album.
CP: Then let’s engineer out album number entirely. What about key?
NP: That’s the key the song is in. It’s mapped to integers.
CP: So that’s also a factor, rather than a number. Same for time signature?
CP: What about mode?
NP: That’s the major or minor modality.
CP: Another factor. Amazing how much of the dataset is unclean if you want to do any data science on it. Out of the box, you could really mess up if you didn’t do a lot of feature engineering up front.
NP: So you can’t just give it to an AI and have it tell us?
CP: Nope. Most of great data science is in preparation, planning, and engineering. So we’ve now recoded all those factors disguised as numbers.
CP: Let’s put together a correlation matrix. We’ll want to exclude factors for now so we can see relationships just between numeric values.
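For readers who want to see what a correlation matrix over the numeric columns involves, here is a small Python sketch using Pearson correlation. We worked in R, so this is an illustrative stand-in; the four sample songs and their values are invented for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists (assumes neither is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(rows, numeric_columns):
    """Build a nested dict of pairwise correlations for the named numeric columns only."""
    columns = {name: [row[name] for row in rows] for name in numeric_columns}
    return {
        a: {b: pearson(columns[a], columns[b]) for b in numeric_columns}
        for a in numeric_columns
    }


# Invented sample values, for illustration only.
songs = [
    {"energy": 0.9, "loudness": -4.0, "acousticness": 0.05},
    {"energy": 0.6, "loudness": -7.0, "acousticness": 0.30},
    {"energy": 0.3, "loudness": -11.0, "acousticness": 0.80},
    {"energy": 0.7, "loudness": -6.0, "acousticness": 0.20},
]
matrix = correlation_matrix(songs, ["energy", "loudness", "acousticness"])
```

Only the columns passed in `numeric_columns` are compared, which is how the factor columns stay out of the matrix.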
CP: There you go.
NP: That’s a little hard to read.
CP: No it isn’t.
NP: Not everyone reads blocks of data straight up.
CP: I know. Okay, let’s do a heatmap visualization.
NP: That’s definitely easier to read. Look at energy and loudness. That’s really big.
CP: Yep. Blue is a positive correlation, red is a negative one.
NP: Energy and acousticness have a negative relationship.
CP: So what conclusions could we draw?
NP: Let’s figure out which album has the best combination of energy and loudness to find the Now! volume that can put you in that happy place.
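One possible way to score volumes on energy and loudness together is sketched below in Python (our analysis was done in R and Tableau, so treat this as an assumption about method, not a record of it). Because Spotify reports energy on a 0 to 1 scale and loudness in decibels, the sketch rescales loudness to 0 to 1 before averaging; that scaling choice, the `volume` column name, and the sample rows are all hypothetical.

```python
from collections import defaultdict


def min_max(values):
    """Rescale values to the 0..1 range (assumes at least two distinct values)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def rank_volumes(rows):
    """Rank volumes by the mean of (energy + rescaled loudness) / 2 across their songs."""
    scaled_loudness = min_max([row["loudness"] for row in rows])
    scores = defaultdict(list)
    for row, loud in zip(rows, scaled_loudness):
        scores[row["volume"]].append((row["energy"] + loud) / 2)
    means = {volume: sum(vals) / len(vals) for volume, vals in scores.items()}
    # Highest average score first.
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


# Invented sample rows, for illustration only.
songs = [
    {"volume": "1", "energy": 0.9, "loudness": -4.0},
    {"volume": "1", "energy": 0.8, "loudness": -5.0},
    {"volume": "2", "energy": 0.4, "loudness": -10.0},
    {"volume": "2", "energy": 0.5, "loudness": -9.0},
]
ranking = rank_volumes(songs)
```

Without the rescaling step, loudness would swamp energy simply because decibel values are numerically larger.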
Feeling Good Music
NP: What happy person doesn’t like energetic and loud music? And the definitive all-smiles Now! volume is…
Toy Connor- Be in Love Tonight
NP: This volume is filled with tunes to pull you out of a funk and place you on cloud 9.
CP: Right, moving along. What about the red correlations, the inverse relationships?
NP: High acousticness + low energy is basically the formula to put you right in the feels.
CP: Right in the what?
NP: Based on our analysis, the Now! volume that is supposed to give you a good cry is…
NP: While most of these songs are definitely more on the mellow side, they aren’t really the sad sap-esque (besides the inevitable Drake song of course) songs I was expecting.
CP: Definitely not. My kids listen to some of these and they’re definitely not all sad. Take it a step further and look at the individual songs.
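Dropping from volumes to individual songs could look something like the Python sketch below. The score used here, acousticness minus energy, is an assumed formula for "high acousticness, low energy" rather than the exact calculation behind our charts, and the song titles and values are placeholders.

```python
def mellow_score(song):
    """Higher when a song is more acoustic and less energetic (an assumed scoring rule)."""
    return song["acousticness"] - song["energy"]


def top_mellow(songs, n=3):
    """Return the n songs with the highest mellow score, best first."""
    return sorted(songs, key=mellow_score, reverse=True)[:n]


# Placeholder songs with invented values, for illustration only.
songs = [
    {"title": "Song A", "acousticness": 0.9, "energy": 0.2},
    {"title": "Song B", "acousticness": 0.1, "energy": 0.9},
    {"title": "Song C", "acousticness": 0.6, "energy": 0.4},
]
saddest = top_mellow(songs, n=2)
```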
NP: Bring out the Kleenex! Ordinary People by John Legend, a bona fide emotional rollercoaster, takes the top spot, with other tearjerkers like Don’t Know Why by Norah Jones and When I Was Your Man by Bruno Mars not far behind. These are the Now! songs that are perfect if you’re heartbroken, just having a bad day, or if you stepped on a Lego or something.
CP: Or use your brain and just don’t listen to them if you don’t want to be depressed. What else would you like to know?
The Ultimate Dance Volume
NP: Danceability. What songs have the highest danceability?
CP: What is danceability? You know what? Don’t answer that. Let’s just skip to the data.
NP: Let’s find which volume had the songs with the highest average danceability. For all the wannabe J-Los out there, the Now! volume for you is…
NP: Talk about not taking the foot off the gas! Now! 52 has some of the most notable dance songs in recent memory and the song with the highest danceability rating in the data set, Anaconda by Nicki Minaj.
CP: I’ve never wanted to be J-Lo.
NP: You’d look good in —
CP: Stop right there. So, here’s a problem. All this is great if you’re a consumer. However, what if you’re a businessperson? What about popularity, spins, etc. – things that indicate that songs have more than just audio characteristics? What songs are really popular?
NP: Well, Spotify has a popularity score, but it’s not in the Kaggle dataset.
CP: Ah, but it’s in the Spotify API, and we have the ability to query it, so let’s append popularity to the dataset and re-run the correlation matrix.
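The append step is essentially a join on the Spotify ID, which is one reason to keep that column around until the join is done. Here is a minimal Python sketch of the join only; it makes no API calls, and the `popularity_by_id` dictionary is a hypothetical stand-in for whatever the API returned. The IDs and scores are made up (Spotify's popularity score runs from 0 to 100).

```python
def append_popularity(songs, popularity_by_id):
    """Return copies of the song rows with a 'popularity' field looked up by Spotify ID."""
    merged = []
    for song in songs:
        enriched = dict(song)  # copy so the original rows are left untouched
        # Songs missing from the lookup get None instead of raising an error.
        enriched["popularity"] = popularity_by_id.get(song["spotify_id"])
        merged.append(enriched)
    return merged


# Made-up IDs and scores, for illustration only.
songs = [
    {"spotify_id": "id_a", "energy": 0.9},
    {"spotify_id": "id_b", "energy": 0.4},
]
popularity_by_id = {"id_a": 72}
with_popularity = append_popularity(songs, popularity_by_id)
```

Songs that come back as `None` would need to be dropped or filled before re-running the correlation matrix.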
CP: Isn’t that interesting?
NP: Why isn’t popularity showing up?
CP: It is, but just barely. Here’s the thing. All the numeric values in these songs are attributes of the music itself. What we’re seeing here is, in fact, that the music itself has little to no bearing on the popularity of the songs. That’s an amusing statement, by the way, on the music industry.
NP: So what do we do?
CP: In this case, instead of using simple correlation with only numeric data, we need to take into account all those factors, those non-measurement pieces of data like what volume number, or what key. To do that, we need a different form of classification. Let’s use a random forest.
NP: A random what?
CP: A random forest is an ensemble of decision trees – ways to sort data – that’s good at figuring out what really drives a variable. We want to know what drives popularity, taking into account non-numeric data. The trick is that R can only handle so many factor levels, so we’ll use R’s cut function to group the volume numbers into bins of 2 at a time and see if that has an impact.
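The binning idea, sketched in Python rather than R: every pair of consecutive volumes shares one label, which halves the number of categories the model has to deal with. This shows only the grouping step, not the random forest itself, and the label format is an arbitrary choice for the example.

```python
def bin_volume(volume, width=2):
    """Map an integer volume number to a label covering `width` consecutive volumes."""
    start = ((volume - 1) // width) * width + 1
    end = start + width - 1
    return f"{start}-{end}"


# Volumes 1 and 2 share a bin, 3 and 4 share the next one, and so on.
labels = [bin_volume(v) for v in range(1, 7)]
```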
CP: There’s the answer. The volume number – which is a proxy for age, because volumes are released in sequence – is a strong predictor of the song’s popularity. So, what did we learn today?
NP: The biggest takeaway is that you can’t just make charts with data. You have to think about what the data is, what you want to learn, and spend a lot more time refining it before you start doing analysis.
CP: Right. Now get off my lawn.
NP: Check out all of our findings on Tableau Public, which includes the graphs in this post as well as additional findings like the happiest song and highest danceability song rankings. Love the Now! volumes? Let us know your all-time favorite in the comments or tweet us at @SHIFTcomm!