The Data-Driven Now That’s What I Call Music! Guide

This post is a collaboration between trendy, very stylish Millennial marketing analyst Nick Patterson and grumpy old GenXer Christopher Penn.

NP: Who remembers the good ole days of taking a trip to the local F.Y.E., snagging the latest and greatest version of the Now That’s What I Call Music! CD, and popping that puppy in your Walkman?

CP: You actually had a Walkman?

NP: Sadly we had to say R.I.P. to the Walkman and CDs, but luckily Now! is still kicking and dishing out volumes. I have to admit I haven’t listened to a Now! CD since my middle school days, but I thought it would be fun to be nostalgic and revisit a childhood favorite by analyzing every Now! volume to date.

CP: I’m amazed you even know what a CD is. That said, studying data and seeing what’s inside is fun.

NP: We found this data set on Kaggle that includes every Now! song to date along with song title, volume number, Spotify ID, as well as scores given to each song based on certain metrics like danceability, loudness, acousticness, energy, etc.

spotify ugly data.png

CP: Interestingly, what’s missing is chart position and/or sales. There’s no way of telling which album did best. Also, when you look at the dataset, one of the things that jumps out right away is how much extra junk is in there. A key part of data science is solid feature engineering – removing stuff that will mess up analysis later on.

NP: Like what?

CP: Do we really need the track ID? It doesn’t lend itself to being useful. Also, look at the Volume number.

NP: What about it?

CP: It’s a number.

NP: Yes… and?

CP: It’s a number, but it’s not a measurement. It’s a dimension. We need to treat it like a dimension, which R calls a factor, so that we don’t attempt to use it as a measurement. It isn’t even valuable as an ordinal.

NP: What’s an ordinal?

CP: A way of ranking things. Is album 1 better than album 7? Is album 6 half as good as album 12?

NP: I get it. No, they’re just descriptive.

CP: Right, so we will have to recode these as factors.

CP: Let’s get the dataset loaded up. What are we aiming to find?

NP: I want to find which Now! volume is the happiest, saddest, and best to dance to.

CP: That sounds terrible. I tell you what, I’ll calculate, you dance.

NP: We also linked each song to its Spotify ID so you can instantly listen. Now that’s what I call data! Can’t control your excitement?

CP: I can.

NP: Yeah, we know.

The Correlation Matrix

CP: Let’s start with some basic feature engineering. We agree that track ID isn’t helpful, and that album numbers are factors. Do you think when the album came out has any bearing on the music on it?

NP: Not really. There are probably happy, sad, and danceworthy songs on every album.

CP: Then let’s engineer out album number entirely. What about key?

NP: That’s the key the song is in. It’s mapped to integers.

CP: So that’s also a factor, rather than a number. Same for time signature?

NP: Yep.

CP: What about mode?

NP: That’s the major or minor modality.

CP: Another factor. Amazing how much of the dataset is unclean if you want to do any data science on it. Out of the box, you could really mess up if you didn’t do a lot of feature engineering up front.

NP: So you can’t just give it to an AI and have it tell us?

CP: Nope. Most of great data science is in preparation, planning, and engineering. So we’ve now recoded all those factors disguised as numbers.

feature engineering.png
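
The recoding step described above can be sketched roughly as follows. The post's analysis was done in R, where this is a matter of converting columns to factors; the version below is a Python stand-in using only the standard library. The sample rows, the column names, and the `recode` helper are assumptions made for illustration and are not taken from the Kaggle file.

```python
# Hypothetical sample rows; the values are invented for illustration.
rows = [
    {"volume": 1, "key": 0, "mode": 1, "time_signature": 4,
     "energy": 0.91, "danceability": 0.64},
    {"volume": 2, "key": 7, "mode": 0, "time_signature": 4,
     "energy": 0.48, "danceability": 0.71},
]

# Columns that hold labels stored as integers, not measurements.
CATEGORICAL = ("volume", "key", "mode", "time_signature")

# Spotify documents mode as 1 = major, 0 = minor.
MODE_LABELS = {0: "minor", 1: "major"}


def recode(row):
    """Return a copy of the row with integer-coded labels turned into strings."""
    out = dict(row)
    for name in CATEGORICAL:
        out[name] = str(row[name])
    out["mode"] = MODE_LABELS[row["mode"]]
    return out


clean = [recode(r) for r in rows]

# Only true measurements stay numeric, so only they can feed a correlation.
numeric_columns = [k for k, v in clean[0].items() if isinstance(v, (int, float))]
print(numeric_columns)
```

Once the label columns are strings, nothing downstream can accidentally average a key or a volume number.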

CP: Let’s put together a correlation matrix. We’ll want to exclude factors for now so we can see relationships just between numeric values.

NP: Okay.

CP: There you go.

raw correlation data.png
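
For readers who want to see what sits behind a table like this, here is a minimal Python sketch of a Pearson correlation matrix over numeric columns. The post's own matrix came from R; the three columns and their values below are made up for illustration, and the `pearson` helper is a hand-written assumption rather than the authors' code.

```python
from math import sqrt

# Invented values for three numeric audio features (one entry per song).
columns = {
    "energy":       [0.90, 0.70, 0.50, 0.30],
    "loudness":     [-3.0, -5.0, -7.5, -10.0],
    "acousticness": [0.05, 0.20, 0.40, 0.80],
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Every column against every other column, as a nested dict.
matrix = {
    a: {b: pearson(columns[a], columns[b]) for b in columns}
    for a in columns
}

for name, row in matrix.items():
    print(name, {other: round(r, 2) for other, r in row.items()})
```

Values near 1 mean two features rise together, values near -1 mean one rises as the other falls, and values near 0 mean no linear relationship.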

NP: That’s a little hard to read.

CP: No it isn’t.

NP: Not everyone reads blocks of data straight up.

CP: I know. Okay, let’s do a heatmap visualization.

music correlation matrix.png

NP: That’s definitely easier to read. Look at energy and loudness. That’s really big.

CP: Yep. Blue is a positive correlation, red is a negative one.

NP: Energy and acousticness have a negative relationship.

CP: So what conclusions could we draw?

NP: Let’s figure out which album has the best combination of energy and loudness to find the Now! volume that can put you in that happy place.

Feeling Good Music

NP: What happy person doesn’t like energetic and loud music? And the definitive all-smiles Now! volume is…

Now 44

  1. P!nk- Blow Me (One Last Kiss)
  2. Flo Rida- Whistle
  3. Psy- Gangnam Style
  4. Owl City ft. Carly Rae Jepsen- Good Time
  5. Maroon 5- One More Night
  6. Justin Bieber ft. Big Sean- As Long as You Love Me
  7. David Guetta ft. Chris Brown and Lil Wayne- I Can Only Imagine
  8. Nicki Minaj- Pound the Alarm
  9. Karmin- Hello
  10. Chris Brown- Don’t Wake Me Up
  11. Katy Perry- Wide Awake
  12. Kelly Clarkson- Dark Side
  13. Usher- Numb
  14. Swedish House Mafia ft. John Martin- Don’t You Worry Child
  15. Train- 50 Ways to Say Goodbye
  16. Little Big Town- Pontoon
  17. Ryan Star- Stay Awhile
  18. Britt Nicole- Gold
  19. The Ready Set- Give Me Your Hand
  20. Toy Connor- Be in Love Tonight

NP: Head bobber after head bobber, this volume could even put Eeyore in a good mood. We dare anyone to frown listening to Don’t Wake Me Up and to not sing along to One More Night.

CP: …

NP: This volume is filled with tunes to pull you out of a funk and place you on cloud 9.

Sad Songs

CP: Right, moving along. What about the red correlations, the inverse relationships?

NP: High acousticness + low energy is basically the formula to put you right in the feels.

CP: Right in the what?

NP: Based on our analysis, the Now! volume that is supposed to give you a good cry is…

Now 49

  1. Pitbull ft. Kesha- Timber
  2. Lorde- Royals
  3. Miley Cyrus- Wrecking Ball
  4. Katy Perry- Unconditionally
  5. One Direction- Story of My Life
  6. A Great Big World ft. Christina Aguilera- Say Something
  7. Imagine Dragons- Demons
  8. OneRepublic- Counting Stars
  9. Drake ft. Majid Jordan- Hold On, We’re Going Home
  10. Lady Gaga ft. R. Kelly- Do What U Want
  11. Britney Spears- Work Work
  12. Ellie Goulding- Burn
  13. Zedd ft. Hayley Williams- Stay the Night
  14. Justin Timberlake- TKO
  15. Bruno Mars- Gorilla
  16. Luke Bryan- Drink a Beer
  17. ZZ Ward- Last Love Song
  18. Natalia Kills- Trouble
  19. R.L.- Show Me What You Got
  20. Alex Aiono- Doesn’t Get Better
  21. Morning Parade- Alienation

NP: While most of these songs are definitely more on the mellow side, they aren’t really the sad-sap songs I was expecting (besides the inevitable Drake song, of course).

CP: Definitely not. My kids listen to some of these and they’re definitely not all sad. Take it a step further and look at the individual songs.

NP: Bring out the Kleenex! Ordinary People by John Legend, a bona fide emotional rollercoaster, takes the top spot, with other tearjerkers like Don’t Know Why by Norah Jones and When I Was Your Man by Bruno Mars not far behind. These are the Now! songs that are perfect if you’re heartbroken, just having a bad day, or if you stepped on a Lego or something.

CP: Or use your brain and just don’t listen to them if you don’t want to be depressed. What else would you like to know?

The Ultimate Dance Volume

NP: Danceability. What songs have the highest danceability?

CP: What is danceability? You know what? Don’t answer that. Let’s just skip to the data.

NP: Let’s find which volume had the songs with the highest average danceability. For all the wannabe J-Los out there, the Now! volume for you is…

Now 52

  1. Jessie J, Ariana Grande and Nicki Minaj- Bang Bang
  2. Meghan Trainor- All About That Bass
  3. Enrique Iglesias featuring Sean Paul, Descemer Bueno and Gente de Zona- Bailando
  4. Nicki Minaj- Anaconda
  5. Iggy Azalea ft. Rita Ora- Black Widow
  6. Katy Perry- This Is How We Do
  7. Clean Bandit ft. Jess Glynne- Rather Be
  8. Ariana Grande ft. Zedd- Break Free
  9. Pitbull ft. John Ryan- Fireball
  10. OneRepublic- Love Runs Out
  11. Maroon 5- Maps
  12. Nico & Vinz- Am I Wrong
  13. Kiesza- Hideaway
  14. Becky G- Shower
  15. 5 Seconds of Summer- Amnesia
  16. Jason Aldean- Burnin’ It Down
  17. Shawn Mendes- Life of the Party
  18. Hilary Duff- All About You
  19. Alex & Sierra- Scarecrow
  20. Eden xo- Too Cool to Dance
  21. Rae Sremmurd- No Flex Zone

NP: Talk about not taking the foot off the gas! Now! 52 has some of the most notable dance songs in recent memory and the song with the highest danceability rating in the data set, Anaconda by Nicki Minaj.
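
Ranking volumes by average danceability comes down to a group-and-average step. A minimal Python sketch is below; the post's numbers came from the authors' own tooling, and the volume labels, song names, and scores here are placeholders invented for illustration, not values from the dataset.

```python
from collections import defaultdict
from statistics import mean

# Placeholder tracks; volume labels and scores are invented.
tracks = [
    {"volume": "A", "title": "Song 1", "danceability": 0.60},
    {"volume": "A", "title": "Song 2", "danceability": 0.70},
    {"volume": "B", "title": "Song 3", "danceability": 0.80},
    {"volume": "B", "title": "Song 4", "danceability": 0.90},
    {"volume": "C", "title": "Song 5", "danceability": 0.50},
    {"volume": "C", "title": "Song 6", "danceability": 0.55},
]

# Collect each volume's danceability scores.
scores_by_volume = defaultdict(list)
for track in tracks:
    scores_by_volume[track["volume"]].append(track["danceability"])

# Average per volume, then sort from most to least danceable.
averages = {vol: mean(scores) for vol, scores in scores_by_volume.items()}
ranked = sorted(averages.items(), key=lambda item: item[1], reverse=True)

best_volume, best_score = ranked[0]
print(best_volume, round(best_score, 2))
```

The same pattern works for any single score, such as energy or acousticness, by swapping the field name.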

CP: I’ve never wanted to be J-Lo.

NP: You’d look good in —

CP: Stop right there. So, here’s a problem. All this is great if you’re a consumer. However, what if you’re a businessperson? What about popularity, spins, etc. – things that indicate that songs have more than just audio characteristics? What songs are really popular?

NP: Well, Spotify has a popularity score, but it’s not in the Kaggle dataset.

CP: Ah, but it’s in the Spotify API, and we have the ability to query it, so let’s append popularity to the dataset and re-run the correlation matrix.

popularity correlation matrix.png

CP: Isn’t that interesting?

NP: Why isn’t popularity showing up?

CP: It is, but just barely. Here’s the thing. All the numeric values in these songs are attributes of the music itself. What we’re seeing here is, in fact, that the music itself has little to no bearing on the popularity of the songs. That’s an amusing statement, by the way, on the music industry.

NP: So what do we do?

CP: In this case, instead of using simple correlation with only numeric data, we need to take into account all those factors, those non-measurement pieces of data like volume number or key. To do that, we need a different form of classification. Let’s use a random forest.

NP: A random what?

CP: A random forest is a series of decision trees – ways to sort data – that’s good at figuring out what really drives a variable. We want to know what drives popularity, taking into account non-numeric data. The trick is that R can only handle so many factor levels, so we’ll use R’s cut function to group the volume numbers into bins of two and see if that has an impact.
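
The binning idea can be sketched in a few lines. The authors used R's cut function; the version below is a Python stand-in, and the `bin_volume` helper and its "1-2" style labels are illustrative assumptions, not the labels R would produce.

```python
def bin_volume(volume, width=2):
    """Map a volume number to a bin label such as '1-2' or '43-44'."""
    # Volumes start at 1, so shift by one before integer division.
    start = ((volume - 1) // width) * width + 1
    return f"{start}-{start + width - 1}"


# Six consecutive volumes collapse into three categories.
labels = [bin_volume(v) for v in range(1, 7)]
print(labels)
```

Pairing volumes halves the number of categories the model has to handle while keeping neighbouring releases together.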

random forest music.png

CP: There’s the answer. The volume number – which is a proxy for age, because it’s ordinal – is a strong predictor of the song’s popularity. So, what did we learn today?

NP: The biggest takeaway is that you can’t just make charts with data. You have to think about what the data is, what you want to learn, and spend a lot more time refining it before you start doing analysis.

CP: Right. Now get off my lawn.

NP: Check out all of our findings on Tableau Public, which includes the graphs in this post as well as additional findings like the happiest song and highest danceability song rankings. Love the Now! volumes? Let us know your all-time favorite in the comments or tweet us at @SHIFTcomm!

Christopher Penn
Vice President, Marketing Technology

Nick Patterson
Marketing Analyst
