Alex Cookson

Applying PCA to fictional character personalities

In this post, we’re going to apply Principal Component Analysis (PCA) to a dataset of fictional character personalities. PCA is a common technique for dimensionality reduction, which you might want to do if you are, say, trying to put together a classification model and you have a dataset with a lot of variables. The dataset we’re using is of crowdsourced scores of personality traits for 800 fictional characters from books/movies/TV shows like Game of Thrones, Pride and Prejudice, and The Lion King.

Building an animation step-by-step with gganimate

Getting started with {gganimate} is tough. There’s a big set of new functions and behaviours to learn. And the path from idea to polished animation – if you’re like me – is riddled with dead-ends, error messages, and exclamations of “Why is it doing that?!” In this post, I want to be your {gganimate} guide and take you down one possible path that starts with an idea and ends with something beautiful.

Normalizing and rescaling children's book ratings (2 of 2)

Note: this is the second part of a two-post series where I “fix” some of the problems with crowd-sourced ratings, like those you find for movies or books. (In this series, I look at children’s books.) In the first part, I incorporated a Bayesian prior into the rating calculation to address books with very few ratings sometimes having extreme scores (like 5 out of 5 stars) that likely don’t reflect their actual quality.

Rating children's books with empirical Bayes estimation (1 of 2)

Ratings sites – like Rotten Tomatoes and IMDb for movies or Goodreads for books – are annoying. Each seems to have its own norms, so the same rating means different things on different sites. A rating of 60% on one site might be good, but 6/10 (equivalent to 60%) on another site might be terrible. So you need to do some extra mental work to set your expectations based on the specific site you’re on.

What can we learn from a country's diplomatic gifts?

Have you ever brought a bottle of wine, flowers, or chocolate babka to a dinner party as a host/hostess gift? Or brought home a souvenir for your parents, partner, or kids after you’ve been travelling – like chocolate from Switzerland or, uh… Brazil nuts from Brazil? Countries do the same thing, kind of. Diplomatic gifts are often exchanged when dignitaries travel abroad or receive visitors. They can be lavish, like a $780,000 emerald and diamond jewellery set, given by King Abdullah of Saudi Arabia.

What's the most successful Broadway show of all time?

I love musicals! Who doesn’t?! That feeling when the lights dim at the beginning of the show. The intermission conversation (post-bathroom!) of which songs you enjoyed the most. Spending the rest of the week (maybe month?) humming your favourites to the annoyance of everyone around you. What’s that? Les Misérables is obviously the best musical? I know, I know. I mean, Hamilton is good and all that, and it deserves praise, but it’s no Les Mis (don’t @ me).

How dangerous is climbing Mount Everest?

In this series of posts, we will analyze climbing expeditions to the Himalayas, a mountain range comprising over 50 mountains, including Mount Everest, the tallest mountain in the world. This is Part 2 of a two-part series: Part 1 looked at Himalayan peaks and their first ascents; Part 2 (this post) looks at Everest expeditions. This post will focus on expeditions to Mount Everest, the most famous Himalayan peak and the tallest mountain in the world.

Analyzing Himalayan peaks and first ascents

In this series of posts, we will analyze climbing expeditions to the Himalayas, a mountain range comprising over 50 mountains, including Mount Everest, the tallest mountain in the world. This is Part 1 of a two-part series: Part 1 (this post) looks at Himalayan peaks and their first ascents; Part 2 looks at how dangerous it is to climb Everest. This post will focus on getting an overview of the Himalayan peaks, especially their height, whether they’ve been summitted, and (if it applies) when the first ascent was and who was involved.

Mapping San Francisco's trees

In this post, I create some basic geographical maps using the San Francisco Trees dataset from TidyTuesday, a project that shares a new dataset each week to give R users a way to apply and practice their skills. Getting started with geographical mapping in R can be daunting because there is a lot of terminology to describe a lot of methods that are specific to mapping. There is a whole discipline – Geographic Information Systems – dedicated to this stuff, so it’s no surprise that it can get complicated fast.

Heat mapping the timing of Philadelphia parking tickets

In this post, I create heat maps using the Philly Parking Tickets dataset from TidyTuesday, a project that shares a new dataset each week to give R users a way to apply and practice their skills. Specifically, we’ll cover: cleaning and aggregating the data that will go into our heat map; creating a basic heat map with ggplot2 defaults; and tweaking ggplot2 theme components to get a much prettier heat map.