Tutorial · Alex Cookson

Mapping San Francisco's trees

Jan 29, 2020

TidyTuesday / geographic maps / small multiples / tutorial

In this post, I create some basic geographic maps using the San Francisco Trees dataset from TidyTuesday, a project that shares a new dataset each week to give R users a way to apply and practice their skills. Getting started with geographic mapping in R can be daunting because mapping comes with its own terminology and its own specialized methods. There is a whole discipline, Geographic Information Systems (GIS), dedicated to this stuff, so it’s no surprise that it can get complicated fast.

Heat mapping the timing of Philadelphia parking tickets

Dec 5, 2019

TidyTuesday / heat maps / tutorial

In this post, I create heat maps using the Philly Parking Tickets dataset from TidyTuesday, a project that shares a new dataset each week to give R users a way to apply and practice their skills. Specifically, we’ll cover cleaning and aggregating the data that will go into our heat map, creating a basic heat map with ggplot2 defaults, and tweaking ggplot2 theme components to get a much prettier heat map.
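The aggregation step behind a day-by-hour heat map can be sketched in a few lines. The post itself works in R with ggplot2; this is only a minimal Python illustration, assuming each ticket has an issue timestamp stored as a string in `%Y-%m-%d %H:%M:%S` format. The function name `heatmap_counts` and the sample timestamps are hypothetical. It counts tickets per (weekday, hour) pair and fills empty cells with zero, producing the 7 × 24 grid a heat map would plot.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def heatmap_counts(timestamps, fmt="%Y-%m-%d %H:%M:%S"):
    """Count tickets per (weekday, hour) and return a 7 x 24 grid.

    Rows follow WEEKDAYS (Monday first, matching datetime.weekday());
    columns are hours 0-23. Cells with no tickets are 0, so the heat
    map has no gaps.
    """
    counts = Counter()
    for ts in timestamps:
        dt = datetime.strptime(ts, fmt)
        counts[(dt.weekday(), dt.hour)] += 1
    # Counter returns 0 for missing keys, which fills the empty cells
    return [[counts[(day, hour)] for hour in range(24)] for day in range(7)]


# Hypothetical timestamps, not taken from the Philly Parking Tickets data
sample = [
    "2017-01-02 08:15:00",  # a Monday, 8 a.m. hour
    "2017-01-02 08:45:00",  # same Monday, same hour
    "2017-01-07 13:05:00",  # a Saturday, 1 p.m. hour
]
grid = heatmap_counts(sample)
for name, row in zip(WEEKDAYS, grid):
    print(name, row)
```

Filling every cell explicitly matters for a heat map: a missing weekday/hour combination should show up as a zero-count tile rather than a blank.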

Predicting horror movie ratings with LASSO regression

Oct 21, 2019

TidyTuesday / LASSO regression / tutorial

In this post, I look at the Horror movie ratings dataset from TidyTuesday, a project that shares a new dataset each week to give R users a way to apply and practice their skills. We’re going to run a LASSO regression, which is a form of regularization. Regularization is often used when you have many predictors relative to the number of observations, or when your data has multicollinearity (predictors that are highly correlated with one another).
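To make the idea of LASSO shrinkage concrete, here is a minimal pure-Python sketch of LASSO fitted by coordinate descent. It is not the post’s R code. It assumes the predictors are already centered (so no intercept is fitted) and minimizes (1/(2n))·‖y − Xβ‖² + λ·‖β‖₁; the function names `soft_threshold` and `lasso_cd` are hypothetical. Each coefficient is updated in turn with a soft-thresholding step, which is what pushes small coefficients exactly to zero.

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; return exactly 0 inside [-gamma, gamma]."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, lam, n_iter=1000, tol=1e-10):
    """LASSO via cyclic coordinate descent (assumes centered X, no intercept).

    Minimizes (1 / (2n)) * ||y - X beta||^2 + lam * ||beta||_1.
    X is a list of rows; returns the coefficient list beta.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residuals y - X beta; beta starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0:
                continue  # an all-zero column carries no information
            # Correlation of feature j with the partial residual
            # (the residual with feature j's current contribution added back)
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta  # keep residuals in sync
                beta[j] = new_bj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


# Toy data: y is exactly 2*x1 + 1*x2, with orthogonal, centered columns
X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
y = [3, 1, -1, -3]
for lam in (0.0, 0.5, 1.5, 3.0):
    # Coefficients shrink toward zero as lam grows; x2 is dropped first
    print(lam, lasso_cd(X, y, lam))
```

With λ = 0 this recovers ordinary least squares; as λ grows, the weaker predictor hits zero before the stronger one, which is why LASSO doubles as a variable-selection method.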

How much can professional powerlifters bench press?

Oct 8, 2019

TidyTuesday / linear regression / splines / tutorial

In this post, I analyze the Powerlifting dataset from TidyTuesday, a project that shares a new dataset each week to give R users a way to apply and practice their skills. This week’s data covers results from powerlifting events held under the International Powerlifting Federation. I will predict bench press weight with a multiple linear regression model, and I will use natural cubic splines to capture non-linear trends within that model.
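The trick that lets a *linear* model capture non-linear trends is to replace a predictor with several spline basis columns. The sketch below evaluates one common natural cubic spline basis, the truncated-power form N1 = 1, N2 = x, N(k+2) = d(k) − d(K−1), with d(k)(x) = ((x − ξk)³₊ − (x − ξK)³₊) / (ξK − ξk). It is a Python illustration, not the post’s R code; R’s `splines::ns()` uses a numerically better-behaved B-spline parameterization instead. The function name `natural_cubic_basis` and the knot values are hypothetical. The defining property of a natural spline is that it is linear beyond the boundary knots, which the usage example shows.

```python
def natural_cubic_basis(x, knots):
    """Evaluate the natural cubic spline basis at a single point x.

    Truncated-power form with K knots xi_1 < ... < xi_K:
      N1 = 1, N2 = x, N_{k+2} = d_k - d_{K-1} for k = 1..K-2,
      d_k(x) = ((x - xi_k)^3_+ - (x - xi_K)^3_+) / (xi_K - xi_k).
    Returns K values, the first being the intercept column.
    """
    knots = sorted(knots)
    K = len(knots)
    if K < 2 or len(set(knots)) != K:
        raise ValueError("need at least two distinct knots")

    def pos_cube(u):
        # Truncated cubic: (u)^3 when u > 0, else 0
        return u ** 3 if u > 0 else 0.0

    last = knots[-1]

    def d(k):
        # 0-based k, so d(k) here is d_{k+1} in the 1-based formula
        return (pos_cube(x - knots[k]) - pos_cube(x - last)) / (last - knots[k])

    d_penult = d(K - 2)  # d_{K-1} in 1-based notation
    return [1.0, float(x)] + [d(k) - d_penult for k in range(K - 2)]


knots = [0, 1, 2, 3]  # hypothetical knot locations
for x in (-1, 4, 5, 6):
    print(x, natural_cubic_basis(x, knots))
# Beyond the last knot, each column grows by a constant step as x
# increases by 1, i.e. the basis (and any fitted curve) is linear there.
```

Each row of basis values would become the predictor columns of an ordinary least-squares fit, so the model stays linear in its coefficients while the fitted curve bends between the knots.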

What are New York's best and worst pizza restaurants?

Sep 30, 2019

TidyTuesday / bar graphs / small multiples / tutorial

In this post, I analyze the Pizza Party dataset from TidyTuesday, a project that shares a new dataset each week to give R users a way to apply and practice their skills. This week’s data is about survey ratings of New York pizza restaurants. To set up, let’s first load the tidyverse, change our default ggplot2 theme, and load the data. (I named the dataframe pizza_barstool_raw because I’ll probably add some cleaning steps, and I like to have the original data on hand.)

Finding trends in US national park visits

Sep 16, 2019

TidyTuesday / line graphs / tutorial

In this post, I analyze the National Park Visits dataset from TidyTuesday, a project that shares a new dataset each week to give R users a way to apply and practice their skills. This week’s data is about visitor numbers for US National Parks, going way back to 1904, when there were only six national parks. I’ve never been to a US national park, but I know about some of the famous ones like Yosemite and Yellowstone.