“Unless you try to do something
beyond what you have already mastered,
you will never grow” - Ralph Waldo Emerson



Figure 1 - Photo of a pile of books on data science.


An Intro


Last week, I listened to an interview on the podcast People I (Mostly) Admire. The host, Steven Levitt, interviewed Angela Duckworth, author of Grit: The Power of Passion and Perseverance. As always, Duckworth was incredibly inspiring. The interview explored her theory of goal hierarchies. Inspired by that conversation, I decided to try the SuperDataScience 99-day challenge.

I am taking the 99-day challenge to learn more about a subject I am passionate about and to fuel a top-level goal: understanding how individuals access information and how best to provide that access. This high-level goal encompasses my Records Management job, my education, and my hobbies.

I first encountered data science in my Statistics and Political Analysis class in 2018 at Johns Hopkins University, mainly because we used RStudio. I had some data literacy from work, which utilized a Microsoft Access database. Not only were staff creating data on day-to-day operations, but we were increasingly delving into historical data. Annual police reports, parking surveys, and building information provided insights into how data was collected historically and how we can utilize it today to inform operations.

Figure 2 - Diagram Showing Water Pumped at City Pumping Station, City of Grand Rapids Archives and Records Center, Grand Rapids, MI.

Data visualizations from the late nineteenth and early twentieth centuries, like the one above, captivated me. They are works of art, deliberately prepared yet also functional, and we can use them to tell a compelling story. Coming from a history background, I wanted to learn the skills to tell those stories better.

The first week of the challenge involved learning about the data science field generally. One thing that stood out to me was how important a data science mindset is when confronting problems in other domains. A recent episode of the podcast Not So Standard Deviations discussed an observed increase in people from disparate fields gaining data science skills and applying them narrowly within their own subjects. Also in the first week, I picked up my copy of Confident Data Skills again and began working my way through it. Having read it over a year ago, I was surprised by how many new insights I found on revisiting it.

Lastly, I closed out the week with some ancillary tinkering with the treemap package in R. I worked with OpenData from the City of Grand Rapids, MI, to create the treemap below. Exploring a dataset of demolished buildings from 2011 to 2018, I wanted to visualize which neighborhood had the most buildings demolished in that timeframe. A treemap represents each category as a rectangle whose area is proportional to its value. The treemap package proved fun to work with, and I look forward to tinkering with it more.

# Imports the dataset from the CSV file
DemolishedProps <- read.csv(file = "/Users/matthewellis/Downloads/Demolished_Property_Map.csv",
                            header = TRUE, fill = TRUE)

# Loads libraries
library(treemap)
library(RColorBrewer)

# Tabulates the number of demolitions per neighborhood.
# table() aggregates, so the result has one row per neighborhood.
DemolishedPropsNeighborhood <- as.data.frame(table(DemolishedProps$Neighborho))
names(DemolishedPropsNeighborhood) <- c("Neighborhood", "Count")

# Creates the treemap: each rectangle's area is proportional to its count
treemap(DemolishedPropsNeighborhood,
        index = "Neighborhood",
        vSize = "Count",
        type = "index",
        palette = "Pastel2")

This post was created using R Markdown and the treemap and RColorBrewer packages. All views are my own.