• Sean Keenan

Putting Together Data

Similar to almost anything you can create, data collection and visualization aren’t simple plug-and-play processes. Before you can create visualizations as comprehensive and beautiful as these, you first need to follow the steps behind all of those visuals to make sure you have accurate and meaningful data. To begin, you need to collect the data itself. This week, I researched and collected three different sets of data, each with a different source and creation method.

Collected Datasets

For my first set of data, I aimed for a geographical dataset. Maps are a common and effective way of presenting different kinds of data stories, so I felt it was a good place to start. There’s a slew of great sources out there for this kind of data, but for mine I chose to analyze current unemployment statistics. I accessed this data through the U.S. Bureau of Labor Statistics, which gives a detailed look at not only every state’s current unemployment numbers, but those of counties and metro areas as well. Using this, I copied the 12-month net change in unemployment for each of the last five years and placed the figures on five different sheets in one document.

A sample of what the data currently looks like

For the next set of data, I looked for a reliable set of publicly sourced data. As I’m a big fan of video games, I tried to find something related to that and ended up browsing through a multitude of different datasets on Statista. The one that caught my eye the most was this one, on the DLC market vs. the physical market in games. I was surprised at first: I knew DLC is becoming increasingly common and expensive in games, but DLC making this much more than physical games themselves didn’t make much sense to me. I then went to check the source (pg. 72) and found that Statista had made what I felt was a misinterpretation. DLC is downloadable content that’s added on to a preexisting game, but in the original report the company Capcom meant all digital sales, not just DLC. With that in mind, I recreated the original data but added what I felt was the proper heading for it.

How the data looks with some of the revisions I made to it

For the last set of data, I created a dataset from scratch, using online resources to get the data itself. Staying with the topic of video games, I came upon the section of Nintendo’s website which shows their financial data. In this data, Nintendo lists many different numbers, including their top selling games, sales figures on all their consoles, and number of games they’ve released over the past years. Using this, I jotted down each console’s top ten games in sales, general console sales, and also added in the number of sales per game series.

The custom data I jotted down using Nintendo's public sales information

How I Can Visualize This Data

When looking at my first dataset, the main questions I feel I could ask are how unemployment changed overall across the five years I tracked, and which states fluctuated the most over that period. This could be visualized using a map similar to the one used on the site, though I feel a line chart would also be effective. With that type of chart, the sharp rises and dips of the line would show the changes well at a glance, though to keep the visual trustworthy I’d have to make sure the baseline starts at zero and that the data point markers aren’t so large that they distort the line.
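To make the "most fluctuation" question concrete, here is a minimal sketch of one way I could measure it. It assumes fluctuation means the gap between a state’s highest and lowest 12-month net change over the five years, which is only one possible definition. The state names and numbers are hypothetical placeholders, not figures from the Bureau’s data.

```python
# Hypothetical placeholder values, NOT real unemployment figures.
# Each list holds one 12-month net change per year, for five years.
net_change_by_state = {
    "State A": [0.4, -0.2, 0.1, -0.5, 0.3],
    "State B": [0.1, 0.0, -0.1, 0.1, 0.0],
    "State C": [1.2, -0.8, 0.5, -1.1, 0.9],
}


def fluctuation(values):
    """Assumed definition: spread between the largest and smallest yearly change."""
    return max(values) - min(values)


def rank_by_fluctuation(data):
    """Return state names ordered from most to least fluctuation."""
    return sorted(data, key=lambda state: fluctuation(data[state]), reverse=True)


ranking = rank_by_fluctuation(net_change_by_state)
for state in ranking:
    print(state, round(fluctuation(net_change_by_state[state]), 2))
```

A ranking like this could decide which states get their own highlighted line on the chart, so the visual doesn’t end up as fifty overlapping lines.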

For the second dataset, some of the questions could include how much digital sales and physical sales change from year to year, as well as how extreme the changes in digital sales are in more recent years compared to the earlier part of the decade. The easiest way to visualize this data would be either a stacked bar chart or an individual pie chart for each year. On the stacked bar chart, I could also overlay a line that follows the tops of the bars so the trend is easier to see at a glance. To make sure the visual is trustworthy, the colors need to be easily distinguishable and each section of the bar or pie chart needs to be proportional to the number it represents. This means not exaggerating any part of the chart to make a change seem more dramatic than it is.
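As a rough sketch of the numbers behind that stacked bar, the snippet below works out digital’s share of total sales for each year and how much that share moves from one year to the next. The years and sales values are hypothetical placeholders, not figures from Capcom’s report, and the function name is my own.

```python
# Hypothetical placeholder values, NOT figures from the original report.
# year: (digital sales, physical sales), both in the same unit.
sales = {
    2016: (30, 70),
    2017: (40, 60),
    2018: (55, 45),
}


def digital_share(digital, physical):
    """Fraction of total sales that is digital; 0.0 if there were no sales."""
    total = digital + physical
    return digital / total if total else 0.0


# Share per year: this is the height of the digital segment in a 100% stacked bar.
shares = {year: digital_share(d, p) for year, (d, p) in sales.items()}

# Year-over-year change in that share, keyed by the later year.
years = sorted(shares)
yoy_change = {
    later: shares[later] - shares[earlier]
    for earlier, later in zip(years, years[1:])
}

for year in years:
    print(year, f"{shares[year]:.0%}")
```

Drawing each segment directly from these shares is what keeps every section proportional to its number, rather than sized by eye.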

For the final dataset, the main questions that could be asked include which game series sell the most regardless of console, what percentage of overall software sales a series accounts for, and what percentage of console owners have a specific game. The best way to visualize some of this data, in my mind, would be a clustered bar chart. In this chart, the x-axis would list each series, while the y-axis would show the sales numbers. Each bar would then have a corresponding color for each console. To keep it accurate, I’d need to keep the baseline at zero, and for any series that didn’t have a top ten game on a given console, I’d leave that bar out entirely so the visual doesn’t get cluttered and confusing.
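Here is a minimal sketch of how the data for that clustered bar chart could be laid out, assuming each row of my sheet is a (console, series, units sold) entry. Console and series names and all numbers are hypothetical placeholders, not Nintendo’s figures. A missing console and series combination is stored as None rather than zero, so a charting tool can skip the bar instead of drawing an empty one.

```python
# Hypothetical placeholder rows, NOT real sales figures.
# Each row: (console, series, units sold by one top-ten title).
records = [
    ("Console 1", "Series X", 10.0),
    ("Console 1", "Series Y", 4.0),
    ("Console 2", "Series X", 7.5),
    ("Console 2", "Series X", 2.5),  # a second top-ten title from the same series
]


def build_clusters(rows):
    """Group sales by series, with one slot per console in a fixed order.

    Returns (clusters, consoles). clusters maps each series to a list aligned
    with consoles; a slot is None when the series had no top-ten title there.
    """
    consoles = sorted({console for console, _, _ in rows})
    totals = {}
    for console, series, units in rows:
        by_console = totals.setdefault(series, {})
        by_console[console] = by_console.get(console, 0.0) + units
    clusters = {
        series: [by_console.get(console) for console in consoles]
        for series, by_console in totals.items()
    }
    return clusters, consoles


clusters, consoles = build_clusters(records)
for series, bars in clusters.items():
    print(series, dict(zip(consoles, bars)))
```

Keeping None separate from zero matters here: a zero would claim the series sold nothing on that console, when all the data really says is that it had no title in the top ten.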


© 2020 by Sean Keenan.