EIJ13: Nerd-Free Zone: Data Crunching and Visualization for Journalism and English Majors
“How many journalists got into journalism because you like math?” Doug Haddix, the assistant vice-president of Editorial Communications at Ohio State University, asked a room full nervous of journalists on Aug. 24 at the Society of Professional Journalists’ Excellence in Journalism 2013 convention in Anaheim, California. “Me too!”
Actually, Haddix, a self-described nerd, was there to teach convention attendees how to use Google Fusion Tables to tackle math and other data analysis in a nearly painless fashion. (Google Fusion Tables typically cleanly import Microsoft Excel spreadsheets and the software has the virtue of being free.)
“Following the money leads to great stories,” he said. “Even if you don’t like math, the spreadsheet helps us get over that.”
Google Fusion Table spreadsheets automate nearly all of the calculations, and allow them to be done in bulk and allow calculations to work off of the results of other calculations.
Other examples of data sets ripe for this sort of analysis include restaurant health code ratings, government contracts or budgets and various education data-dumps.
The first set of data looked at on Saturday were the 2012 salaries of Major League Baseball players, originally obtained by Investigative Reporters & Editors.
“Lots of stories lurking there in payroll data sets,” Haddix said.
After opening the spreadsheet, he resorted the table several times, by salary, looking for obvious errors made by those who compiled the data.
“Any new data set I get, I like to sort it top to bottom, bottom to top,” he said. “It’s kind of like interviewing a data set.”
Back in Ohio, he’s used the technique to spot incorrect ages inputted for local bus drivers, including alleged five-year-old and 100 year old bus drivers.
“Sorting (spreadsheet data) is really easy — does everyone agree? But it’s very powerful.”
Moving beyond just sorting, Haddix showed off the use of the martini glass-shaped filter icon.
“A filter just goes into, like, 800 players … and (lets users) filter out just what you want to see. Really good for calling out examples.”
Haddix suggested making specialized searches into their own tabs in a spreadsheet, for later use. (He also recommended creating a tab giving the name and Web URL of the source of any data used, just to keep it on hand.)
More advanced inquiries require the use of Pivot Tables.
“Several of you are asking ‘how do we do this, how do we do that,’ but in every case, the answer is ‘pivot tables,'” Haddix said.
Pivot tables “are a very sophisticated type of analysis that’s very easy to do in Google,” much harder to do in Excel, he said.
Pivot tables can do things like sums, averages, medians, all available from a pop-up table in Google Fusion Tables.
“Navigation is, like, the hardest part of” Google Fusion Tables, Haddix said. From there, it’s all downhill.
“Are the Yankees as evil as we think? How much do they pay their players?”
Haddix guided the panel attendees through creating a formula (cells are given Battleship-style coordinates and used as variables in a formula, such as “=D2-C2”) to add up all the salaries of the Yankees.
(Users can also add commas to long numbers in Google Fusion Tables: Click on the column in question, then go up to the Format menu.)
“You can look at ‘what is the median salary of pitchers in the American League vs. the National League,'” a “double-whammy,” of two separate formulas, one for each league.
“The key thing is to make sure your (spreadsheet) formula is always right,” especially before duplicating it across an entire set of data, rather than just one initial example. (The formula in Google Fusion Tables can be made visible by clicking on the answer and looking up and to the left to where the formula’s shown.) Haddix talks through his formulas out loud, to make sure the logic is right.
After double-checking the formula’s logic, it can be mass-applied to all the rows below the first cell by hovering over the bottom right corner of the margin, until the cursor becomes a black cross, and then left-clicking.
“Boom: data ninja!”
Finally, Haddix took data on Americans’ adjusted gross income and showed off Google Fusion Tables’ ability to create charts on the fly.
After clicking on a cell with data, he went to the “Insert” menu at the top of the screen and chose “Chart.” Several sample charts derived from the data instantly popped up, eliciting a “wow” from one member of the audience.
“You can do a lot of sophisticated custom works with these (Google Fusion) charts.”
Haddix dragged the chart off the table (which it was now obscuring) by clicking the icon on the top right and clicking “move to own sheet.”
The data visualizations possible with Google Fusion Tables include pie charts, graphs, scatter graphs and much more.
The program will also do mapping directly from a spreadsheet, addresses have to be all in one cell, rather than having address, city, state, etc. separated.
“A lot of government data already has the latitude and longitude in it. Those are really easy to produce maps of,” Haddix said.
And although the mapping functions will work without a ZIP code, in large metro areas, there’s the possibility of similar street names, and a ZIP code helps ensure accuracy.
All of Haddix’ materials for the session are available online at Go.OSU.edu/FusionTables. The “Fusion Tables Kip hands on” file is the 16-page script for Saturday’s three-hour panel. (Use of the files requires Google Drive access, which is automatic and free with a Gmail account.)
No Comments »
No comments yet.
RSS feed for comments on this post.
Leave a comment
Line and paragraph breaks automatic, e-mail address never displayed, HTML allowed: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>