Different visualization tools use different data formats, and the structure you use varies by the story you want to tell. So the more flexible you are with the structure of your data, the more possibilities you gain. Make use of data formatting applications, couple them with a little programming know-how, and you can get your data into whatever format fits your specific needs.
The easy way, of course, is to find a programmer who can format and parse all of your data, but then you’ll always be waiting on someone. This is especially evident during the early stages of any project, where iteration and data exploration are key in designing a useful visualization. Honestly, if I were in a hiring position, I’d likely just get the person who knows how to work with data over the one who needs help at the beginning of every project. The following sections describe various data formats, the tools available to deal with these formats, and finally some programming that uses the same logic you used to scrape data in the previous example.
Data Formats
Most people are used to working with data in Excel. This is fine if you’re going to do everything from analysis to visualization in the program, but if you want to step beyond that, you need to familiarize yourself with other data formats. The point of these formats is to make your data machine-readable, or in other words, to structure your data in a way that a computer can understand. Which data format you use can change by visualization tool and purpose, but the following three formats cover most of your bases: delimited text, JavaScript Object Notation, and Extensible Markup Language.
Delimited Text
Most people are familiar with delimited text. You did, after all, just make a comma-delimited text file in your data scraping example. If you think of a dataset in the context of rows and columns, a delimited text file splits columns by a delimiter. The delimiter is a comma in a comma-delimited file. The delimiter might also be a tab. It can be spaces, semicolons, colons, slashes, or whatever you want, although commas and tabs are the most common. Delimited text is widely used and can be read into most spreadsheet programs such as Excel or Google Documents. You can also export spreadsheets as delimited text. If multiple sheets are in your workbook, you usually end up with multiple delimited files, unless you specify otherwise. This format is also good for sharing data with others because it doesn’t depend on any particular program.
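To make the idea of a delimiter concrete, here is a minimal sketch in Python using the standard csv module. The sample rows, the column names, and the helper read_delimited are made up for illustration; they are not from the scraping example. The sketch reads the same small table twice, once comma-delimited and once tab-delimited, and checks that both give the same rows.

```python
import csv
import io

# Hypothetical sample data: the same two rows, comma-delimited and tab-delimited.
comma_text = "date,city,high_temp\n2009-01-01,Buffalo,26\n2009-01-02,Buffalo,34\n"
tab_text = comma_text.replace(",", "\t")


def read_delimited(text, delimiter):
    """Return the rows of a delimited string as dictionaries keyed by column header."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return list(reader)


comma_rows = read_delimited(comma_text, ",")
tab_rows = read_delimited(tab_text, "\t")

# Both strings describe the same table; only the delimiter differs.
print(comma_rows == tab_rows)

# Every value comes back as text, so convert numbers yourself when you need them.
print(int(comma_rows[0]["high_temp"]) + 1)
```

To read a real file instead of a string, you would pass an open file object to csv.DictReader in place of io.StringIO.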
JavaScript Object Notation (JSON)
This is a common format offered by web APIs. It’s designed to be both machine- and human-readable, although if you have a lot of it in front of you, it’ll probably make you cross-eyed if you stare at it too long. It’s based on JavaScript notation, but it’s not dependent on the language. There are a lot of specifications for JSON, but you can get by for the most part with just the basics. JSON works with keywords and values, and treats items like objects. If you were to convert JSON data to comma-separated values (CSV), each object might be a row. As you’ll see later in this book, a number of applications, languages, and libraries accept JSON as input. If you plan to design data graphics for the web, you’re likely to run into this format.
Extensible Markup Language (XML)
XML is another popular format on the web, often used to transfer data via APIs. There are lots of different types and specifications for XML, but at the most basic level, it is a text document with values enclosed by tags. For example, the Really Simple Syndication (RSS) feed that people use to subscribe to blogs, such as FlowingData, is actually an XML file. The RSS feed lists recently published items enclosed in the <rss> tag, and each item has a title, description, author, and publish date, along with some other attributes.
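The following sketch, which uses only Python’s standard library, shows how the same records might look in JSON and in RSS-style XML, and how each JSON object can become one row of CSV. The two sample posts, the simplified date strings, the FIELDS list, and the to_csv helper are all invented for illustration. A real feed carries more elements and attributes than this.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical sample records, written once as JSON and once as RSS-style XML.
json_text = """
[
  {"title": "First post", "author": "A. Writer", "pubDate": "2010-01-01"},
  {"title": "Second post", "author": "B. Writer", "pubDate": "2010-01-02"}
]
"""

xml_text = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>First post</title>
      <author>A. Writer</author>
      <pubDate>2010-01-01</pubDate>
    </item>
    <item>
      <title>Second post</title>
      <author>B. Writer</author>
      <pubDate>2010-01-02</pubDate>
    </item>
  </channel>
</rss>"""

FIELDS = ["title", "author", "pubDate"]

# JSON: the array of objects becomes a list of dictionaries.
json_rows = json.loads(json_text)

# XML: pull the text of each field out of every <item> element.
root = ET.fromstring(xml_text)
xml_rows = [
    {field: item.findtext(field) for field in FIELDS}
    for item in root.iter("item")
]


def to_csv(rows):
    """Write a list of dictionaries as comma-delimited text, one row per dictionary."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


csv_text = to_csv(json_rows)
print(csv_text)

# Both formats carry the same records once parsed.
print(json_rows == xml_rows)
```

Once the records are in a common in-memory form, here a list of dictionaries, writing them out in a different format is mostly a matter of choosing the right writer.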