Data reporter Emmanuel Martinez in our "Hello, it's Reveal" music video. Credit: Reveal

Despite the push for government dashboards and open data portals, much of the data we use at Reveal still must be obtained through public records requests. Summary data on government websites often comes from more granular and detailed databases.

But asking for that data can be a challenge because of the technical issues involved. The first time I requested data from a government agency, I had no idea where to begin. How would I get the right dataset for my story? How could I avoid receiving only summary or aggregate figures, or data locked in PDFs?

Here are some things I learned along the way.

Start the dialogue

Before I begin writing a public records request, I try to have a conversation with someone at the agency who understands the data. I find out the name of the database, how the agency stores and maintains its data and what is available to the public. All this information bolsters an open records request and makes the process smoother. The more you know before writing and submitting your request, the better.

Know the law

Open records laws differ in how they treat data. The federal Freedom of Information Act covers electronic records, but access to data varies from state to state. The better you understand the law, the better your request will be.

In Arkansas and Tennessee, for example, requesters must be state residents. In Kentucky, requesters may be required to tell the agency whether they intend to use public records for a commercial purpose, and they may be charged a higher fee as a result. Texas requires public records requests to be submitted in writing.

Knowing the nuances of the law also can help you rebut the reasons an agency might give for denying your request.

The Reporters Committee for Freedom of the Press has guides on how to file public records requests, including information on the federal law and the open records laws of all 50 states and the District of Columbia.

Many states have open-government organizations that seek to enhance access to public information. These groups often provide resources for requesting records. A list of them is on the National Freedom of Information Coalition website.

If you’re trying to find out what data an agency keeps, get a copy of its records retention schedule. In most states, the schedule lists the documents and data an agency keeps and how long it must keep them. Agency forms can also serve as a map of a database, because the information collected on them usually ends up in one.

If your request is denied, appeal and be willing to negotiate.

If that doesn’t work, you might have to resort to more drastic measures. Reveal reporters scanned hundreds of day care inspection documents after the California Department of Social Services said it would cost more than $20,000 and take more than two years to make its day care database publicly available.

Request data in database format

Another headache for requesters comes when agencies release data as PDFs, which are basically electronic printouts. Although tools like Tabula and Cometdocs can extract data from PDFs, it’s not a painless task. To avoid PDFs, request the information in a raw data format such as a CSV file or in a common format such as a .dbf or Excel file. (Your earlier conversation with agency officials will probably shape what you request.)

You also need to know the limitations of each format. An Excel worksheet can hold at most 1,048,576 rows, so a larger dataset won’t fit on a single sheet. Access files generally require Microsoft Access to open. Often, the best option is to request data as a CSV or text file, which can be imported into virtually any database manager or spreadsheet program. If an agency pushes back, saying it cannot create data, the best rebuttal is to ask for the database in whatever format the agency already stores it.
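Because a CSV is plain text, you can check its size before opening it in a spreadsheet. Here is a minimal sketch using only Python’s standard library, assuming the file has a single header row. It counts data rows with the `csv` module (which, unlike counting raw lines, handles quoted fields that contain line breaks) and compares the total with Excel’s worksheet limit. The file name in the comment is hypothetical.

```python
import csv
import io

# Excel worksheets top out at 1,048,576 rows; the header row counts toward that.
EXCEL_MAX_ROWS = 1_048_576

def count_csv_records(f) -> int:
    """Count data rows (excluding the header) in an open CSV file object."""
    reader = csv.reader(f)
    next(reader, None)  # skip the header row, if present
    return sum(1 for _ in reader)

def fits_in_excel(record_count: int, has_header: bool = True) -> bool:
    """Return True if the records, plus an optional header row, fit on one sheet."""
    return record_count + (1 if has_header else 0) <= EXCEL_MAX_ROWS

# Usage with a tiny in-memory CSV. For a real file (hypothetical name), use:
#   with open("claims.csv", newline="", encoding="utf-8") as f: ...
sample = io.StringIO("id,county\n1,Lewis and Clark\n2,Missoula\n")
n = count_csv_records(sample)
print(n, fits_in_excel(n))  # 2 True
```

If the count comes back above the limit, that is a sign to work with the file in a database manager or a scripting language rather than a spreadsheet.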

Request record layouts, too

Data is meaningless if you don’t know how to interpret it. Column headers can have obscure names that mean something only to the agency’s database manager. Here’s part of Montana’s workers’ compensation claims database:

I would have no idea what the headers mean without documentation. In my requests, I always ask for any record layouts, data dictionaries, code sheets or lookup tables associated with the dataset. Those documents help you translate the database into English by telling you what each column means, the data type of each column and how multiple tables relate to one another.
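Once you have a data dictionary and code sheets, you can apply them to the data directly. The sketch below is a minimal illustration, not the layout of any real agency database: the column names (`CLM_NO`, `INJ_CD`, `CLM_ST`) and the codes are hypothetical. It renames cryptic headers and replaces coded values with their meanings, keeping anything it can’t translate so those gaps become questions for the agency.

```python
import csv
import io

# Hypothetical data dictionary: cryptic agency header -> plain-English name.
DATA_DICTIONARY = {
    "CLM_NO": "claim_number",
    "INJ_CD": "injury_type",
    "CLM_ST": "claim_status",
}

# Hypothetical code sheets (lookup tables) for columns that store codes.
CODE_SHEETS = {
    "injury_type": {"01": "Strain", "02": "Fracture", "03": "Laceration"},
    "claim_status": {"O": "Open", "C": "Closed"},
}

def decode_rows(f):
    """Yield each CSV row with headers renamed and coded values translated.

    Unknown headers and codes are kept as-is so nothing is silently lost.
    """
    for row in csv.DictReader(f):
        decoded = {}
        for raw_name, value in row.items():
            name = DATA_DICTIONARY.get(raw_name, raw_name)
            lookup = CODE_SHEETS.get(name, {})
            decoded[name] = lookup.get(value, value)
        yield decoded

raw = io.StringIO("CLM_NO,INJ_CD,CLM_ST\n1001,02,C\n1002,09,O\n")
for record in decode_rows(raw):
    print(record)
# {'claim_number': '1001', 'injury_type': 'Fracture', 'claim_status': 'Closed'}
# {'claim_number': '1002', 'injury_type': '09', 'claim_status': 'Open'}
```

In the second record, code `09` isn’t on the code sheet, so it passes through untranslated. That is exactly the kind of gap to raise with the agency.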

Negotiating fees

Data can be expensive. Agencies may bill for hours of programming, research and analysis, which can add up to an exorbitant amount. But remember that in most cases, you should pay only what it costs to reproduce the records. If you’re requesting data in the public interest, ask for a fee waiver. Not all states provide for fee waivers in their open records laws, however; Kentucky and New Jersey are two that do not.

If a fee waiver is not an option or if it’s denied, ask for an itemized breakdown of the charges. Some state laws are quite specific about what can be charged. Find out what the agency means when it says it’s charging you for programming, research or analysis. If the agency does need to do programming, request the code used to produce the data. That way, if you need to request the same information in the future, you can give the agency a copy of the program.

After you get the data

Maintain a good working relationship with the agency throughout the process because the conversation doesn’t end once you get the data. I’ve had questions that only the agency’s database manager could answer, such as how many records the dataset should have or what to do about duplicate rows.

It’s better to get answers to those questions than make assumptions that later could undermine your analysis.
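Before calling the agency, it helps to know exactly what you received. Here is a minimal sketch, assuming the data arrived as a CSV with a single header row. It counts the data rows and lists rows that are exact duplicates, so you can compare the total with the figure the agency gives you and ask specific questions about repeats. `EXPECTED_RECORDS` is a hypothetical placeholder for that figure. Exact duplicates aren’t necessarily errors, since some datasets legitimately repeat rows, which is why the answer should come from the agency rather than from an assumption.

```python
import csv
import io
from collections import Counter

def profile_csv(f):
    """Return the data-row count and a dict of exact duplicate rows -> occurrences."""
    reader = csv.reader(f)
    next(reader, None)  # skip the header row
    rows = [tuple(row) for row in reader]
    counts = Counter(rows)
    duplicates = {row: n for row, n in counts.items() if n > 1}
    return len(rows), duplicates

# Hypothetical: the total the agency's database manager told you to expect.
EXPECTED_RECORDS = 3

sample = io.StringIO("id,county\n1,Missoula\n2,Gallatin\n2,Gallatin\n")
total, dupes = profile_csv(sample)
if total != EXPECTED_RECORDS:
    print(f"Got {total} records; the agency said {EXPECTED_RECORDS}. Ask why.")
for row, n in dupes.items():
    print(f"{row} appears {n} times")  # ('2', 'Gallatin') appears 2 times
```

This loads every row into memory, which is fine for most records requests; for very large files, a database manager is a better tool for the same check.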

Frustrated?

Wrangling records got you down? You’re not alone. Reveal staffers made a music video about our relationship with public records.

Emmanuel Martinez can be reached at emartinez@cironline.org. Follow him on Twitter: @eman_thedataman.

Emmanuel Martinez

Emmanuel Martinez is a data reporter for Reveal. A graduate of UC Irvine, Martinez received his master’s degree from the University of Southern California, where he studied radio and data journalism. Prior to joining Reveal, he interned for KPCC, the Los Angeles NPR affiliate, where he helped reporters acquire, clean and analyze data. Martinez is based in Reveal’s Emeryville, California, office.