The Media, the FOIA and Electronic Data

Drew Sullivan
The Associated Press
(212)621-7639
drew@ap.org

 

The media and electronic information

Reporters are information brokers who are paid to reformulate information - say as a newspaper story or a TV package. While much of the information they repackage comes from their own observation and interviews, a very important component comes from public record sources. So it is no surprise that as paper records in county, state and federal offices disappear and are replaced by electronic equivalents such as databases, reporters come looking for these records.

Using information in electronic form effectively requires some skills not always found in the reporter's toolbox. Database programs, spreadsheets and other means of manipulating electronic data were once the domain of the geeks who worked down in the basement. But times have changed. Easy-to-use, powerful programs and the computers needed to run them are now cheap and plentiful. No longer do you need a degree in database theory to build your own database, analyze the data and spit out useful and interesting results.

This hasn't been lost on reporters. In fact, they've taken it to heart with a zeal seldom seen in the industry. They've given this area of interest its own name: computer-assisted reporting, or CAR for short. CAR is the hottest thing in journalism today. CAR-based stories have figured large in national media awards, winning Pulitzer Prizes in each of the last seven years. CAR specialists (often called database editors) even command higher salaries than other reporters in the newsroom. Three years ago a non-profit organization called the National Institute for Computer-Assisted Reporting (NICAR) was formed. It has trained more than 10,000 reporters in the use of these tools. As Elliot Jaspin, one of the pioneers of CAR, once said: "A reporter who can't read a magnetic tape is as illiterate as the 15th Century peasant confronted by Gutenberg."

The benefits of using CAR are significant -- you can do better stories cheaper and faster. Ten years ago, if you wanted to find all the dead voters in East St. Louis, you would have to do it anecdotally. Using CAR, reporters at the St. Louis Post-Dispatch were able to easily compare a voter registration database with a database of people who had died in the state to find dead voters. The resulting story was more accurate and easier to do.
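The comparison described above amounts to a join between two tables. Here is a minimal sketch in Python, assuming hypothetical field names (last, first, dob) and made-up sample records. It is an illustration only; the layout of the real files and the method the Post-Dispatch used are not described in this handout.

```python
import csv
import io

# Hypothetical sample data; real voter and death files have their own layouts.
VOTERS_CSV = """last,first,dob,precinct
Smith,John,1921-03-04,12
Jones,Mary,1950-07-19,12
Brown,Al,1930-11-02,7
"""

DEATHS_CSV = """last,first,dob,date_of_death
SMITH,JOHN,1921-03-04,1989-01-15
Green,Ann,1915-05-05,1990-02-01
"""


def match_key(row):
    """Build a normalized key from name and date of birth."""
    return (row["last"].strip().upper(), row["first"].strip().upper(), row["dob"].strip())


def find_dead_voters(voters_file, deaths_file):
    """Return voter rows whose name and birth date also appear in the death records."""
    deaths = {match_key(r): r for r in csv.DictReader(deaths_file)}
    matches = []
    for voter in csv.DictReader(voters_file):
        death = deaths.get(match_key(voter))
        if death is not None:
            matches.append({**voter, "date_of_death": death["date_of_death"]})
    return matches


dead_voters = find_dead_voters(io.StringIO(VOTERS_CSV), io.StringIO(DEATHS_CSV))
for row in dead_voters:
    print(row["last"], row["first"], row["dob"], "died", row["date_of_death"])
```

A match on name and birth date is only a lead. Two different people can share both, so each hit still has to be checked against the original records before it goes in a story.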

As an example, the following stories won Pulitzer Prizes for their analysis of electronic data:

CAR is not just used for investigative projects. Beat reporters use CAR to analyze budgets, crime statistics, employment data and a host of other records. Reporters look at OSHA inspection records every time there is a major industrial accident, FAA and NTSB data every time there is an airplane crash, FRA railroad safety data every time there is a train accident and political contributions every time there is a big contract handed out. In short, WE LOVE all the data your agencies collect and we want to use it.

What we're looking for

There are a few things that a reporter generally needs to do this kind of analysis. Here are some things they are likely to be concerned about and why.

The data

A reporter will often not be sure where the information he or she is requesting is stored. Past experience has shown that FOIA officers never volunteer electronic databases.

"Sure, Drew," said FOIA officer Wendy Johnson. "The data you're requesting is in our National Information System database. Would you like that on a CD-ROM in a FoxPro formatted data file?"

Doesn't happen. Until FOIA officers become more familiar with their electronic holdings, reporters will assume they have to do the research and then request the database by name.

At NICAR and the AP, I teach reporters to ask for the complete database and not just the portion they think they'll need. That's because databases can be very complicated, and what may not seem important now may be critical to using the data accurately later. Also, while my story might be about SBA disaster loans in Tennessee, I need data from all 50 states to put Tennessee in proper context.

Sometimes agencies offer to do the analysis for a reporter. While we always appreciate a helpful federal employee, it is important for a reporter to do the work. It helps the reporter better understand the limitations of the data. Also, reporters often aren't sure what they're looking for until they find it. They don't want to go back time and time again to ask additional questions. Finally, often the best stories are found by accident.

Ideally, the database is free on the agency's web site. DOT, EPA and a number of agencies make commonly requested databases available. A good example is the Emergency Release Notification System for toxic releases maintained by the EPA at http://www.epa.gov/ERNS/docs/data.html (a model for distributing data on the web).

A good web site is one of the greatest blessings a government agency can give the public. It is practical because it will save you tons of time and money. It also makes our job easier. These are the qualities that the perfect government data web site would have.

  • Reporters could download the data by file transfer protocol (FTP). Individual states or other logical subsets of the data could be downloaded along with the whole country.
  • There would be copious documentation, including the items listed in the following section.
  • Searches that limited the number of fields would be kept to a minimum.
  • The data would be available in a common file format such as ASCII fixed-length or delimited.
  • Problems or limitations of the data would be clearly spelled out.
  • The site would be well maintained and fast.

Often agencies put up searchable databases on the web rather than making the complete database available for download. This format is often less usable for reporters because we may want to combine the data with another database or search in ways the site does not provide. While the OSHA site is wonderful for looking up the inspection records of an establishment, it is useless if you want to find all businesses that were cited for improper storage of radioactive materials.

    Data format

    If data is not available to download, it should be made available on a common medium such as 9-track tape, 3480/3490 cartridge tape, CD-ROM, 8mm DAT, 4mm DAT or floppy disk. Most data sets can be exported from a database program in a fixed-width or delimited format -- the lowest common denominator for data.
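To show why these two formats are the lowest common denominator, here is a minimal Python sketch that reads the same record both ways. The record layout (field names and column positions) and the sample record are hypothetical; a real layout would come from the agency's documentation.

```python
import csv
import io

# Hypothetical record layout: (field name, start column, end column),
# zero-based with the end column exclusive.
LAYOUT = [("name", 0, 20), ("state", 20, 22), ("amount", 22, 30)]


def parse_fixed_width(line, layout=LAYOUT):
    """Slice one fixed-width line into a dict of stripped field values."""
    return {name: line[start:end].strip() for name, start, end in layout}


def parse_delimited(text, delimiter=","):
    """Read delimited text whose first line holds the field names."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


# The same made-up record in each format.
fixed_line = "ACME HARDWARE".ljust(20) + "TN" + "12500".rjust(8)
fixed_record = parse_fixed_width(fixed_line)

delimited_text = "name,state,amount\nACME HARDWARE,TN,12500\n"
delimited_records = parse_delimited(delimited_text)

print(fixed_record)
print(dict(delimited_records[0]))
```

Note that the fixed-width file cannot be read at all without the record layout, which is one reason the documentation matters as much as the data.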

    Data documentation

    Databases are useless without proper documentation. A reporter is more interested in making sure they use the data correctly than you are. Generally, the following documentation is needed:

    Using FOIA

    The rest of this handout has been graciously provided by Jennifer LaFleur, database editor for the San Jose Mercury News:

    When I speak with journalism groups, these are the guidelines I typically give reporters, especially when dealing with electronic information. I also think these are useful guidelines for the folks at government agencies from whom journalists request information.

    1. Know the law. Know how FOIA treats the information you're requesting and what the exemptions are. If there are exemptions, the agency must be able to cite them.
    2. Know what information you want. Don't ask an agency to provide you everything they have. That can create a lot of unnecessary work. Make sure your request is specific. It's likely that a reporter may need the actual data that was used for a report. Before asking for the data, request the record layout for that database. Documentation is available under FOIA.
    3. Know how the information is kept. Try to find someone in the information systems department at an agency who knows how they keep the information. This is not to avoid the public relations person you're dealing with, but sometimes it's necessary to talk nerd-to-nerd. Reporters who are using computer databases are actually quite computer savvy.
    4. Know what the appropriate cost should be. You really should only have to pay duplication costs. Ask for an itemization of those costs. This is an area that gets difficult when an agency has a set price for a database.
    5. Know the source of the data -- be familiar with how the records are created.
    6. Know who administers the data. The person in charge of the database can be helpful for understanding the data.
    7. Get hardcopy summary reports. This will give you a way to check your data.
    8. Know how many records or pieces of information are in the database. When you get the database, make sure you have the right number of records.
    9. Know how large the database is. You don't have to know a lot of technical jargon. Most PCs will hold at least 250 megabytes of information or roughly 250 million characters. If they say the file is 900 megabytes, you won't be able to do much with it unless you find a computer with a bigger hard drive. Most newsrooms have the capability of reading nine-track tapes from mainframes and many other data formats. You may have to work out a way that is easiest for both of you. I frequently provide agencies with new tapes to avoid them having to charge me for the media.
    10. Confidential information. A government agency may claim that certain pieces of information are confidential. There very well may be confidential information in the file, but that doesn't mean you shouldn't get the rest of it.
    11. Get involved with local software users group meetings. These are groups that bring together folks from all different disciplines who work together to solve problems and share tips.
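Guidelines 8 and 9 above can be checked mechanically once a file arrives. The sketch below is one way to do it in Python, assuming a text file with one record per line and a header row. The expected record count is whatever the agency reported, and the 250-megabyte limit is simply the figure quoted in guideline 9; the file name and contents are made up.

```python
import os
import tempfile

PC_CAPACITY_BYTES = 250 * 1_000_000  # the "250 megabytes" figure from guideline 9


def count_records(path, has_header=True):
    """Count data records in a text file that stores one record per line."""
    with open(path, "r", newline="") as f:
        total = sum(1 for line in f if line.strip())
    return total - 1 if has_header and total > 0 else total


def check_delivery(path, expected_records, capacity_bytes=PC_CAPACITY_BYTES):
    """Compare a delivered file with the agency's stated record count and a size limit."""
    actual = count_records(path)
    size = os.path.getsize(path)
    return {
        "records": actual,
        "count_matches": actual == expected_records,
        "size_bytes": size,
        "fits": size <= capacity_bytes,
    }


# Demonstration with a small temporary file standing in for the agency's data.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("id,name\n1,Alpha\n2,Beta\n3,Gamma\n")
    sample_path = tmp.name

report = check_delivery(sample_path, expected_records=3)
print(report)
os.remove(sample_path)
```

Counting lines is only a rough check. A delimited file with line breaks inside quoted fields would need a real parser, and the hardcopy summary reports from guideline 7 remain the better test of whether the contents are right.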

    I've talked with many reporters about their problems getting public records from both the federal government and state and local agencies. The problems are far more difficult for smaller news organizations that don't have the money or resources to go to court over records. Here are some of the excuses I or other reporters in my newsroom have heard in response to public records or electronic public records requests.

    The previous guidelines are all dandy, until you actually go to ask for the data. Many times you'll run into reasons why an agency can't give you information, probably because 8 out of 10 reporters will just go "oh, okay" and go away when confronted with these reasons. Here is a sample of some of the actual reasons I have received for why an agency cannot fill my request:

     

    Listen, Missy (yes, this really happened), our database is very complicated. You probably won't understand it.

    Most newsrooms have the capacity to handle very large and complex databases. If you're not dealing with someone who knows the data, you need to be -- or have a joint conversation with the public information officer and the database manager.

    Our computer system can't do that.

    This is unusual for any database system. It may take talking with the software vendor to figure out the best option for both sides.

    The person that knows how to do that is on vacation for two weeks / doesn't work here anymore.

    This is a problem.

    It will cost you $20,000

    Ask for an itemized estimate of charges. Journalists understand that there might be some minor programming charges, but sometimes the charges seem unreasonable.

    Journalists also know that the list of other recipients of the data and what those recipients paid is available under FOIA.

    The database is not public record.

    The burden is on the agency to show where in the law that information is not a public record. If the journalist has done his/her homework, this should be less of a problem.

    We don't like what you plan to do with it.

    Interesting. On the other hand, when you're able to cooperate with an agency, they frequently can help point you in the right direction. I recently did a story on landslides in the Santa Cruz Mountains and worked with the USGS, which knew what the story was about and was helpful in letting me know what data were available and where to get them -- some were even available from its Web site.

    The database contains confidential information

    Everyone needs to do their homework on this one. Redacting with a computer is possible just like on paper with a black marker.
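As a rough illustration of redacting by computer, the Python sketch below copies a delimited file while blanking out selected fields. The field names and the sample record are hypothetical; which fields are actually exempt depends on the law and on the database in question.

```python
import csv
import io

# Hypothetical field names, used only for illustration.
CONFIDENTIAL_FIELDS = {"ssn", "home_phone"}


def redact(in_file, out_file, confidential=CONFIDENTIAL_FIELDS, mark="REDACTED"):
    """Copy delimited records, replacing the values of confidential fields with a marker."""
    reader = csv.DictReader(in_file)
    writer = csv.DictWriter(out_file, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        for field in confidential:
            if field in row:
                row[field] = mark
        writer.writerow(row)


source = io.StringIO(
    "name,ssn,home_phone,loan_amount\n"
    "Jane Doe,123-45-6789,555-0100,40000\n"
)
released = io.StringIO()
redact(source, released)
print(released.getvalue())
```

Every other field and every record passes through untouched, which is the electronic equivalent of the black marker: the confidential pieces are covered and the rest of the file is still released.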

    We don't keep that on computer.

    Although this is true in some cases with smaller jurisdictions or small offices of federal organizations, it's uncommon.

    If we give it to you, we'll have to give it to everyone.

    So what's the problem with this?

    That database uses proprietary software

    We don't want software, we want the data. This may require getting together with the vendor.

    We don't mind giving you a few records, we just don't want to give you the whole database

    To which the judge asked: You mean if they wanted ONE record from ONE person it would be okay? Their answer: Depends on the person.

    The following is a list of costs charged by some agencies for their data.

     

    Cost Example #1: TEXAS

    The General Services Commission Charge Schedule

    The following is a summary of the charges for copies of public information as adopted by the commission. These charges took effect 9/18/96. All agencies covered by the Texas Public Information Act are to follow this schedule.

    For more information, contact Hadassah Schloss, the commission's open records administrator, at (512) 475-2497.

    (1) Standard paper copy -- 10 cents per page

    (2) Nonstandard-size copy

    (A) Diskette -- $1 each

    (B) Magnetic tape -- 4-mm, $13.50; 8-mm, $12; 9-track, $11

    (C) Data cartridge -- 2000 Series, $17.50; 3000 Series, $20; 6000 Series, $25; 9000 Series, $35; 600A, $20

    (D) Tape Cartridge -- 250 MB, $38; 525 MB, $45

    (E) VHS video cassette -- $2.50 each

    (F) Audio cassette -- $1 each

    (G) Oversized paper copy -- 50 cents each.

    (H) Mylar (36", 42" and 48") -- 3 mil., 85 cents per linear foot; 4 mil., $1.10 per linear foot; 5 mil., $1.35 per linear foot

    (I) Blueline/blueprint paper -- 20 cents per linear foot

    (J) Other -- Actual cost

    (3) Personnel charge

    (A) Programming personnel -- $26 per hour

    (B) Other personnel -- $15 per hour

    (4) Overhead charge -- 20% of personnel charge

    (5) Microfiche or microfilm charge

    (A) Paper copy -- 10 cents per page

    (B) Fiche or film copy -- Actual cost

    (6) Remote document retrieval charge -- Actual cost

    (7) Computer resource charge

    (A) Mainframe -- $10 per minute

    (B) Midsize -- $1.50 per minute

    (C) Client/Server -- $2.20 per hour

    (D) PC or LAN -- $1 per hour

    (8) Miscellaneous supplies -- Actual cost

    (9) Postage and shipping charge -- Actual cost

    (10) Photographs -- Actual cost

    (11) Other costs -- Actual cost

    (12) Outsourced/contracted services -- Actual cost

    Updated: 2/26/97
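To see how a schedule like this adds up, and what an itemized estimate should look like, here is a small Python sketch. The rates are taken from the Texas schedule above; the request itself (two hours of programming, five minutes of mainframe time, one 9-track tape) is a made-up example.

```python
from decimal import Decimal

# Rates from the Texas schedule above.
PROGRAMMER_PER_HOUR = Decimal("26.00")   # (3)(A) programming personnel
OVERHEAD_RATE = Decimal("0.20")          # (4) 20% of personnel charge
MAINFRAME_PER_MINUTE = Decimal("10.00")  # (7)(A) mainframe
NINE_TRACK_TAPE = Decimal("11.00")       # (2)(B) 9-track magnetic tape


def texas_estimate(programming_hours, mainframe_minutes, tapes):
    """Itemize a request using the personnel, overhead, computer and media rates."""
    personnel = PROGRAMMER_PER_HOUR * programming_hours
    items = {
        "personnel": personnel,
        "overhead": personnel * OVERHEAD_RATE,
        "computer": MAINFRAME_PER_MINUTE * mainframe_minutes,
        "media": NINE_TRACK_TAPE * tapes,
    }
    items["total"] = sum(items.values(), Decimal("0"))
    return {name: amount.quantize(Decimal("0.01")) for name, amount in items.items()}


# Hypothetical request: 2 hours of programming, 5 minutes of mainframe time, one tape.
estimate = texas_estimate(2, 5, 1)
for item, amount in estimate.items():
    print(f"{item:10s} ${amount}")
```

For this example the personnel charge is $52.00, overhead is $10.40, mainframe time is $50.00 and the tape is $11.00, for a total of $123.40. An agency quote far above what the schedule supports is a reason to ask for the itemization.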

     

    Cost Example #2: VERMONT

    Secretary of State

    Vermont State Archives

    Official Fee Schedule for Copying Public Records

    THIS IS A WORLD WIDE WEB PUBLICATION OF THE VERMONT SECRETARY OF STATE

    (If you are reading a paper version of this document, you may find the original at www.sec.state.vt.us)

    TEXT OF ADOPTED RULE FILING -- Effective Date: December 19, 1996

    The following fees are the maximum that may be charged by state agencies for copying the public record where the statutes do not provide otherwise.

    These also are the maximum fees that may be charged by municipalities where the statutes do not provide otherwise and where the legislative body of the municipality has not set its own fee schedule.

    1. For staff time involved in physically duplicating a record, $.23 per minute after the first 30 minutes.

    2. For senior-level information technology specialists' time spent extracting data from databases or performing similar tasks necessary to comply with a request to create a new public record, $.41 per minute.

    3. For any other staff time for which cost can be charged and collected under this section, $.30 per minute.

    4. For photocopies, $.04 per single-sided page, $.06 per double-sided page for pages up to 8.5 by 14 inches.

    5. For computer-generated paper copies, $.02 per page for pages up to 8.5 by 14 inches.

    6. For computer diskettes, $.88 each for 3.5-inch diskettes, $.69 each for 5.25-inch diskettes.

    7. For compact discs, $22.08 each for write-once CDs, $124.58 for re-writeable CDs.

    8. For audio tapes, $1.25 each.

    9. For video tapes, $2.25 each.
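The Vermont schedule bills staff time by the minute and charges for duplication time only after the first 30 minutes. A short Python sketch of that rule follows; the rates come from the schedule above, and the request (45 minutes of duplication, 60 minutes of specialist time, one write-once CD) is hypothetical.

```python
from decimal import Decimal

# Rates from the Vermont schedule above.
DUPLICATION_PER_MINUTE = Decimal("0.23")    # item 1, after the first 30 minutes
FREE_DUPLICATION_MINUTES = 30
IT_SPECIALIST_PER_MINUTE = Decimal("0.41")  # item 2
WRITE_ONCE_CD = Decimal("22.08")            # item 7


def vermont_estimate(duplication_minutes, it_minutes, cds):
    """Return the maximum charge for staff time plus write-once CDs."""
    billable = max(0, duplication_minutes - FREE_DUPLICATION_MINUTES)
    return (
        DUPLICATION_PER_MINUTE * billable
        + IT_SPECIALIST_PER_MINUTE * it_minutes
        + WRITE_ONCE_CD * cds
    )


# Hypothetical request: 45 minutes duplicating, 60 minutes of extraction, one CD.
vermont_total = vermont_estimate(45, 60, 1)
print(f"${vermont_total}")
```

Here only 15 of the 45 duplication minutes are billable ($3.45), the specialist's hour costs $24.60 and the CD $22.08, for a maximum of $50.13.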

    For more information: