Data on Web sites comes in many different forms. Sometimes it can be downloaded as a spreadsheet or database. Sometimes you'll have to do a little manipulation to get it into the format you need. Below are links to sites I have found useful. Here are the symbols for each file type listed below. Depending on the format of the data on the website, you'll have to handle each set differently.
| ASCII delimited or fixed-width file (easily imported into a database or spreadsheet) | HTML file (can be cut and pasted into spreadsheet or imported using Excel) | ||
| Spreadsheet file | Database file |
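If you haven't handled a fixed-width file before, here is a minimal Python sketch of the idea: slice each line at the column positions given in the record layout, then write the result out as comma-delimited text a spreadsheet can open. The layout, field names and sample values below are made up for illustration; a real file needs the positions from its own documentation.

```python
import csv
import io

# Hypothetical record layout: (field name, start, end), 0-based, end exclusive.
# Real positions come from the record layout published with the data.
LAYOUT = [
    ("state", 0, 2),
    ("county", 2, 17),
    ("population", 17, 25),
]

# Made-up sample records, padded to the widths in LAYOUT.
SAMPLE = [
    "MO" + "Boone".ljust(15) + "00001234" + "\n",
    "MO" + "Cole".ljust(15) + "00000567" + "\n",
]


def parse_fixed_width(lines, layout):
    """Slice each line into named fields and strip the padding."""
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        rows.append({name: line[start:end].strip() for name, start, end in layout})
    return rows


def to_csv(rows, layout):
    """Return the rows as comma-delimited text with a header row."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=[name for name, _, _ in layout], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    print(to_csv(parse_fixed_width(SAMPLE, LAYOUT), LAYOUT))
```

Fields are kept as text here, so leading zeros in codes such as ZIP or FIPS numbers survive; convert to numbers only for the columns you plan to do math on.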
US Downloadable Databases - These are databases that can be downloaded in whole or in part and imported into a spreadsheet or database.
| US Census Data Lookup | The Census Bureau site allows you to download parts of the 1990 census and look at the latest Census 2000 summary data. The Census Bureau has a lot of other data files on its data access site. If you haven't worked with census data, make sure you read the instructions. You can also download TIGER data here for mapping. | |
| Government Information Sharing Project | This site has a number of commonly used databases that can be searched. Unfortunately, the search screens limit what you can do. It is easy to use and very well documented. Data includes the Agriculture Census, Consolidated Federal Funds data, Equal Employment Opportunity data, import/export data, school district data and economic census data. Most data can be saved and imported into a spreadsheet after a few queries. | |
| Federal Election Commission | The FEC periodically puts up data on their site. The site is slow and often the data can't be downloaded en masse. When it can be, it is usually in a comma-delimited format. | |
| FECInfo Election data | This election site is easy to use and has all the contribution data, though much of it is only searchable through a front-end. Lots of subscriber stuff as well. | |
| Health Care Financing Administration Data | Medicare and Medicaid statistics. Includes MEDPAR data, HCFA public use file summaries and other neat tidbits. A lot of the data has been summarized, making it a little less useful. Data is usually fixed-width, stored in zipped files. | |
| Center for Disease Control Data | The National Center for Health Statistics under the CDC has a lot of nice downloadable datasets on mortality and health in its data warehouse. You can also download the complete 1998 and 2000 ICD-9 here (the coding manual for cause of death used by many state and federal agencies), along with a guide to the ICD-9. Data is in Lotus 1-2-3 and ASCII formats. | |
| The National Center for Health Statistics | Some specific health/disease related tables from the agency above in HTML format. | |
| Emergency Release Notification System (ERNS) | The EPA compiles ERNS data on unexpected hazardous spills. The data can be downloaded by EPA region and is in a fixed-length ASCII file. Decent documentation and data dictionaries exist as well. See below in searchable databases for the Toxic Release Inventory. Other EPA Databases are available here. | |
| Bureau of Justice Statistics | Tons of crime-related numbers from the gods of crime statistics. Also check their tables used in BJS reports for more data. | |
| Office of Juvenile Justice Statistics | Juvenile justice crime statistics from the US Office of Juvenile Justice and Delinquency Prevention. Much of the data is in an annoying program you have to load on your machine. | |
| National Archive of Criminal Justice Data | The National Archive of Criminal Justice Data is affiliated with the Bureau of Justice Statistics. Don't try to use this data on deadline. Site generally sucks. | |
| IRS Statistics | The IRS Statistics of Income program tracks all sorts of data but always in summary form. You can find all sorts of information on non-profits (including the database of tax data for approved non-profits - downloadable in ASCII fixed-length form) and other stats on income earned, migration and foreign taxes paid. All are downloadable, often in spreadsheet form. Some databases are VERY big and the site is VERY slow. | |
| Federal Railroad Administration Accident Data | Gone is the old gopher site but this web page still lets you download files in DBF format. Includes good record layouts. Very clean. Very nice. There are some nice summary tables that can be imported into spreadsheets and other goodies at their safety home page. | |
| National Oceanic and Atmospheric Administration - Storm Prediction Center | This NOAA site includes a nice archive with downloadable files on tornadoes and tornado deaths since 1950. There's also data on hail and wind damage. Nice. | |
| USDA ERS State Fact Sheets | Basic ag data on imports/exports, commodity prices, land use and production. Many sets are in Lotus spreadsheets, some maddeningly in PDF files. | |
| DOL - Mining accident data | Excellent site with useful data summaries as well as raw data that can be downloaded as easy-to-use, self-extracting files. You need to look around for the raw data but it's there. | |
| BTS National Data | Good transportation databases...some downloadable...some annoyingly limited with weak front-end search screens. | |
| National Highway Traffic Safety Admin | FTP site where you can download FARS (see below), vehicle recall, crash test and other safety databases. | |
| Fatal Accident Reporting System (FARS) | A real improvement over previous FAR systems. If you can get through the complicated search screens and refine your search, you can get exactly what you're looking for and download the results...a must for large, complicated databases. Also allows you to do cross-tabs. Data also available via FTP. | |
| FDIC Bank Deposit Summaries | Nicely engineered site that allows good flexibility to search and download some data but not all. Has call reports, summary of deposits and other banking statistics. | |
| HUD User | A compilation of a number of HUD databases including the American Housing Survey, Assisted Housing Survey and the State of the Nation's Cities database. Data is in SAS or SPSS format and fixed-length ASCII. Lots of nice searchable stuff on employment as well. | |
| US Forest Service Inventory | Not for the beginner. This is a complex database that tracks the "extent, condition, volume, growth, and depletions of timber on the Nation's forest land." Think of it as the Sears Catalog for lumber companies. | |
| Federal Transit Administration | Lots of data on the relative efficiency and costs between major public transportation projects in the US. Stored as lots of Lotus 1-2-3 files. | |
| Fish and Wildlife Service Endangered Species | See the database download form to get a comma-separated file of all animals on the endangered species list. Also available are the plants list and delisted species. | |
| Administration on Aging | Lots of demographic data on older Americans in spreadsheet format. | |
| Veterans Administration Veteran Data | Excel files with demographic data on veterans. There's also a page of dozens of Excel files dealing with veteran medical programs. | |
| Social Security Admin. - Actuarial Office | Name distributions from Social Security applications. This will tell you what names have been most popular over the years. Okay...so it ain't investigative reporting. | |
| The Economic Statistics Briefing Room | Economic data that can be downloaded into spreadsheets. | |
| The Social Statistics Briefing Room | Social stats like crime and population. | |
| ATSDR HazDat Registry database | Search by clickable map for Superfund and other serious pollution sites. You can download it into a spreadsheet very easily. | |
| Bureau of Labor Statistics | Extract and download employment, wage, layoff and other labor stats to spreadsheet or ASCII files. Very good. | |
| US Census Population Estimates | You can download updated population estimates by state, city, MSA or county. You have to do a little cutting and pasting but it works okay. | |
| Health and Human Services list of excluded doctors | On this site you can download a self-extracting file that has a list of all doctors and organizations excluded from doing business with the government. | |
| Bureau of Transportation Statistics | On-time statistics for airlines by airport and other variables. | |
| National Cancer Institute | Atlas of Cancer. Great site where you can download .dbf or ASCII files of the occurrence and geographical distribution of various types of cancer. The site has a good mapping tool as well. |
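Many of the files above come comma-delimited. As a rough sketch of what "importing into a database" can look like, the Python below loads comma-delimited text into an in-memory SQLite table and runs a simple summary query. The column names, table name and numbers are invented for illustration and do not come from any of the sites listed.

```python
import csv
import io
import sqlite3

# Made-up comma-delimited data standing in for a downloaded file.
SAMPLE_CSV = """state,year,deaths
MO,1998,3
MO,1999,5
KS,1999,2
"""


def load_csv_into_sqlite(text, table="records"):
    """Create a table named after the header row and insert every data row."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    conn = sqlite3.connect(":memory:")
    cols = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    # Skip any empty rows the reader yields for blank lines.
    conn.executemany(
        f'INSERT INTO "{table}" VALUES ({placeholders})',
        (row for row in reader if row),
    )
    conn.commit()
    return conn


conn = load_csv_into_sqlite(SAMPLE_CSV)

# Values arrive as text, so cast before summing.
totals = conn.execute(
    "SELECT state, SUM(CAST(deaths AS INTEGER)) "
    "FROM records GROUP BY state ORDER BY state"
).fetchall()

if __name__ == "__main__":
    for state, total in totals:
        print(state, total)
```

For a real download you would read from the file on disk instead of a string, and check the site's documentation for the meaning of each column before trusting any totals.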
The Most Interesting Searchable Databases Online
International Data
Other Database-Related Resources
Other Sites with Lists of Searchable Databases
drew@drewsullivan.com
Document Update: June 5, 2000