Lesson 8
We now know how to use a catalog search engine like Yahoo and a keyword search engine like Altavista or Google using both Boolean logic and key word searches.
But which one should you use? Well, it depends on the job. You don't want to try to pound in a nail with a screwdriver.
So we need to determine a strategy to attack problems. As I said before, this is an art not a science. What we will discuss is a general search strategy that can be applied broadly to most of your searches. It includes a few tests you can use to make a good guess as to where to start. But remember, the Web is chaos and the best searching plan can yield nothing and a stupid plan can get lucky. However, luck is the residue of design and with a disciplined search strategy, you will get lucky much more often.
In lesson 3, we talked about a general approach. Let's revisit that approach.
- Knowing already where the information will be stored.
- Guessing correctly where the information is stored (don't laugh)
- Catalog search engines (see lesson 4)
- Key word search engines (see lesson 5 and lesson 6)
- Specialty search engines
If the first two approaches don't apply, you need to use a search engine. Now you must consider whether to use a catalog/directory search engine or a key word search engine. At this point, most people will jump on Altavista, type "Bill Clinton", get 1.9 million hits, look at the first five and say "the Web sucks." I've done it myself. The better approach is to be logical.
Consider first the catalog/directory search engine. It's the easiest to use. Yahoo is quite helpful but often underused in serious searching.
Catalog search engines like Yahoo work well in these circumstances:
- When you're looking for a Web site
- When information is not unique (famous names, issues, organizations)
- When you're not sure what you're looking for.
- When you're looking for general or background information.
Key word search engines work best in these circumstances:
- When you're looking for a Web page
- When you're looking for unique information (e.g., uncommon names, email addresses, titles of reports)
- When you are looking for something specific
Here's an analogy I used in lesson 3 but I'll mention again because it is important. Catalog search engines are like the table of contents in the front of a book. They generally lead to the front page of a Web site (like the front page of a chapter). If you're trying to find the Web site of an organization, person, company, agency or issue, catalog search engines work great. Remember that the people who build catalog search engines are more interested in the quality of a site, not the quantity of sites in their catalog. For mainstream sites that should exist, this is your best first stop.
Key word search engines are like the index in the back of a book. They lead to specific items on individual pages. If you're trying to find stories by a particular journalist, a report, the tallest mountain in Bosnia or the height of the Empire State Building, key word search engines are more appropriate. Remember, key word indexes are built automatically and are inclusive. They don't evaluate the merit of a site, so all sorts of arcane stuff will end up there.
Think about your search item. Is it something you'd look up in the front of the book (table of contents) or the back of the book (index)? Here are the concepts again:
catalog/directory    table of contents    front of the book    web site
key word             index                back of the book     web page

Here are some typical examples:
Problem                                                                       Solution
You want to find the Web site of Doctors Without Frontiers                    Catalog
You want to find a report and you know the title                              Key word
If you don't know the report's title but you know the agency that issued it   Catalog
If you want to learn anything you can about the Dinka tribe in Sudan          Catalog
If you want to know the typical dowry paid to a Dinka woman                   Key word
If you want to know about Arkan and Cece                                      Key word
If you want to know the weather around the world                              Catalog
If you want to know about Bill Clinton                                        Catalog
If you want to know about Bill Clinton's mom                                  Key word

Altavista does a good job in their help pages explaining when you would use a catalog/directory and when you would use a key word search engine. They also made up a nice chart with some typical examples.
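The rules of thumb above can be sketched as a small decision function. This is only an illustration of the reasoning, not a tool from the lesson: the names SearchNeed and choose_engine and the three yes/no questions are assumptions made up for this sketch.

```python
from dataclasses import dataclass


@dataclass
class SearchNeed:
    """Three yes/no questions about the thing you are looking for (hypothetical names)."""
    wants_whole_site: bool   # a Web site (front door) rather than one page
    is_unique_term: bool     # uncommon name, email address, exact report title
    is_specific_fact: bool   # one specific item rather than background


def choose_engine(need: SearchNeed) -> str:
    """Return "catalog" or "key word" following the rules of thumb in the text."""
    if need.wants_whole_site:
        return "catalog"            # table of contents: leads to a site
    if need.is_unique_term or need.is_specific_fact:
        return "key word"           # index: leads to a specific page
    return "catalog"                # general or background information


# The Web site of an organization: catalog.
print(choose_engine(SearchNeed(True, False, False)))
# A report whose title you know: key word.
print(choose_engine(SearchNeed(False, True, True)))
# Anything you can learn about a broad topic: catalog.
print(choose_engine(SearchNeed(False, False, False)))
```

Treat the result as a starting guess. As noted earlier, a good plan can still come up empty, so be ready to switch engines.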
My final strategy is to figure out who would know. If you're trying to find out life expectancy in Slovenia, that sounds like a key word search. But it's a difficult search. Ask yourself the question you ask every day as a reporter: who knows the answer to this question? What agency or organization cares about life expectancy? One agency that does this kind of work is the World Health Organization. Can I find the WHO site easily on a catalog search engine and then find the answer on their site? Even easier, can I guess their site address? Either www.who.int or www.who.org would have worked.
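Guessing an address can be made systematic. The sketch below simply lists candidate addresses to try in a browser. The function name guess_addresses is hypothetical, and the default endings are just the two mentioned above, so add others as your own guesswork suggests.

```python
from typing import List, Sequence


def guess_addresses(short_name: str, endings: Sequence[str] = ("int", "org")) -> List[str]:
    """Build candidate site addresses from an organization's short name.

    The endings are assumptions; nothing guarantees any candidate exists.
    """
    name = short_name.strip().lower()
    return [f"www.{name}.{ending}" for ending in endings]


# Candidates for the World Health Organization's acronym.
for address in guess_addresses("WHO"):
    print(address)
```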
It's always easier to use a catalog search engine, so if the query can be turned into a catalog search, you can quickly check a site for an answer. If you do use a key word search, restrict it to the Web site in question. For example, the search above can be written as:
host:www.who.int and "life expectancy"
You'll soon see the information is available in their World Health Report, which can be accessed from the Web. The life expectancy of Slovenia is 68. Bosnia is 65.
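If you run many site-restricted searches, the query string can be assembled the same way each time. This is a minimal sketch that reproduces the host: query shown above. The function name host_query is made up for illustration, and the host: keyword is taken from the example in this lesson, so other search engines may expect a different operator.

```python
def host_query(host: str, phrase: str) -> str:
    """Build a key word query restricted to one host, with the phrase in quotes."""
    return f'host:{host} and "{phrase}"'


# Reproduces the example query from the text.
print(host_query("www.who.int", "life expectancy"))
```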
One last thing to keep in mind. No matter how many Web pages each search engine indexes, this is still a small percentage of the total pages out there. Yahoo covers less than 2% of the web pages (though those pages easily account for more than half the web visits) and Altavista and Google may only index 30% of the pages out there. There's a lot that can't be found through these tools.
Also keep in mind that there are many pages Altavista and Google cannot index. Here are some of the types you will not find in a key word search engine:
- Dynamically generated pages. These are pages automatically generated by a query or something you input into a Web page. Any time you search an online database or enter information and a page is generated, that page is not searchable via key word search engines. See the lesson on specialty search engines.
- PDF files. Many government reports are stored in PDF files (which use the extension pdf). This file is more like an image of a page than an actual Web page.
- Words in graphical images. Many websites use graphical images of words rather than the actual words. This allows them to play with fonts or graphical effects. Also, the words in most news graphics on news sites are not searchable either.
- Web pages on password protected sites. The Web pages on most sites that require some password are not indexed.
- Pages recently created. It sometimes takes months for a Web site to be indexed. Most day to day changes are not reflected in the indexes of major search engines.
- Sites that have restricted access to search engines. There are ways you can keep a web crawler from visiting your site.
- Sites that just haven't yet been indexed. Sites that are not linked to are not likely to be indexed unless the owner submits the site to a search engine's Web site.
Unfortunately, these are the Web pages that are growing in number fastest. The easiest way to manage a lot of information on a Web site is to use a database that will quickly generate Web pages on the fly based on the user's needs. Coding static Web pages just doesn't make sense for organizations with a large Web presence. This will make finding information harder.
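One common way a site restricts crawlers, as mentioned in the list above, is a robots.txt file at the top of the site. The sketch below uses Python's standard urllib.robotparser module to show how a well-behaved crawler would read such a file. The rules, the example.org addresses and the crawler name ExampleBot are all invented for illustration.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt: every crawler is asked to stay out of /private/.
rules = [
    "User-agent: *",
    "Disallow: /private/",
]

parser = RobotFileParser()
parser.parse(rules)  # parse the rules directly, no network access needed


def crawler_may_visit(url: str, agent: str = "ExampleBot") -> bool:
    """Return True if the rules above allow this crawler to fetch the URL."""
    return parser.can_fetch(agent, url)


print(crawler_may_visit("http://www.example.org/index.html"))
print(crawler_may_visit("http://www.example.org/private/report.html"))
```

Pages under a disallowed path will usually be missing from a key word search engine even though anyone with the address can read them.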
A side note:
Most people don't go to much trouble when they search. They follow no strategy and 95% of all searches are garbage. Need convincing? Check out these voyeur sites that show you what people are searching for in real time.
All material Copyright © 2000 Drew Sullivan unless otherwise indicated.
All Rights Reserved