Lesson 6

Boolean Search Logic

Boolean logic, simply put, is a way to both enlarge and narrow your search by allowing you to string terms together in a very precise way. It is especially helpful in making our searches more specific and thus more effective. Boolean logic is named after the English mathematician George Boole, who developed it. In its simplest form, as used most often on the Web, Boolean search logic uses four operators: AND, OR, NOT and NEAR.

The terms are pretty self-explanatory.

AND allows you to require that more than one search term be present.

Question: Find all pages that contain both the terms drew and sullivan

Answer: drew AND sullivan
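One way to picture AND is as the intersection of two sets: the pages containing the first term and the pages containing the second. The sketch below uses a tiny, made-up collection of pages (the page names and text are hypothetical) and splits text on spaces, which is a simplifying assumption; real search engines index text far more carefully.

```python
# Hypothetical mini-collection of pages: page name -> page text.
pages = {
    "page1": "drew sullivan reports from sarajevo",
    "page2": "nancy drew mysteries",
    "page3": "ed sullivan show archive",
    "page4": "an interview with drew sullivan",
}

def pages_with(term):
    """Return the set of page names whose text contains the word `term`."""
    return {name for name, text in pages.items() if term in text.lower().split()}

# AND: keep only pages that contain BOTH terms (set intersection).
both = pages_with("drew") & pages_with("sullivan")
print(sorted(both))  # ['page1', 'page4']
```

Pages about Nancy Drew or Ed Sullivan drop out because each contains only one of the two terms.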

OR allows you to specify multiple terms, at least one of which must be present.

Question: Find all the Web pages that are about print media

Answer: newspapers OR magazines OR periodicals
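OR can be pictured as the union of sets: a page qualifies if it contains any one of the terms. This is a minimal sketch over a hypothetical set of pages, again assuming simple space-separated words.

```python
# Hypothetical mini-collection of pages: page name -> page text.
pages = {
    "page1": "daily newspapers in tennessee",
    "page2": "glossy magazines and their readers",
    "page3": "a guide to scholarly periodicals",
    "page4": "television news broadcasts",
}

def pages_with(term):
    """Return the set of page names whose text contains the word `term`."""
    return {name for name, text in pages.items() if term in text.lower().split()}

# OR: keep pages that contain AT LEAST ONE of the terms (set union).
print_media = pages_with("newspapers") | pages_with("magazines") | pages_with("periodicals")
print(sorted(print_media))  # ['page1', 'page2', 'page3']
```

Each extra OR term can only add pages, never remove them, which is why OR enlarges a search.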

NOT allows you to search for some terms while excluding other terms.

Question: Find all the pages about Sarajevo but not about the siege

Answer: Sarajevo NOT siege
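NOT behaves like set difference: start with the pages containing the first term, then remove any page that also contains the excluded term. The pages below are hypothetical.

```python
# Hypothetical mini-collection of pages: page name -> page text.
pages = {
    "page1": "sarajevo travel guide",
    "page2": "the siege of sarajevo",
    "page3": "history of the siege",
}

def pages_with(term):
    """Return the set of page names whose text contains the word `term`."""
    return {name for name, text in pages.items() if term in text.lower().split()}

# NOT: pages with "sarajevo", minus pages with "siege" (set difference).
result = pages_with("sarajevo") - pages_with("siege")
print(sorted(result))  # ['page1']
```

Note that page3 never qualified in the first place, since it does not mention Sarajevo at all.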

NEAR allows you to find one term near another term.

Question: Find all the pages about children working

Answer: child NEAR labor
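NEAR is a proximity operator rather than part of Boole's original logic: both terms must appear, and they must be close together. The sketch below checks whether two words occur within a fixed number of words of each other. The 10-word window is an assumption for illustration only; each search engine sets its own distance.

```python
def near(text, term1, term2, window=10):
    """True if term1 and term2 occur within `window` words of each other.
    The default window of 10 words is an assumption, not a documented setting."""
    words = text.lower().split()
    pos1 = [i for i, w in enumerate(words) if w == term1]
    pos2 = [i for i, w in enumerate(words) if w == term2]
    return any(abs(i - j) <= window for i in pos1 for j in pos2)

close = "new laws aim to end child labor in factories"
far = "the child ran home " + "and then " * 10 + "talked about labor day"
print(near(close, "child", "labor"))  # True
print(near(far, "child", "labor"))    # False
```

The second text contains both words, so it would match child AND labor, but the words are too far apart to match child NEAR labor.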

Boolean search engines often allow you to search for an exact string of words, or phrase. The way to specify this is to put the phrase in quotation marks. Thus, to find someone on the Web, you might search for:

Search: "Harry S. Truman"
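A quoted phrase can be thought of as requiring the words to appear side by side and in order. The function `contains_phrase` below is a hypothetical helper written for illustration; it ignores capitalization and splits on spaces, both simplifying assumptions.

```python
def contains_phrase(text, phrase):
    """True if the words of `phrase` appear consecutively, in order, in `text`.
    Case is ignored here for simplicity."""
    words = text.lower().split()
    target = phrase.lower().split()
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))

print(contains_phrase("President Harry S. Truman signed it", "Harry S. Truman"))  # True
print(contains_phrase("Harry met Bess Truman in 1910", "Harry S. Truman"))       # False
```

The second page contains "Harry" and "Truman" and would match Harry AND Truman, but not the exact phrase.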

Let's say we want to find out about me on the Web. Since there are billions of Web pages and hundreds of Drew Sullivans, you need to refine your search to find just the one Drew Sullivan we are interested in. You might start by looking in the most general way and slowly narrow your search until you get a reasonable number of results. Otherwise, you will be looking at Web pages for hours.

Using the above tools to narrow our search, here are our results from Altavista's Boolean search engine. From our searches we learn there is another Drew Sullivan, who lives in Canada and does a lot of work with the Linux operating system. We don't want to include him, so to narrow our search we use the last search in the table.

Search                              Number of Web pages found
drew OR sullivan                    2,377,658
drew AND sullivan                   42,535
drew NEAR sullivan                  1,375
"drew sullivan"                     498
"drew sullivan" AND NOT linux       273

We have narrowed our search down to 273 Web pages, but I have a rule: try to reduce the number of pages to fewer than 100, or even fewer than 50, before you start looking at sites. That's because looking at sites can be a slow and laborious process, depending on the speed of your connection. To do that, we need to narrow our search further, and the way to do this is by combining Boolean operators.

Combining Boolean Searches

The power of these terms can be harnessed when you combine them.

For instance, if we want to narrow our search by adding my occupation we can add the term "journalist" to the term "drew sullivan" with an AND operator to get:

"drew sullivan" AND journalist

But on a Web site, I may be referred to as either a journalist or a reporter. We can add the OR operator and the word reporter but then we run into a problem.

"drew sullivan" AND journalist OR reporter

Look at the table below. We would have expected that by adding the term "OR reporter" we would have slightly increased the number of hits. Instead, the number jumps to over a half million sites. Why?

Search                                          Number of Web pages found
"drew sullivan" AND journalist                  54
"drew sullivan" AND reporter                    63
"drew sullivan" AND reporter OR journalist      588,375
"drew sullivan" AND (reporter OR journalist)    85

That's because you have to use parentheses to tell the search engine what to do first. It's like the grade school algebra you all want to forget. The equations below give different results depending on where the parentheses are:

(2 x 3) + 4 = 10

2 x (3 + 4) = 14

In the example above, we wanted the term "drew sullivan" and either of the terms (reporter or journalist). Thus the correct way to write the query is:

"drew sullivan" AND (reporter OR journalist)

We get 85 hits, which is what we might expect: because we broadened the query, we should get slightly more than the 54 hits we first got. The hit count is an important way to check your work and to make sure the search you are doing is the search you want to do.
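The effect of the parentheses can be reproduced with sets of page IDs. The index below is made up for illustration. The table's 588,375 hits for the unparenthesized query are consistent with the engine reading it as ("drew sullivan" AND reporter) OR journalist, and the sketch assumes that grouping.

```python
# Hypothetical index: term -> set of page IDs containing it.
index = {
    "drew sullivan": {1, 2, 3},
    "reporter": {2, 10, 11},
    "journalist": {3, 12, 13, 14},
}
A, R, J = index["drew sullivan"], index["reporter"], index["journalist"]

# Assumed reading of the unparenthesized query:
# ("drew sullivan" AND reporter) OR journalist -> every journalist page sneaks in
left_to_right = (A & R) | J
# With parentheses: "drew sullivan" AND (reporter OR journalist)
grouped = A & (R | J)

print(sorted(left_to_right))  # [2, 3, 12, 13, 14]
print(sorted(grouped))        # [2, 3]
```

In Python itself, `&` binds more tightly than `|`, so `A & R | J` computes the first grouping. Writing the parentheses explicitly, as the lesson recommends, removes any doubt about which operation happens first.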

Here might be another search you would do if you knew where I had worked:

"drew sullivan" AND (reporter OR journalist) AND ("associated press" OR AP OR Tennessean)

Search                                                 Number of Web pages found
pages with "drew sullivan"                             298
pages with (reporter OR journalist)                    1,765,710
pages with ("associated press" OR AP OR Tennessean)    1,715,739
pages with all three                                   56
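The table shows that each parenthesized OR group is huge on its own, yet ANDing all three groups together leaves very few pages. Here is a small sketch of that query as an intersection of three sets; the page IDs are hypothetical, and `any_of` is a helper invented for this example.

```python
# Hypothetical page sets for each term (page IDs are made up).
has = {
    "drew sullivan": {1, 2, 3, 4},
    "reporter": {1, 5},
    "journalist": {2, 6},
    "associated press": {1, 7},
    "ap": {8},
    "tennessean": {2, 3},
}

def any_of(*terms):
    """OR together the page sets for several terms."""
    result = set()
    for term in terms:
        result |= has.get(term, set())
    return result

# "drew sullivan" AND (reporter OR journalist) AND ("associated press" OR AP OR Tennessean)
hits = (has["drew sullivan"]
        & any_of("reporter", "journalist")
        & any_of("associated press", "ap", "tennessean"))
print(sorted(hits))  # [1, 2]
```

Page 3 mentions Drew Sullivan and the Tennessean but neither reporter nor journalist, so the middle group filters it out.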

Many first-time users of Boolean logic get confused about when they need parentheses. The answer is simple: whenever you are using both AND and OR. The operator AND effectively narrows a search and OR enlarges it. You have to tell the search engine what you want to do first: enlarge or narrow the search.

Advanced Search Features

With most search engines you will notice that when you first go to their Web page, you are given a simple search box, such as the one at the Altavista site. If you type in Drew Sullivan there, you will get more than 1 million pages, because the engine assumes you want either term. Thus, always look for the "power search" or "advanced search" function on a search engine. This is where you can use Boolean logic and other advanced features. The advanced search feature on Altavista is one of the most powerful tools on the Web, and we will use it in this exercise.

Capitalization

Notice that if you do the above search capitalizing the D and the S in Drew Sullivan, you will get different results. Why? It's important to understand how these search engines work; otherwise you may be eliminating Web pages you want.

In Altavista, as in some other search engines, whenever you use an upper-case letter, the engine assumes you want your search to be case sensitive. If you use only lower-case letters, it will match both upper- and lower-case letters. As a rule of thumb, always search in lower-case letters unless you are looking for a proper noun that is also a common word (e.g., The Film Forum or Magnetic Corp.).
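The capitalization rule described above can be written as a tiny matching function. This is a sketch of the rule as this lesson states it, not Altavista's actual code; `term_matches` is a hypothetical name.

```python
def term_matches(query_word, page_word):
    """Case rule as described in this lesson (a sketch, not Altavista's code):
    an all-lower-case query word matches any capitalization, while a query
    word containing a capital letter must match exactly."""
    if query_word.islower():
        return query_word == page_word.lower()
    return query_word == page_word

print(term_matches("drew", "Drew"))  # True
print(term_matches("drew", "drew"))  # True
print(term_matches("Drew", "drew"))  # False
print(term_matches("Drew", "Drew"))  # True
```

Searching in lower case is the safer default: the only word it misses is one you did not type.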

Wild Cards

Many search engines allow you to use wild cards to search for unknown characters or words. In Altavista, the wild card character is an asterisk (*). Thus, to look for the terms journalist, journals or journalism, you could use the term journal* to find all three.

Each search engine has its own rules. In Altavista, you must type at least the first three characters, and a wildcard can stand in for only zero to five characters. This helps in finding words whose spelling you are unsure of. Thus the term salt*man will find saltsman, saltzman or even saltman.
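The two wildcard rules stated above (at least three leading characters, and each asterisk standing for zero to five characters) can be expressed as a regular expression. This is a minimal sketch under those stated rules; `wildcard_to_regex` is a hypothetical helper, and treating the wildcard as matching letters, digits or underscores (`\w`) is an assumption.

```python
import re

def wildcard_to_regex(pattern):
    """Turn a wildcard term into a compiled regex, assuming the rules in this
    lesson: at least three characters before the first '*', and each '*'
    stands for zero to five word characters."""
    prefix = pattern.split("*", 1)[0]
    if "*" in pattern and len(prefix) < 3:
        raise ValueError("need at least three characters before '*'")
    parts = [re.escape(p) for p in pattern.split("*")]
    return re.compile(r"\w{0,5}".join(parts))

journal = wildcard_to_regex("journal*")
print([w for w in ["journalist", "journals", "journalism", "journey"] if journal.fullmatch(w)])
# ['journalist', 'journals', 'journalism']

salt = wildcard_to_regex("salt*man")
print([w for w in ["saltsman", "saltzman", "saltman", "saltwaterfishman"] if salt.fullmatch(w)])
# ['saltsman', 'saltzman', 'saltman']
```

"saltwaterfishman" is rejected because nine characters sit between "salt" and "man", more than the five a wildcard may cover.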

Be aware that Altavista has its own list of variations. Thus the terms "America" and "Americans" may be found by the search term "American". Likewise, Altavista will find "Bill" if you enter "William." This feature is turned off when you use wildcards.

Other Search Engines

Besides Altavista, there are other powerful keyword search engines out there. Each does some things better than the others, and some index more documents than Altavista. Altavista has the most powerful advanced search features today (that could change), but it may not have what you're looking for, and using it all the time is like looking for your lost keys only where the lamp is on. Therefore, good searchers become familiar with all the good search engines. Here are some others.

Search Engine Watch is an online magazine about search engines and a great resource for in-depth information. It compiles a comparison of search features among various keyword search engines, as well as helpful search strategies and techniques.

Other sources

Here are some other sources of information on Boolean logic and search engines.

 

All material Copyright © 2000 Drew Sullivan unless otherwise indicated.
All Rights Reserved