Lesson 7
Boolean logic is vital but it isn't the only way to use a keyword search engine. There's another powerful tool in most keyword search engines that uses the structure of Web pages to help you more efficiently find information. This tool, called tag searching, allows you to search the tags used by Web designers in their web pages. It's the hidden trick of Web searching and, if used correctly, can save time on every search you do.
But first, we must briefly talk about what tags are and how Web pages are built.
You can view the structure of any Web page by clicking on the View button on the Internet Explorer menu bar at the top of the page and then clicking on the Source option. What you are seeing is called Hypertext Markup Language or HTML. This is the code Web designers use to tell your browser how to display information on a page such as what size font to use, whether to center text and where to put pictures or graphic elements. To display a particular size or position, HTML code uses tags that go on either side of the text. For instance, if you want to center text, the tag would look like this:
<center> Center this text </center>
The terms in brackets are the tags and they typically come in pairs: one before the text and one after. The end tag uses a slash as shown above to indicate it ends the centering function. These tags can be very far apart in a document.
I've added some explanation to the HTML source code of this page to explain some of the graphical elements below. I'm not going to go into much detail about HTML because others have done it better. I've given you links at the bottom of the page to some good primers on HTML on the Web.
There are only three tags you need to know for better searching:
- Title tag
- Image tag
- Link and anchor tag
Title tag: <title></title>
At the top of every browser page (in the blue bar) you will find the title of a Web page. For example, the title of this page is "Lesson 6 (that's the title)". Altavista, Google and other search engines allow you to search by title. Why is this important? Because the title is usually the most concise description of a Web page. If the word you're looking for is in the title of a page, that page is more likely to be the one you want. In fact, search engines often use the title tag (along with how often a term appears on a page) to rank pages in order of relevancy.
For example: let's say we want to find Web pages related to the Bosnian pop star Dino Merlin. Merlin is probably mentioned in hundreds of pages, but many of the pages we get back from our search will only have quick, casual references (e.g. "My boyfriend looks like Dino Merlin"). However, if we search for pages where Merlin is mentioned in the title, we're likely to find pages that are predominantly about Merlin. To do a title search in Altavista, we would enter the following in the advanced search window (the window we did our Boolean searches from):
title: Dino Merlin
Google also supports this feature from its advanced search window. Use one of the pull-down menus under the Occurrence option and select "return results where my terms occur in the title of the page."
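To make the idea of a title search concrete, here is a minimal Python sketch that pulls the text between the title tags out of a page and checks whether the search words appear in it. It is only an illustration of why a title match is a stronger signal than a match in the body. The two sample pages and the function names (page_title, title_matches) are made up for this example, and this is not how Altavista or Google actually index pages.

```python
from html.parser import HTMLParser


class TitleExtractor(HTMLParser):
    """Collects the text found between <title> and </title>."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        # Only keep text that sits inside the title tag pair.
        if self.in_title:
            self.title_parts.append(data)


def page_title(html_text):
    """Return the title of an HTML page, or an empty string if it has none."""
    parser = TitleExtractor()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.title_parts).strip()


def title_matches(html_text, words):
    """True if every search word appears in the title (case-insensitive)."""
    title = page_title(html_text).lower()
    return all(word.lower() in title for word in words)


# Hypothetical sample pages, invented for illustration only.
FAN_PAGE = (
    "<html><head><title>Dino Merlin Fan Page</title></head>"
    "<body>News and lyrics.</body></html>"
)
DIARY_PAGE = (
    "<html><head><title>My Diary</title></head>"
    "<body>My boyfriend looks like Dino Merlin</body></html>"
)

if __name__ == "__main__":
    for page in (FAN_PAGE, DIARY_PAGE):
        print(page_title(page), title_matches(page, ["Dino", "Merlin"]))
```

Both sample pages mention Dino Merlin, but only the fan page carries the name in its title, so only that page passes the title check.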
Image Tag
Altavista and other engines also allow you to search by the name of the images in a Web page. To do this, Altavista uses the image tag, which has the following format:
<img src="http://www.drewsullivan.com/bos/lessons/bosniamap.gif" width=167 height=161>
The tag indicates the path and file name of the image and the width and height at which the image is to be displayed.
Here's the result of using the above tag: [image: bosniamap.gif]
The image search function allows you to search by the file name of the image, in this case bosniamap.gif. To do this, you would enter the following command in the advanced search window of Altavista.
image: bosniamap
If we had used the searches image:bosnia or image:map, we would have found the same file.
Keep in mind, you can only search the name of the image file. These names will likely be short because programmers like to keep file names short and because some computers only read eight or ten character file names. You need not specify the extension of the file name (e.g. gif, jpg, bmp, etc). Also, it is unlikely that any file name will be a phrase. Thus if you are looking for an image of Slobodan Milosevic, you would NOT search by:
image:slobodan milosevic
but rather something more like:
image:milosevic
If there are a lot of image files named milosevic, you can add additional terms that might be on the Web page by using your Boolean operators. Thus, a more precise search might be:
image:milosevic and "slobodan milosevic"
In this search, Altavista is looking for an image named milosevic and for the phrase "slobodan milosevic" on the Web page where the image appears. If both are present, it's likely our man.
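Here is a rough Python sketch of what an image search has to work with: it reads the image tags on a page, keeps only the file name without its path or extension, and checks a search term against that name. The substring matching is an assumption chosen to mirror the claim above that image:bosnia and image:map find bosniamap.gif. How Altavista really indexes image names is not covered in this lesson, and the function names are made up for the example.

```python
from html.parser import HTMLParser
from pathlib import PurePosixPath
from urllib.parse import urlsplit


class ImageNameExtractor(HTMLParser):
    """Collects the file name (without extension) of every <img> on a page."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src:
            # Drop the server and folders, then drop the extension.
            path = urlsplit(src).path
            self.names.append(PurePosixPath(path).stem.lower())


def image_names(html_text):
    """Return the searchable image file names found in the page."""
    parser = ImageNameExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.names


def image_matches(html_text, term):
    """Assumed rule: the term matches if it appears anywhere in a file name."""
    term = term.lower()
    return any(term in name for name in image_names(html_text))


# The image tag from this lesson.
PAGE = (
    '<img src="http://www.drewsullivan.com/bos/lessons/bosniamap.gif" '
    "width=167 height=161>"
)

if __name__ == "__main__":
    print(image_names(PAGE))
    for term in ("bosniamap", "bosnia", "map", "milosevic"):
        print(term, image_matches(PAGE, term))
```

Notice that nothing in the tag describes what the picture shows. The file name is all there is to search, which is why short single-word guesses work better than phrases.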
Link and Anchor
The most powerful of the HTML tag searches use the information contained in a link. A link is the underlined text that, when clicked, takes you to another Web page. In HTML, a Web link looks like this:
<a href="http://www.drewsullivan.com/index.html"> Drew's Web Page</a>
On an actual Web page, the above code looks like this: [link: Drew's Web Page]
The tag is composed of two parts: the address and the anchor.
The address is the Web page you will go to if you click on the link (in this case, www.drewsullivan.com/index.html). The anchor is the underlined words that you will actually click to take you to that address (in this case Drew's Web Page). Note that when you move your pointer over the blue colored anchor above, you will see the address in the lower left corner of your Web browser.
The address itself is made up of two parts. The first part of the address (www.drewsullivan.com) is the domain name. Remember from Lesson 1 that a domain name uniquely identifies a Web server on the Internet. As part of that domain name, you have the extension (.com), which identifies the type of Web site (in this case a commercial site). The second part (/index.html) is the path to a particular page on that server.
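If you want to see these pieces pulled apart, here is a short Python sketch that splits the address from the link example using the standard urllib.parse module. The function name address_parts and the dictionary keys are just labels chosen for this illustration.

```python
from urllib.parse import urlsplit


def address_parts(url):
    """Split a Web address into its domain name, extension and path."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    # The extension is whatever follows the last dot in the domain name.
    extension = host.rsplit(".", 1)[-1] if "." in host else ""
    return {"domain_name": host, "extension": extension, "path": parts.path}


# The address from the link example in this lesson.
ADDRESS = "http://www.drewsullivan.com/index.html"

if __name__ == "__main__":
    print(address_parts(ADDRESS))
    # {'domain_name': 'www.drewsullivan.com', 'extension': 'com', 'path': '/index.html'}
```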
Altavista allows you to search the above tag in four ways. Be aware that these names are a little confusing:
domain: allows you to search by the extension of the domain name
host: allows you to search by a particular domain name
url: allows you to search for any word in a Web address or URL
anchor: allows you to search by the words in an anchor
For example:
domain:ba searches by Bosnian domain names (e.g. www.bosnia.ba)
host: www.un.org searches only the Web pages on the United Nations site (i.e. Web pages must start with www.un.org)
url:bosnia searches for the term "bosnia" in any address (e.g. www.un.org/womenwatch/bosnia.htm)
anchor: High Representative searches for any Web page with the words "High Representative" in a link
A final search operator is the very handy link, which allows you to find any Web page that links to a particular Web page. The format is the operator followed by the address. Thus,
link: www.un.org
will find any Web site that links to the UN's web site.
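The differences between these operators are easier to see side by side. The Python sketch below is a toy model, not Altavista's implementation: it assumes that domain, host and url are checked against a page's own address, while anchor and link are checked against the links that appear on the page. The page record, its address, its single link and the matches function are all hypothetical, made up from the examples in this lesson.

```python
from urllib.parse import urlsplit

# Hypothetical page record: the page's own address plus the links on it,
# each stored as (address the link points to, anchor text).
PAGE = {
    "address": "http://www.bosnia.ba/news.html",
    "links": [
        ("http://www.un.org/womenwatch/bosnia.htm", "High Representative report"),
    ],
}


def host_of(address):
    """Return the domain name part of an address, in lower case."""
    return (urlsplit(address).hostname or "").lower()


def matches(page, operator, value):
    """Toy version of the tag search operators (assumed matching rules)."""
    value = value.lower()
    host = host_of(page["address"])
    if operator == "domain":
        # Extension of the page's own domain name, such as ba or gov.
        return host.endswith("." + value)
    if operator == "host":
        # The page must live on exactly this domain name.
        return host == value
    if operator == "url":
        # Any word anywhere in the page's own address.
        return value in page["address"].lower()
    if operator == "anchor":
        # Words in the clickable text of a link on the page.
        return any(value in text.lower() for _, text in page["links"])
    if operator == "link":
        # The page contains a link pointing at this address.
        return any(value in href.lower() for href, _ in page["links"])
    raise ValueError(f"unknown operator: {operator}")


if __name__ == "__main__":
    searches = [
        ("domain", "ba"),
        ("host", "www.un.org"),
        ("url", "bosnia"),
        ("anchor", "High Representative"),
        ("link", "www.un.org"),
    ]
    for operator, value in searches:
        print(f"{operator}:{value}", matches(PAGE, operator, value))
```

In this model the sample page matches link:www.un.org but not host:www.un.org, because it links to the UN site without being part of it.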
To summarize tag search operators:
title: search by title of page
image: search by image file name
domain: search by domain name extension
host: search by domain name
url: search by word in Web address
anchor: search by words in an anchor
link: find Web pages that link to a Web page
So when do you use these?
The Domain operator is useful when you want to limit searches broadly. For instance, if you want to find an official U.S. government site, specifying domain:gov will limit your search to just those sites.
Host is always useful. If you want to find a single Web page on a particular site, you can limit your search to just the site in question. For instance, if you want to find out what the U.S. Environmental Protection Agency's role has been regarding the Anacostia River, then the following search will find all the references to the Anacostia River on the EPA's site.
host:www.epa.gov and anacostia river
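If you build searches like this often, it can help to think of them as small pieces joined by a Boolean operator. The Python sketch below only formats text in the same shape as the searches in this lesson. The helper names term and combine are made up for this example, and nothing here contacts a search engine.

```python
def term(operator, value):
    """Format one tag search term, such as host:www.epa.gov."""
    return f"{operator}:{value}"


def combine(boolean, *parts):
    """Join search terms with a Boolean operator such as and / or."""
    return f" {boolean} ".join(parts)


if __name__ == "__main__":
    # Rebuilds the EPA search shown above.
    print(combine("and", term("host", "www.epa.gov"), "anacostia river"))
    # Rebuilds the earlier image search.
    print(combine("and", term("image", "milosevic"), '"slobodan milosevic"'))
```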
The anchor operator is also very useful. Like a Web page title, the words that people choose for a link are often quite precise. If you want to find a list of hate groups, you'll get hundreds of hits by searching the term "list of hate groups". A quicker way is to find someone who links to a list of hate groups. What are they likely to label such a link? Probably something like "list of hate groups." To find such a page, try:
anchor: "list of hate groups" or anchor:"hate group links"
Anchor is also good for finding things like FAQs (frequently asked questions) and organizations that are not regularly listed in Yahoo. For instance, if you wanted to find the Aryan Nation Web site, you might try:
anchor: aryan nation
The link operator is always useful. As reporters, we're always looking for both sides of a story. If you're doing a story on an organization, you want to find people who agree with it and people who disagree. In the example above, people who bother to add a link to the Aryan Nation on their Web site either agree strongly with them or disagree strongly.
There are a few more operators Altavista uses but I haven't found them very useful. Google's advanced search does some of these same operations in a pull-down menu format. Unfortunately, Google doesn't allow the useful anchor operator.
If you want to read more about these tools, read Altavista's excellent Advanced Cheat Sheet for more details.
Now, as promised, here are more HTML guides than you'll ever have time for:
- Yahoo list of beginners HTML guides
- U. of Miami HTML Guide for Beginners (pretty good)
- Webcom beginners guide
- Index of HTML tags (kind of an HTML dictionary)
All material Copyright © 2000 Drew Sullivan unless otherwise indicated.
All Rights Reserved