Web Matrix: Overview Matrix (Text)
Your Internet Service Shopping List
This is the real meat of the Matrix and where it earns its name. The charts
below bring the servers, each described on its own page, together in a
single checklist of features and attributes. In addition, various sections
of the charts are linked to relevant descriptive or background material
located in this document or elsewhere in the collection.
For users with different browsers, there are both graphic and text-only
views of this information.
Matrix Keys
In creating these checklists, I wanted to do more than indicate the
presence or lack of a feature with a simple check/no-check system. On the
other hand, I also wanted to avoid a rating scale from 1-10 or 1-100,
simply because I didn't want to judge whether one server's feature was 5%
better or worse than another's.
To this end I chose the following four-element scale, which sorts the
evaluations logically (and graphically):
- '*' Bullet or Asterisk
- An excellent and rather complete implementation of the function. For
example, a system with boolean searching that supports complex queries.
- '+' Plus Sign
- Acceptable functionality, but lacking a robust implementation. Such a
system would let you select whether to join multiple keywords with
Boolean AND or Boolean OR, but nothing more.
- '-' Minus Sign
- A poor level of support for the function, typically because it is
implemented in terms of other functionality. An example of this would
be a system that simply decides whether multiple keywords should be
connected by Boolean AND or Boolean OR, regardless of user convenience.
- ' ' Blank
- The function is simply not supported on this server.
A plus on one feature and a minus on another does not represent an absolute
judgment of quality; rather, I mean the ratings to indicate a rough
comparison of the same feature across servers. If you still feel that a
mistake or injustice has been made, please feel free to mail me.
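Because the four symbols form an ordered scale rather than a numeric score,
they can be used to rank servers on one feature but not to add up or
average ratings. A minimal Python sketch of that reading follows; the
`Rating` enum, the helper names, and the sample ratings for servers "A"
through "D" are hypothetical and are not taken from the charts.

```python
from enum import IntEnum


class Rating(IntEnum):
    """Ordinal levels of the four-element scale (higher is better)."""
    BLANK = 0   # ' ' not supported
    MINUS = 1   # '-' poor support
    PLUS = 2    # '+' acceptable support
    STAR = 3    # '*' excellent, fairly complete support


SYMBOLS = {" ": Rating.BLANK, "-": Rating.MINUS, "+": Rating.PLUS, "*": Rating.STAR}


def parse_rating(symbol: str) -> Rating:
    """Map a chart symbol to its ordinal rating; an empty cell counts as blank."""
    return SYMBOLS[symbol if symbol else " "]


def rank_servers(feature_ratings: dict[str, str]) -> list[str]:
    """Order servers by their rating for ONE feature, best first.

    Only the order is meaningful: '*' beats '+', but the scale says
    nothing about how much better it is.
    """
    return sorted(feature_ratings,
                  key=lambda name: parse_rating(feature_ratings[name]),
                  reverse=True)


# Hypothetical ratings for a single feature (e.g. Boolean searching).
boolean_search = {"Server A": "+", "Server B": "*", "Server C": "", "Server D": "-"}
print(rank_servers(boolean_search))  # ['Server B', 'Server A', 'Server D', 'Server C']
```

Using an ordinal type rather than plain numbers keeps the comparison within
a single feature, which matches the intent of the scale.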
Overview Matrix
-------------+-----------------------------------------------------------------
Catalog |Eval # of Web, FTP, Other Clarity Image Text HTML Non- US
or Index | Docs etc. etc. DB's Speed Dload Only Forms Forms
===============================================================================
AliWeb | 6.0k * - - + * * * +
-------------+----------------------------------------------------------------
CMU Lycos | 3.4m * * * + + * * * *
-------------+-----------------------------------------------------------------
W3 Catalog | 12.5k * - - + * * * * -
-------------+-----------------------------------------------------------------
EINet Galaxy | 100k * + * + + * * * *
-------------+-----------------------------------------------------------------
GNA MetaIndex| 3.0k * + + + * + *
-------------+-----------------------------------------------------------------
GNN Whole Int| 600. * + + * *
-------------+-----------------------------------------------------------------
InfoSeek | 400k * - * + + - * * *
-------------+-----------------------------------------------------------------
IPL | 300. * + + * + * *
-------------+-----------------------------------------------------------------
JumpStation | 275k * + - + * * + + -
-------------+-----------------------------------------------------------------
Subject Clear| 5.0k * * + + * * + *
-------------+-----------------------------------------------------------------
WebCrawler | 100k * + + + - * * * *
-------------+-----------------------------------------------------------------
WWW Worm | ? * - + + * * * *
-------------+-----------------------------------------------------------------
YAHOO | 55k * * * + * * * *
-------------+-----------------------------------------------------------------
Overview Criteria
- Catalog or Index Name:
The abbreviated name of the Web Index.
- Evaluation of the Index:
An overall rating that reflects the author's evaluation of the indicated
index, on a scale of 1 to 5 stars.
- Number of Documents:
The number of Web, Gopher, FTP, and other documents referenced by this
collection. Keep in mind that some subject catalogs keep a short list of
hand-picked rich resources, so a small count does not necessarily mean a
poor catalog; a search engine, by contrast, becomes more effective as its
database grows.
- Contains Web and Gopher Links:
The index contains resources found on the Web, Gopher, or both.
- FTP, UseNet, ListServ, IRC:
The index contains pointers to FTP or other well-known types of
Internet resources.
- Other Databases:
The index contains information gathered from sources that are not
located on the Internet, such as MedLine, newspaper newswires,
or other commercial databases.
- Clarity of the Interface:
The layout of the search interface and other pages is easy to learn and
use. Well-designed services offer navigation aids across the collection;
poor ones are disorganized or littered with obscure link icons.
- Speed of the Interface:
Relative response time of the server when following its links and
downloading its images.
Typically reflects how many users connect to the index, the quality of
the search software, the speed of the server hardware, and the server's
support for load balancing.
- Image Download Time:
For each service with a logo or imagemap interface, how much do the images
add to the download time? A small logo is acceptable, but large images,
numerous icons, and textured backgrounds slow downloads and consume
bandwidth.
- Text-only Support:
Support for text-only browsers and for users who disable images: the
service provides functional navigation aids and text alternatives to the
information carried in images or icons.
- HTML Forms Support:
Reflects how efficiently the server uses HTML Forms in the search interface
and in feedback links.
- Non-Forms Support:
Certain browsers do not support HTML Forms and must rely on simpler search
interfaces. If the server supports non-forms searches, how robust and useful
is that search engine?
- Located in the United States:
The Web server or a mirror site is located in the United States (Canada
does not count), which generally means faster and more reliable network
access for users in the United States.
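To put the Image Download Time criterion in numbers, the idealized transfer
time is the image size in bits divided by the line speed. The sketch below
assumes a 14.4 kbit/s dial-up modem and hypothetical page sizes; it ignores
latency, protocol overhead, and modem compression, so real downloads would
take somewhat longer.

```python
def download_seconds(size_bytes: int, line_bits_per_second: float) -> float:
    """Idealized transfer time: payload bits divided by line speed.

    Ignores latency, protocol overhead, modem compression, and server load.
    """
    return size_bytes * 8 / line_bits_per_second


MODEM_BPS = 14_400  # assumed 14.4 kbit/s dial-up modem

# Hypothetical pages: a small logo versus a large imagemap plus a textured background.
small_logo = 3 * 1024            # 3 KB
heavy_page = (40 + 20) * 1024    # 40 KB imagemap + 20 KB background

print(f"small logo: {download_seconds(small_logo, MODEM_BPS):.1f} s")  # small logo: 1.7 s
print(f"heavy page: {download_seconds(heavy_page, MODEM_BPS):.1f} s")  # heavy page: 34.1 s
```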
Features Matrix
----------------+---------------------------+----------------------------------
Catalog | Subject # of Depth of | Search. Mult. Boolean Proximity
or Index | Catalog Categ's Categ's | Index Keys Exp'ns Search
===============================================================================
1-AliWeb | | * *
----------------+---------------------------+----------------------------------
2-CMU Lycos | | * * - -
----------------+---------------------------+----------------------------------
3-W3 Catalog | | *
----------------+---------------------------+----------------------------------
4-EINet Galaxy | * * * | + * *
----------------+---------------------------+----------------------------------
5-GNA MetaIndex| | + + -
----------------+---------------------------+----------------------------------
6-GNN Whole Int| * * * |
----------------+---------------------------+----------------------------------
7-InfoSeek | | * * - *
----------------+---------------------------+----------------------------------
8-IPL | * + + | - + -
----------------+---------------------------+----------------------------------
9-JumpStation | | * + -
----------------+---------------------------+----------------------------------
10-Subject Clear| * * * | - + -
----------------+---------------------------+----------------------------------
11-WebCrawler | | * + -
----------------+---------------------------+----------------------------------
12-YAHOO | * * * | * + +
----------------+---------------------------+----------------------------------
(Continued)
--+----------------------------------------------------------------------------
| Key Reg. Substr Sorts Limits Rich Hit Custom Search in in
|Phrase Exp'ns Results Results Desc'ns S'ware in URL Summary text
===============================================================================
1| * + * * + + * *
--+----------------------------------------------------------------------------
2| - * * * * * * *
--+----------------------------------------------------------------------------
3| * * + * * -
--+----------------------------------------------------------------------------
4| * * * + * * *
--+----------------------------------------------------------------------------
5| - - * *
--+----------------------------------------------------------------------------
6|
--+----------------------------------------------------------------------------
7| * * * + * * * *
--+----------------------------------------------------------------------------
8| + - * *
--+----------------------------------------------------------------------------
9| + * *
--+----------------------------------------------------------------------------
10| - - *
--+----------------------------------------------------------------------------
11| + + * * * * *
--+----------------------------------------------------------------------------
12| * * * * * *
--+----------------------------------------------------------------------------
Features List
- Catalog or Index Name:
The abbreviated name of the Web Index.
- Subject Catalog:
The information in the index is organized by subject area, typically
in a hierarchic tree of information.
- Number of Categories:
The number of broad categories at the top-level of the catalog.
- Depth of Categories:
Average number of levels below each top-level category.
- Searchable Index:
The information in the index is stored in a database that users query by
entering relevant search criteria, called keywords; the matching entries
are then displayed as a list of links to the desired documents.
- Multiple Keywords:
Users can expand or restrict database searches by entering more than one
keyword. Additional options are often needed to control how the keywords
are combined in the query.
- Boolean Searching:
For servers that allow Boolean Searching, this field reflects the
sophistication of the feature. Many servers automatically join keywords
with Boolean AND or Boolean OR, but only EINet Galaxy supports complex
criteria.
- Proximity Searching:
When matching a document against multiple keywords, a smart search engine
gives high relevance to a source where the keywords occur close to each
other or near the top of the file. As a rule of thumb, this means that
documents with incidental keyword matches are rated lower than those with
highly relevant content.
- Keyword Phrase Searching:
When keywords are closely related, it may be desirable to treat them as a
single phrase so that the search engine finds them together. For example,
using "Bill Gates" as a query will generate better matches than the Boolean
"Bill AND Gates". Phrase searching provides much more specific functionality
than either Boolean searching or Proximity Searching.
- Regular Expressions:
A sophisticated method for specifying keyword patterns, using wildcard
characters and other matching functions; it is generally available on
search engines that are based on Perl or grep software.
- Substring Searches:
This feature represents the ability to enter a complete or partial word
and generate matches containing it. Exceptional servers will examine
the keywords a user has entered, and identify the appropriate root word
to use as a substring search; I commonly refer to this functionality as
Root or Suffix Management.
- Sorts Search Results:
Many search engines list the result sets in order of calculated relevance,
typically placing the best matches toward the top of the results document
and degenerating into poorer matches toward the bottom. This feature makes
it easy for users to identify and print only the best 5 or 10 matches in a
set.
- Limits Search Results:
Some servers allow the user to specify a maximum number of documents
to return, thus providing better response time and a focused result
set. Certain servers enforce a maximum number of search results, to
lessen the server load or to encourage user subscription.
- Richness of Match Descriptions:
This value reflects how much background information is provided for the
documents in the result set, such as match quality, file location, file
size, file timestamp, or extracted passages. The more information a service
provides, the easier it is to identify useful documents in the result set.
- Custom Search Software:
The software that performs a search is critical to the speed and
functionality of the service. Servers written using Perl, awk, or
other simple scripting tools are much slower than custom software
solutions written in C or using special database software.
- Searches Filenames and URLs:
Servers that can search the filenames and locations are useful for
locating documents in a particular location or by a particular
author. Unfortunately, such searches often interfere with keyword
searches because machine or user names may incidentally match
search criteria (such as www-genome.wi.mit.edu or
forestry.umn.edu).
- Searches Summaries and Keywords:
This is perhaps the most reliable type of search, because the search engine
examines very specific words or phrases rather than incidental text and
file locations. Summaries are a good way to help search software find
relevant documents, but they require administrative work and the
establishment of consistent descriptors and vocabulary control standards.
- Searches Document Fulltext:
The most flexible type of search, this method applies a brute-force search
to the complete content of the documents for possible matches. Although
time-consuming and prone to error, fulltext searches can be simplified or
focused with tools such as root management or proximity searching.
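The Subject Catalog, Number of Categories, and Depth of Categories criteria
can be illustrated with a small tree. This Python sketch uses a
hypothetical nested-dictionary catalog and assumes that a category's depth
is the length of its deepest branch; the text above does not say exactly
how depth was counted.

```python
# Hypothetical subject catalog: each category maps to its subcategories.
CATALOG = {
    "Science": {"Biology": {"Genetics": {}}, "Physics": {}},
    "Arts": {"Music": {}},
    "Government": {},
}


def levels_below(subtree: dict) -> int:
    """Levels beneath a category, following its deepest branch (0 if none)."""
    if not subtree:
        return 0
    return 1 + max(levels_below(child) for child in subtree.values())


def catalog_stats(tree: dict) -> tuple[int, float]:
    """Return (number of top-level categories, average depth below them)."""
    depths = [levels_below(subtree) for subtree in tree.values()]
    if not depths:
        return 0, 0.0
    return len(depths), sum(depths) / len(depths)


print(catalog_stats(CATALOG))  # (3, 1.0)
```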
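A searchable index is commonly built as an inverted index that maps each
keyword to the documents containing it, so multiple keywords can be
combined with set operations. The Python sketch below is one possible
approach, not any listed server's software; the tokenizer, the sample
documents, and the nested-tuple query format are assumptions. A server
that only lets the user choose AND or OR corresponds to a single operator,
while complex Boolean criteria correspond to nested queries.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric words; a deliberately simple tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Inverted index: keyword -> ids of the documents that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def evaluate(query, index: dict[str, set[str]]) -> set[str]:
    """Evaluate a keyword or a nested query like ("and", "web", ("or", "gopher", "ftp"))."""
    if isinstance(query, str):
        return set(index.get(query.lower(), set()))
    op, *terms = query
    results = [evaluate(term, index) for term in terms]
    if op == "and":
        return set.intersection(*results)
    if op == "or":
        return set.union(*results)
    raise ValueError(f"unknown operator: {op!r}")


# Hypothetical documents.
DOCS = {
    "d1": "Web and Gopher directory",
    "d2": "FTP archive of Web tools",
    "d3": "Gopher menus only",
}
INDEX = build_index(DOCS)
print(sorted(evaluate(("and", "web", "gopher"), INDEX)))                 # ['d1']
print(sorted(evaluate(("or", "web", "gopher"), INDEX)))                  # ['d1', 'd2', 'd3']
print(sorted(evaluate(("and", "web", ("or", "gopher", "ftp")), INDEX)))  # ['d1', 'd2']
```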
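Proximity ranking can be sketched by measuring how close together the
keywords occur and how early the first one appears. In the Python sketch
below, the scoring formula (a sum of two reciprocals) and the sample texts
are arbitrary assumptions chosen for illustration; real engines weight
these factors in their own ways.

```python
import re
from itertools import product


def proximity_score(text: str, keywords: list[str]) -> float:
    """Score a document higher when all keywords occur close together and early.

    Returns 0.0 if any keyword is missing. The weighting is illustrative only.
    """
    words = re.findall(r"[a-z0-9]+", text.lower())
    positions = [[i for i, w in enumerate(words) if w == k.lower()] for k in keywords]
    if any(not p for p in positions):
        return 0.0
    # Smallest window holding one occurrence of every keyword
    # (brute force over all combinations; acceptable for short texts).
    best_span = min(max(combo) - min(combo) for combo in product(*positions))
    first_hit = min(p[0] for p in positions)
    return 1.0 / (1 + best_span) + 1.0 / (1 + first_hit)


KEYWORDS = ["search", "engine"]
# Hypothetical documents.
DOCS = {
    "close": "Search engine tips: a search engine indexes the Web.",
    "scattered": "Engine repair manual for tractors; see the appendix on search.",
    "missing": "Gopher menus only.",
}
ranked = sorted(DOCS, key=lambda d: proximity_score(DOCS[d], KEYWORDS), reverse=True)
print(ranked)  # ['close', 'scattered', 'missing']
```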
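The difference between phrase searching and Boolean AND can be shown with
the "Bill Gates" example from the Keyword Phrase Searching entry. This
Python sketch treats a phrase as consecutive words in order; the helper
names and the two sample sentences are hypothetical.

```python
import re


def words(text: str) -> list[str]:
    """Lowercase alphanumeric words in document order."""
    return re.findall(r"[a-z0-9]+", text.lower())


def boolean_and(text: str, keywords: list[str]) -> bool:
    """True if every keyword appears anywhere in the text."""
    present = set(words(text))
    return all(k.lower() in present for k in keywords)


def phrase_match(text: str, phrase: str) -> bool:
    """True if the phrase's words appear consecutively and in order."""
    w, p = words(text), words(phrase)
    if not p:
        return False
    return any(w[i:i + len(p)] == p for i in range(len(w) - len(p) + 1))


# Hypothetical documents.
DOCS = {
    "a": "Bill Gates spoke about Windows 95.",
    "b": "Pay the phone bill at the gates of the stadium.",
}
for doc_id, text in DOCS.items():
    print(doc_id, boolean_and(text, ["Bill", "Gates"]), phrase_match(text, "Bill Gates"))
# a True True
# b True False
```

Both documents satisfy "Bill AND Gates", but only the first contains the
phrase, which is why the phrase query gives better matches.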
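Python's standard `re` module offers the same style of pattern matching
found in Perl and grep. In this sketch the pattern and the sample titles
are hypothetical; the pattern matches "gene", "genes", "genome", and
"genomes" as whole words while skipping "generic".

```python
import re

# Hypothetical document titles.
TITLES = [
    "Human Genome Project",
    "Gene mapping tools",
    "Generic Web server FAQ",
    "Genes and genomes archive",
]

# \b marks a word boundary, (?:e|ome) is an alternation, s? an optional plural.
PATTERN = re.compile(r"\bgen(?:e|ome)s?\b", re.IGNORECASE)

matches = [title for title in TITLES if PATTERN.search(title)]
print(matches)  # ['Human Genome Project', 'Gene mapping tools', 'Genes and genomes archive']
```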
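Root or suffix management can be approximated by stripping a common ending
from the keyword and then searching for the remaining root as a substring.
The suffix list and sample titles below are hypothetical, and the stripping
rule is far cruder than a real stemming algorithm; it only shows the shape
of the idea.

```python
# Hypothetical suffix list, checked longest first.
SUFFIXES = ("ations", "ation", "ing", "ers", "er", "es", "s")


def root(word: str) -> str:
    """Strip the first matching suffix, keeping a root of at least 3 letters."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


def substring_search(keyword: str, titles: list[str]) -> list[str]:
    """Return titles containing the keyword's root anywhere, even inside longer words."""
    stem = root(keyword)
    return [t for t in titles if stem in t.lower()]


# Hypothetical titles.
TITLES = ["Indexes of Web servers", "Web index tutorial", "Gopher menus"]
print(root("indexing"))                      # index
print(substring_search("indexing", TITLES))  # ['Indexes of Web servers', 'Web index tutorial']
```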
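Sorting and limiting search results usually go together: rank the hits by
calculated relevance, then return only the first few. The relevance scores
here are hypothetical inputs; `heapq.nlargest` from Python's standard
library selects the top entries without sorting the whole result set.

```python
import heapq
from typing import Optional


def top_results(scores: dict[str, float], limit: Optional[int] = None) -> list[tuple[str, float]]:
    """Return (document, score) pairs by descending relevance, optionally capped at `limit`."""
    if limit is None:
        return sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return heapq.nlargest(limit, scores.items(), key=lambda item: item[1])


# Hypothetical relevance scores.
SCORES = {"doc1": 0.42, "doc2": 0.91, "doc3": 0.10, "doc4": 0.77}
print(top_results(SCORES, limit=2))  # [('doc2', 0.91), ('doc4', 0.77)]
```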
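The hostname problem described under Searches Filenames and URLs is easy
to reproduce. The pages below are hypothetical (only the two hostnames come
from the text above). Restricting the match to the URL path with
`urllib.parse.urlsplit` is shown as one possible mitigation, not as what
any listed server does.

```python
from urllib.parse import urlsplit

# Hypothetical pages: (URL, page text).
PAGES = [
    ("http://www-genome.wi.mit.edu/staff/directory.html", "Staff phone directory"),
    ("http://forestry.umn.edu/courses.html", "Course schedule for the fall term"),
    ("http://www.example.org/bio/genome-notes.html", "Notes on genome sequencing"),
]


def url_matches(keyword: str, pages: list[tuple[str, str]]) -> list[str]:
    """Match anywhere in the URL, machine name included."""
    return [url for url, _ in pages if keyword in url.lower()]


def path_matches(keyword: str, pages: list[tuple[str, str]]) -> list[str]:
    """Match only in the URL path, ignoring the machine name."""
    return [url for url, _ in pages if keyword in urlsplit(url).path.lower()]


print(url_matches("genome", PAGES))   # includes the staff directory, matched only via its hostname
print(path_matches("genome", PAGES))  # ['http://www.example.org/bio/genome-notes.html']
```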
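The trade-off between searching curated summaries and searching document
fulltext fits in a few lines. Each hypothetical document below carries a
hand-assigned descriptor set, standing in for an administrator's controlled
vocabulary, alongside its full text; the fulltext function is a plain
brute-force scan.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Document:
    doc_id: str
    text: str
    # Hypothetical hand-assigned descriptors drawn from a controlled vocabulary.
    descriptors: set[str] = field(default_factory=set)


def words(text: str) -> set[str]:
    """Set of lowercase alphanumeric words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search_summaries(keyword: str, docs: list[Document]) -> list[str]:
    """Match only against the curated descriptors."""
    return [d.doc_id for d in docs if keyword.lower() in d.descriptors]


def search_fulltext(keyword: str, docs: list[Document]) -> list[str]:
    """Brute force: scan every word of every document."""
    return [d.doc_id for d in docs if keyword.lower() in words(d.text)]


# Hypothetical documents.
DOCS = [
    Document("forestry", "Timber management and forest ecology courses.", {"forestry", "ecology"}),
    Document("tax-guide", "Tax forms and instructions; the ecology grant form is on page 3.", {"taxes"}),
]
print(search_summaries("ecology", DOCS))  # ['forestry']
print(search_fulltext("ecology", DOCS))   # ['forestry', 'tax-guide']
```

The fulltext scan also returns the tax guide, whose mention of ecology is
incidental; the summary search avoids it, but only because someone assigned
the descriptors by hand.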
This document was created at the University of Michigan School of
Information and Library Studies (SILS), but it has been designed for public
use. Permission is hereby granted for unlimited print and electronic
redistribution.
Your feedback is appreciated.
fprefect@umich.edu - 7/14/95