The internet has truly revolutionized the communications industry. It
is mind-boggling how much is available at the touch of a key. It is fun
to use a search engine and see the list of websites it returns,
websites that originate both in the U.S. and around the world.
Have you ever wondered how search engines find and rank websites? The
first step is handled by programs called web crawlers. The term is
descriptive but only partially accurate. While “web crawler” suggests a
little critter scurrying around your computer, a crawler is really a
program that sends requests to web servers for specific pages. The
crawler then passes the page data to the search engine’s indexer.
In other words, web crawlers are programs that methodically browse the
internet looking for content and save copies of the pages they visit.
The search engine then indexes these pages into a huge database so the
information can be found quickly. When you search, a query processor
compares your search term against that database and returns a list of
the websites it judges most likely to match.
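To make the indexing step concrete, here is a minimal Python sketch of an inverted index, a data structure commonly used for this job: it maps each word to the set of pages that contain it, so a query only has to look up its own words. The page URLs and the scoring rule (count how many distinct query words a page contains) are illustrative assumptions; real search engines weigh many more signals.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase the text and split it on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each word to the set of page URLs containing it (an inverted index)."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index: dict[str, set[str]], query: str) -> list[str]:
    """Rank pages by how many distinct query words they contain (a toy scoring rule)."""
    scores: dict[str, int] = defaultdict(int)
    for word in set(tokenize(query)):
        for url in index.get(word, ()):
            scores[url] += 1
    # Highest score first; ties are broken alphabetically so output is deterministic.
    return sorted(scores, key=lambda url: (-scores[url], url))


# Hypothetical pages standing in for what a crawler might have copied.
pages = {
    "https://example.com/a": "Web crawlers browse the web",
    "https://example.com/b": "Search engines index web pages",
    "https://example.com/c": "Cooking pasta at home",
}
index = build_index(pages)
print(search(index, "web search"))  # ['https://example.com/b', 'https://example.com/a']
```

Page b matches both query words, so it ranks above page a, which matches only "web"; page c matches neither and is not returned.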
Web crawlers are sometimes used for other automated tasks as well, such
as gathering email addresses or other information, checking links, or
validating HTML code.
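Link checking starts with the same step a crawler performs on every page: pulling out the links. The sketch below uses Python's standard `html.parser` and `urllib.parse` modules; the function names are hypothetical, and it only separates ordinary web links from other kinds (such as `mailto:`) rather than actually requesting each URL to see whether it is broken.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag, resolved against the page's own URL."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Turn relative links like "/about" into absolute URLs.
                    self.links.append(urljoin(self.base_url, value))


def extract_links(base_url: str, html: str) -> list[str]:
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def non_web_links(links: list[str]) -> list[str]:
    """Return links a web crawler would not follow: not http(s), or missing a host."""
    return [
        link
        for link in links
        if urlparse(link).scheme not in ("http", "https") or not urlparse(link).netloc
    ]


html = (
    '<a href="/about">About</a> '
    '<a href="https://example.org/x">X</a> '
    '<a href="mailto:info@example.com">Mail</a>'
)
links = extract_links("https://example.com/index.html", html)
print(links)
print(non_web_links(links))  # ['mailto:info@example.com']
```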
A web crawler generally starts by visiting the URLs in a list. It
identifies all the links on each page it visits and adds them to its
list. That list quickly grows so large that no single crawler can
visit every URL that exists, so each search engine has developed a
method for deciding which URLs its crawlers visit, and in what order,
so as to cover as many as possible. Among the factors considered are
how many links a page contains and how popular the site is among web
users. A separate policy determines how often a crawler revisits a
website to pick up changes.
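The visit-and-expand loop described above is often built around a priority queue of URLs waiting to be visited, sometimes called the crawl frontier. Here is a minimal sketch that runs over a small, made-up in-memory "web" instead of making real HTTP requests; the `TOY_WEB` link graph and `POPULARITY` scores are hypothetical stand-ins for whatever prioritization signals a real search engine uses.

```python
import heapq

# A hypothetical in-memory "web": each URL maps to the links found on that page.
TOY_WEB = {
    "a.example/": ["a.example/news", "b.example/"],
    "a.example/news": ["a.example/", "c.example/"],
    "b.example/": ["c.example/", "a.example/news"],
    "c.example/": [],
}

# Hypothetical popularity scores; real engines combine far richer signals.
POPULARITY = {"a.example/": 5, "a.example/news": 2, "b.example/": 4, "c.example/": 1}


def crawl(seeds: list[str], max_pages: int) -> list[str]:
    """Visit pages most-popular-first, never queuing the same URL twice."""
    frontier: list[tuple[int, str]] = []  # min-heap of (-popularity, url)
    seen: set[str] = set()
    for url in seeds:
        heapq.heappush(frontier, (-POPULARITY.get(url, 0), url))
        seen.add(url)

    visited: list[str] = []
    while frontier and len(visited) < max_pages:
        _, url = heapq.heappop(frontier)
        # A real crawler would fetch the page here and hand it to the indexer.
        visited.append(url)
        for link in TOY_WEB.get(url, []):
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-POPULARITY.get(link, 0), link))
    return visited


print(crawl(["a.example/"], max_pages=10))
# ['a.example/', 'b.example/', 'a.example/news', 'c.example/']
```

Starting from `a.example/`, the crawler visits the more popular `b.example/` before `a.example/news`, even though both were discovered on the same page. The `max_pages` limit stands in for the fact that no crawler can visit everything; a revisit policy would add already-seen URLs back to the frontier on some schedule.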
Of course, a large search engine runs many crawlers at the same time.
With all these crawlers exploring and revisiting websites, search
engines had to develop ways to avoid overloading any one site. The
rules they follow are called a politeness policy, and they are intended
to keep websites up and running despite the extra traffic. Some web
crawlers are also programmed to gather several types of data in a
single visit.
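A politeness policy can be as simple as enforcing a minimum delay between requests to the same host. This sketch assumes a fixed 2-second delay (an arbitrary, illustrative value; real crawlers tune this per site) and takes the current time as a parameter so it can be tested deterministically. A live crawler would pass something like `time.monotonic()`.

```python
from urllib.parse import urlparse


class PolitenessScheduler:
    """Enforce a minimum delay between requests to the same host."""

    def __init__(self, min_delay: float = 2.0) -> None:
        self.min_delay = min_delay  # hypothetical default, in seconds
        self.last_request: dict[str, float] = {}  # host -> time of last request

    def wait_time(self, url: str, now: float) -> float:
        """Seconds the crawler should still wait before fetching this URL."""
        host = urlparse(url).netloc
        last = self.last_request.get(host)
        if last is None:
            return 0.0  # never contacted this host, so no need to wait
        return max(0.0, last + self.min_delay - now)

    def record(self, url: str, now: float) -> None:
        """Remember when we last sent a request to this URL's host."""
        self.last_request[urlparse(url).netloc] = now


scheduler = PolitenessScheduler(min_delay=2.0)
scheduler.record("https://example.com/a", now=10.0)
print(scheduler.wait_time("https://example.com/b", now=10.5))   # 1.5 (same host)
print(scheduler.wait_time("https://other.example/", now=10.5))  # 0.0 (different host)
```

Tracking delays per host rather than globally lets the crawler keep working on other sites while it waits to revisit a busy one.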
If you do not want a website crawled (for example, because it contains
personal or private information), you can ask well-behaved crawlers to
stay away with a robots.txt file, or block them outright with a
firewall or password protection.
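Well-behaved crawlers check a site's robots.txt file before fetching pages, so that file is the usual way to ask crawlers to stay out of parts of a site. Python's standard `urllib.robotparser` module can evaluate these rules; the robots.txt contents below are a made-up example. Keep in mind that robots.txt is a request, not a lock: anything truly private still needs a firewall, password, or similar access control.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt: everyone is kept out of /private/,
# and Googlebot has its own group that only blocks /drafts/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: Googlebot
Disallow: /drafts/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.can_fetch("Googlebot", "https://example.com/drafts/post.html"))    # False
print(parser.can_fetch("Googlebot", "https://example.com/private/notes.html"))  # True
print(parser.can_fetch("Slurp", "https://example.com/private/notes.html"))      # False
print(parser.can_fetch("Slurp", "https://example.com/drafts/post.html"))        # True
```

Googlebot matches its own group, so only `/drafts/` is off limits to it, while Slurp has no group of its own and falls back to the `*` rules.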
While the actual workings of web crawlers are often highly technical
and confusing to casual web users, it can be fun to learn what the
various search engines call their crawlers. For instance, Yahoo calls
its web crawler Slurp, AltaVista calls its crawler Scooter, and
Google’s is called Googlebot.
If you have a website, it is worthwhile to research each search engine
to learn how to improve your site’s ranking there. By reading and
following each engine’s tips and guidelines, you can improve the
chances of a web crawler visiting your site and improve how well your
site meets each search engine’s ranking criteria.
While search engines won’t divulge every parameter they use (Google
says it uses more than 200 of them), each engine publishes pages, blogs
and other materials to help you improve your rankings. After all,
search engines are businesses too: by helping you improve your website,
they also help their own business.
There are also professional consultants and companies that help
improve websites’ rankings. While they consider a variety of factors,
they will also compare your site against each web crawler’s operating
parameters. Sometimes fixing a simple but technical problem can improve
how a web crawler responds to your site.
Isn’t it incredible that these programs can help compile the
information you need in a matter of seconds? Slurp, Scooter and
Googlebot are electronic friends that help each of us do our work every
day. And you don’t even have to feed them.