Do Search Engines Like Your Web Site?


Between 75% and 98.8% of visitors to Web sites come from searches made at search engines. If you're going to get high levels of traffic - and hence the levels of ROI you're looking for - it's very important that the search engines can access all the information on your Web site.

Do the search engines know about all of your pages?

You can find out which pages on your site the search engines know about by using a special search. If you search for 'site:' followed by your Web site address, the search engine will list all of the pages on your Web site that it knows about.

For example, search for site:webpositioningcentre.co.uk in Google, Yahoo or MSN Search, and each one will tell you how many pages it knows about.
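If you want to run the same check for several sites, the search address can be built in a few lines of Python. This is only a sketch: the function name is made up for illustration, and the Google search address with its q parameter is an assumption that the search engine could change or restrict at any time.

```python
from urllib.parse import urlencode


def site_query_url(domain, endpoint="https://www.google.com/search"):
    """Return a search URL whose query is 'site:<domain>'.

    The default endpoint is an assumption, not a guaranteed interface.
    """
    return endpoint + "?" + urlencode({"q": f"site:{domain}"})


print(site_query_url("webpositioningcentre.co.uk"))
```

Opening the printed address in a browser runs the 'site:' search described above.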

If the search engines haven't found some of the pages on your Web site, it is probably because they are having trouble spidering them. ('Spidering' is when the search engine uses an automated robot to read your Web pages.)

Spiders work by starting off on a page which has been linked to by another Web site, or that has been submitted to the search engine. They then read and follow any links they find on the page, gradually working their way through your whole Web site.
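The link-following process described above can be sketched in Python. This is a simplified illustration, not how any particular search engine works: the class and function names are hypothetical, and the three-page site is held in memory under an invented example.test address rather than fetched over the Web.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag found in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def crawl(start_url, fetch):
    """Walk outwards from start_url; fetch(url) returns HTML or None."""
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # page could not be read, so nothing to follow
        visited.append(url)
        parser = LinkCollector()
        parser.feed(html)
        for href in parser.links:
            target = urljoin(url, href)  # resolve relative links
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return visited


# Hypothetical site: orphan.html exists but nothing links to it.
SITE = {
    "http://example.test/": '<a href="/about.html">About</a>',
    "http://example.test/about.html": '<a href="/contact.html">Contact</a>',
    "http://example.test/contact.html": '<a href="/">Home</a>',
    "http://example.test/orphan.html": "Nothing links here",
}

print(crawl("http://example.test/", SITE.get))
```

The spider reaches the home, about and contact pages, but never finds orphan.html, because no link leads to it.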

At least, that's the theory.

The problem is, it's easy to confuse the spiders - especially as they are designed to be wary of following certain kinds of link.

Links which confuse spiders

If your links are within a large chunk of JavaScript code, the spider may not be able to find them, and will not be able to follow the links to your other pages.

This can happen if you have 'rollovers' as your navigation - for instance, pictures that change colour or appearance when you hover your mouse pointer over them. The JavaScript code that makes this happen can be convoluted enough for the spiders to ignore it rather than try to find links inside.

If you think your rollovers are blocking your site from being spidered, you will need to talk to your Web designers about changing the code into a 'clean link' - a standard HTML link, with no extra code around it - that is much easier for the spiders to follow.

Links like these will look something like this:

<a href="index.html">Home Page</a>
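To get a rough idea of how much of your navigation depends on JavaScript, you could sort the links on a page into clean ones and script-dependent ones. The sketch below is a heuristic under stated assumptions: the sample markup, the go() function it calls and the class name are all invented for illustration, and real spiders apply their own rules.

```python
from html.parser import HTMLParser


class NavigationAudit(HTMLParser):
    """Sorts <a> tags into clean links and script-dependent ones."""

    def __init__(self):
        super().__init__()
        self.clean = []
        self.script_only = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = (attributes.get("href") or "").strip()
        # Assumption: a missing href, a bare "#", or a javascript: address
        # means the link only works when the script runs.
        if not href or href == "#" or href.lower().startswith("javascript:"):
            self.script_only.append(attributes.get("onclick") or href)
        else:
            self.clean.append(href)


def audit(html):
    """Return (clean links, script-dependent links) for a page."""
    parser = NavigationAudit()
    parser.feed(html)
    return parser.clean, parser.script_only


# Hypothetical navigation markup.
SAMPLE = """
<a href="index.html">Home Page</a>
<a href="javascript:go('products')">Products</a>
<a href="#" onclick="go('contact')">Contact</a>
"""

clean, script_only = audit(SAMPLE)
print("clean:", clean)
print("script-dependent:", script_only)
```

Only the first link in the sample gives a spider an address it can follow; the other two rely on the script being run.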

Page addresses to avoid

Spiders will also ignore pages if they don't like the URL (the address needed to find the page).

For example, a Web site that has URLs containing several variables can cause spiders to ignore the page content. You can spot pages like these because their addresses contain a ? and one or more & characters, for instance:

http://webpositioningcentre.co.uk/index.php?page=12&cat=23&jib=c

This URL has three variables: the parts containing an =, which come after the ? and are separated by &s. We find that if a URL has one variable, or even two, the top search engines will spider the page without any problems. But if a URL has more than that, the search engines often will not spider it.
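Counting the variables in an address is easy to automate. The following sketch uses Python's standard URL tools; the function names are hypothetical, and the limit of two is simply the rule of thumb given above, not a figure published by any search engine.

```python
from urllib.parse import parse_qsl, urlsplit

MAX_VARIABLES = 2  # assumed limit, taken from the rule of thumb above


def query_variables(url):
    """Return the variable names found after the ? in a URL."""
    query = urlsplit(url).query
    return [name for name, _ in parse_qsl(query, keep_blank_values=True)]


def likely_spiderable(url, limit=MAX_VARIABLES):
    """True if the URL has no more variables than the assumed limit."""
    return len(query_variables(url)) <= limit


url = "http://webpositioningcentre.co.uk/index.php?page=12&cat=23&jib=c"
print(query_variables(url))    # ['page', 'cat', 'jib']
print(likely_spiderable(url))  # False
```

Running a check like this over a list of your own page addresses would show which ones are most at risk of being skipped.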

Spiders particularly avoid URLs that look like they have 'session IDs' in them. They look something like this:

http://webpositioningcentre.co.uk/index.php?page=12&id=29c8d7r2398jk27897a8

The set of numbers and letters does not make much sense to humans, but some Web sites use it to keep track of who you are as you click through their pages.

Spiders will generally avoid URLs with session IDs in them, so if your Web site uses them, you need to talk to the people who developed the site about rewriting it so that it does not use these IDs, or at least so that you can get around the Web site without them.
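A quick way to look for session IDs in your own addresses is to flag any variable whose name or value looks like a tracking token. This is a rough sketch: the list of parameter names and the 'long mix of letters and digits' pattern are assumptions chosen for illustration, and they will miss some session IDs and wrongly flag some harmless values.

```python
import re
from urllib.parse import parse_qsl, urlsplit

# Assumed parameter names often used for session tracking.
SESSION_NAMES = {"sid", "sessionid", "session_id", "phpsessid", "jsessionid"}

# Assumed shape of an opaque token: 16 or more letters and digits,
# containing at least one letter and at least one digit.
TOKEN = re.compile(r"^(?=.*[a-z])(?=.*\d)[a-z0-9]{16,}$", re.IGNORECASE)


def session_like_params(url):
    """Return the names of variables that look like session IDs."""
    flagged = []
    for name, value in parse_qsl(urlsplit(url).query):
        if name.lower() in SESSION_NAMES or TOKEN.match(value):
            flagged.append(name)
    return flagged


url = "http://webpositioningcentre.co.uk/index.php?page=12&id=29c8d7r2398jk27897a8"
print(session_like_params(url))  # ['id']
```

Here the page variable passes, while the id variable is flagged because its value is a long, meaningless mix of letters and digits.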

Clean links = happy spiders

If you use clean, easy-to-follow links without several variables in them, your Web site should be spidered without problems. There are, of course, many other facets to successful Search Engine Optimization, but if the search engines can't spider your content, your site will fall at the first hurdle.

Paul Silver and David Rosam are Head of Technical SEO and Head of SEO Copywriting at Web Positioning Centre (http://webpositioningcentre.co.uk). Paul has been involved with the Web commercially since 1996 and David has been writing marketing copy for 20 years, and writing for the Web for a decade.
