Thursday, January 21, 2010

SEO-Optimising URLs

Susan Moskwa, a Google Webmaster Trends Analyst, wrote a good post on the Google Blog about optimising URLs for search engines.

All of us with websites want the search engines to find and index our pages efficiently and quickly. Some SEO specialists feel that you don’t need to worry because Google and the other search engines will find everything in time, but for me that is far too laid back! It would seem that Google, too, would prefer we were a little more efficient so that they can be more efficient. Susan made the point that Google has a finite amount of resources, so efficient use of those resources is important for all of us on the web.

She wrote,

“URLs are like the bridges between your website and a search engine’s crawler: crawlers need to be able to find and cross those bridges (i.e., find and crawl your URLs) in order to get to your site’s content. If your URLs are complicated or redundant, crawlers are going to spend time tracing and retracing their steps; if your URLs are organized and lead directly to distinct content, crawlers can spend their time accessing your content rather than crawling through empty pages, or crawling the same content over and over via different URLs.”

Ideally each URL should lead to one unique piece of content, and each piece of content should be accessible from only one URL.

Her suggestions are:

1. Remove session IDs and sort-by parameters from URLs.
URL parameters that don’t change the content of the page should be taken out of the URL and stored in a cookie instead, and the old URL should be 301 redirected to a search-engine-friendly URL, e.g. /folder/cookie-recipe.htm. This reduces the number of URLs pointing to the same content.
She says that:

If your CMS or current site setup makes this difficult, you can use the rel=canonical element to indicate the preferred URL for a particular piece of content.
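
To make the first suggestion a little more concrete, here is a rough Python sketch of how a site might work out the clean URL to 301 redirect to, and the rel=canonical element to fall back on. It is only an illustration: the parameter names in NON_CONTENT_PARAMS, the example.com address and both function names are my own assumptions, not anything from Susan's post.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names; substitute whatever your own site uses
# for session tracking and sorting.
NON_CONTENT_PARAMS = {"sessionid", "sid", "sortby"}


def canonical_url(url):
    """Return url without the parameters that do not change the page content."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in NON_CONTENT_PARAMS
    ]
    # Rebuild the URL with the remaining parameters and no fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_link_element(url):
    """Build a rel=canonical element pointing at the cleaned-up URL."""
    return '<link rel="canonical" href="%s">' % escape(canonical_url(url), quote=True)


messy = "http://www.example.com/folder/cookie-recipe.htm?sessionid=abc123&sortby=price"
print(canonical_url(messy))
print(canonical_link_element(messy))
```

If canonical_url() returns something different from the URL that was requested, that is the cue to send a 301 redirect to the clean version. Where a redirect is not practical, the link element can go in the page head instead.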

2. Infinite spaces, e.g. calendars.
Calendars, for example, can link to an infinite number of past or future dates, each with its own URL. Search engine spiders then waste time trying to crawl all these pages.
To cut down on wasted crawling, use your robots.txt file to disallow crawling of login pages, contact forms, shopping carts, and other pages which do not add to the value of your website in search engine results.
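
As a rough illustration of the robots.txt suggestion, the sketch below uses Python's standard urllib.robotparser module to check which URLs a small set of rules would block. The paths /login, /contact and /cart and the example.com address are made up for the example; your own site will have its own.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules; the paths are placeholders for your own
# login, contact form and shopping cart URLs.
ROBOTS_TXT = """\
User-agent: *
Disallow: /login
Disallow: /contact
Disallow: /cart
"""


def build_parser(robots_txt):
    """Parse robots.txt text into a RobotFileParser without fetching anything."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
for path in ("/login", "/cart/view", "/folder/cookie-recipe.htm"):
    allowed = parser.can_fetch("Googlebot", "http://www.example.com" + path)
    print(path, "crawlable" if allowed else "blocked")
```

Disallow rules match on the start of the path, so /cart also covers /cart/view. Running a quick check like this before uploading a robots.txt file is a cheap way to make sure you have not blocked your real content pages by mistake.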

The closer you get to one URL for each piece of content, the easier your site will be to crawl and index.

PDF of The Google Presentation on Optimising