Technology

The big three announce helpful new microformat to combat duplicate content

Anthony Marshall
25 Feb 2009

Duplicate Content has recently become a hot topic within the <acronym title="Search Engine Optimisation">SEO</acronym> community, with the 3 <dfn title="A program that searches internet resources for specified keywords and returns a list where the keywords were found">search engine</dfn> "giants" (<dfn title="A family of Internet-based services from Microsoft, which includes a search engine, e-mail (Hotmail), instant messaging (Windows Live Messaging)">MSN</dfn>, <dfn title="Yahoo - a widely used search engine for the web that finds information, news, images, products, finance">Yahoo!</dfn> and <dfn title="a widely used search engine that uses text-matching techniques to find web pages that are important and relevant to a user's search">Google</dfn>) proactively filtering out similar search results in a quest to present the user with more relevant and distinct web pages.

<em>Would you know if your site rank is suffering?</em>

Duplicate content refers to substantial blocks of similar web page content, either within a single site or across domains. It is a commonly used technique in search engine spam. You have probably seen it regularly: redundant websites stuffed with keywords and links in a deliberate attempt to trick <dfn title="A program that searches internet resources for specified keywords and returns a list where the keywords were found">search engines</dfn> into returning low-quality results.

It's good that companies like Google are trying to give us a better user experience with original and fresh content, but sometimes hard-working webmasters can unknowingly spam the <dfn title="A program that searches internet resources for specified keywords and returns a list where the keywords were found">search engines</dfn>.

A common example is a dynamic e-commerce website which queries a database. The website may have multiple <acronym title="Uniform Resource Locator, the global address of documents and other resources on the World Wide Web.">URL</acronym>s which are in effect just different routes to the same content: a product page (which usually includes a copied-and-pasted manufacturer's description also found on any other site selling the same item). Alternatively, the <acronym title="Uniform Resource Locator, the global address of documents and other resources on the World Wide Web.">URL</acronym> may contain a unique session id passed through the query string, so every visitor sees the same page at a different address.

On February 12, 2009, the 3 major <dfn title="A program that searches internet resources for specified keywords and returns a list where the keywords were found">search engines</dfn> introduced a new <a href="http://microformats.org/" title="microformat - a way of using existing and widely adopted standards to solve simple behavioural and usage problems" target="_blank">microformat</a> which lets the <dfn title="A program that searches internet resources for specified keywords and returns a list where the keywords were found">search engine</dfn> know which <acronym title="Uniform Resource Locator, the global address of documents and other resources on the World Wide Web.">URL</acronym> you think is the "canonical" or "proper" version. In effect you are telling the <dfn title="A program that searches internet resources for specified keywords and returns a list where the keywords were found">search engines</dfn> "this page is the most useful amongst those with duplicate content". They will then consolidate link popularity into that single <acronym title="Uniform Resource Locator, the global address of documents and other resources on the World Wide Web.">URL</acronym>. The tag goes in the head of each duplicate page:

<pre>&lt;link rel="canonical" href="http://www.example.com/product/1234/" /&gt;</pre>
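For the session id case described earlier, the canonical address can often be worked out automatically. The following is a minimal Python sketch, not a definitive implementation: the parameter names in SESSION_PARAMS and the helper names canonical_url and canonical_link_tag are assumptions made for illustration, so substitute whatever your own site uses. It strips session-style parameters from the query string and builds the tag shown above. It does not handle different routes to the same product, which would need a lookup specific to your application.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical session parameter names; adjust to match your own site.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid"}


def canonical_url(url: str) -> str:
    """Return the URL with session-style query parameters and any fragment removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    # Rebuild the URL from its parts, dropping the fragment.
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), "")
    )


def canonical_link_tag(url: str) -> str:
    """Build the link element to place in the page head."""
    href = escape(canonical_url(url), quote=True)
    return '<link rel="canonical" href="{}" />'.format(href)


print(canonical_link_tag("http://www.example.com/product/1234/?sid=abc123"))
```

Calling canonical_link_tag with any session-tagged variant of the product address yields the same tag, so every variant points the search engines at one URL.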

Although a step in the right direction for some types of site, the new <dfn title="microformat - a way of using existing and widely adopted standards to solve simple behavioural and usage problems">microformat</dfn> doesn't completely address the problem of duplicate content. And whilst duplicate content isn't grounds for <dfn title="A program that searches internet resources for specified keywords and returns a list where the keywords were found">search engines</dfn> to take action unless it appears to be manipulative and intended to deceive them, there are a <a href="http://www.google.com/support/webmasters/bin/answer.py?hl=en&amp;answer=66359" title="opens in a new window" target="_blank">few steps</a> webmasters can take to <a href="http://www.webconfs.com/similar-page-checker.php" title="opens in a new window" target="_blank">ensure users are seeing their content</a>.
