"A client asked me the other day about duplicate content and how it relates to the SEO of their site. Duplicate content is content on your website that search engines find to be very similar to content on another page, either on or off the site, and then penalize. The client was concerned about the percentage of duplicate content on their site, because their category pages used snippets of product pages as their content. The funny thing is that most SEOs think only in terms of percentages when it comes to duplicate content. It actually doesn't work that way anymore.

Prior to 2011, the algorithms that scanned for duplicate content were much simpler. At that time, you could avoid any duplicate content penalties just by making sure your content was at least 30% unique; that seemed to be the cutoff. However, a series of Google algorithm changes changed all of that.
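Nobody outside Google knows exactly how the old check was computed. As a rough sketch, assume uniqueness was measured as the share of word positions that differ between two pages. The 30% rule of thumb could then look like this in Python. The names `percent_unique` and `UNIQUENESS_CUTOFF` are hypothetical, and so are the sample sentences.

```python
import re

# Hypothetical threshold: the pre-2011 "at least 30% unique" rule of thumb.
UNIQUENESS_CUTOFF = 30.0

def words(text: str) -> list[str]:
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z']+", text.lower())

def percent_unique(a: str, b: str) -> float:
    """Percentage of word positions where the two texts differ.

    Assumed position-by-position comparison; extra words in the
    longer text count as different.
    """
    wa, wb = words(a), words(b)
    longest = max(len(wa), len(wb))
    if longest == 0:
        return 0.0
    same = sum(1 for x, y in zip(wa, wb) if x == y)
    return 100.0 * (longest - same) / longest

page_a = "Our blue widget ships free today."
page_b = "Our red widget ships fast today."  # 2 of 6 words changed
page_c = "Our blue widget ships free now."   # 1 of 6 words changed

for other in (page_b, page_c):
    score = percent_unique(page_a, other)
    print(f"{score:.1f}% unique, passes: {score >= UNIQUENESS_CUTOFF}")
```

Under this assumed metric, swapping two of six words clears the cutoff and swapping one does not. The metric counts only changed characters and ignores meaning.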

I'll use made-up phrases as an example.

On page 1 we have the following sentence: "The [sick] [dog] went to sleep."

On page 2 we have the following sentence: "The [ill] [canine] went to sleep."

I put the terms that are similar in brackets. The old algorithms would consider these two sentences to be 33% unique, which would have been fine, because one third of their words (2 of 6) differ from each other.

The new algorithms look at it a little differently. They crawl both pages and note the percentage of difference and similarity in the content. They then use a latent semantic indexing engine (essentially an intelligent synonym database) to catalog the terms in both sentences. Next, they match the terms against their existing database to determine whether "sick" and "ill" are synonyms. After that, they apply assigned "similarity scores", which influence how unique the content is perceived to be, to score the content overall. For instance, "sick" and "ill" might be considered very similar to each other, whereas "sick" and "flu" are perceived as less similar. These individual synonym scores shape the algorithm's perception of how "unique" the content is. More interesting still, the semantic database changes daily, and so do the similarity scores. If Google learns 100 new synonyms for a word tomorrow, that changes the scoring for everything involving that word set in the database.
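To illustrate why synonyms defeat the old percentage trick, here is a toy sketch in Python. It is not Google's algorithm, whose details are unpublished. It assumes a small table of pairwise "similarity scores" (the values in `SIMILARITY` are invented) and credits each aligned word pair with its score instead of a plain match or mismatch. All function and variable names are hypothetical.

```python
import re

# Invented similarity scores (0 = unrelated, 1 = same meaning).
# Search engines do not publish theirs; these values are illustrative only.
SIMILARITY = {
    frozenset({"sick", "ill"}): 0.9,
    frozenset({"dog", "canine"}): 0.95,
    frozenset({"sick", "flu"}): 0.4,
}

def words(text: str) -> list[str]:
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z']+", text.lower())

def similarity(x: str, y: str) -> float:
    """1.0 for identical words, the table score for known synonyms, else 0.0."""
    if x == y:
        return 1.0
    return SIMILARITY.get(frozenset({x, y}), 0.0)

def semantic_uniqueness(a: str, b: str) -> float:
    """Percent uniqueness after crediting synonyms as partial matches."""
    wa, wb = words(a), words(b)
    longest = max(len(wa), len(wb))
    if longest == 0:
        return 0.0
    matched = sum(similarity(x, y) for x, y in zip(wa, wb))
    return 100.0 * (1 - matched / longest)

page1 = "The sick dog went to sleep."
page2 = "The ill canine went to sleep."
print(f"{semantic_uniqueness(page1, page2):.1f}% unique")  # → 2.5% unique
```

With these made-up scores, the pair that counted as 33% unique by word position drops to about 2.5% unique. Because the scores live in one table, editing a single entry changes the result for every page that uses that word.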

Google will also catalog the different parts of your site. The algorithm is good at understanding your site's structure. For instance, you don't have to worry about the content in your navigation bar being treated as duplicate, because Google knows it is common practice for websites to use the same menu across multiple pages. Similarly, if you cite something from another website or person, the citation tells Google that the content is SUPPOSED TO BE duplicated. You still won't want your whole page to be citations, but a little is OK. Your primary concern should be the uniqueness of the "body" of the page, which you should write from scratch.
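Google does not document how it separates navigation and citations from the body of a page. As a sketch, assume those parts are marked with standard HTML elements such as `<nav>`, `<footer>`, `<blockquote>` and `<cite>`. The body text worth checking for uniqueness can then be extracted with Python's standard `html.parser` module. `IGNORED_TAGS`, `BodyTextExtractor` and the sample page are hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical set of elements treated as shared or cited material.
IGNORED_TAGS = {"nav", "header", "footer", "blockquote", "cite"}

class BodyTextExtractor(HTMLParser):
    """Collects page text, skipping anything inside IGNORED_TAGS."""

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0                 # how many ignored elements we are inside
        self.parts: list[str] = []     # body text fragments, in page order

    def handle_starttag(self, tag, attrs):
        if tag in IGNORED_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in IGNORED_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text that is outside every ignored element.
        if self.depth == 0 and data.strip():
            self.parts.append(data.strip())

def body_text(html: str) -> str:
    """Return the page's body text with navigation and citations removed."""
    parser = BodyTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)

page = """
<nav>Home | Products | Contact</nav>
<main>
  <p>Our own write-up of the product.</p>
  <blockquote>A quoted review from another site.</blockquote>
</main>
<footer>Copyright Example Co.</footer>
"""
print(body_text(page))  # → Our own write-up of the product.
```

Two pages with identical menus and footers can still be compared on their body text alone, which reflects the article's point that the body is where uniqueness matters.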

Google made these changes to combat marketers who were using small synonym databases to "spin" content, producing vast amounts of it without having to write it. This is great news, because it means original content is worth even more than it used to be: the "cheaters" can no longer easily cheat. The bad news is that duplicate content issues have become far more prevalent. Make sure you consult your SEO professional so you don't have any duplicate content issues. Even now, this is one of the most widely ignored aspects of SEO."

About The Author:

Nick Fitzgerald - Vice President of Online Marketing at Foremost Media, Inc.
Nick handles the implementation of search engine optimization, social media management, reputation management, and online advertising management for Foremost Media customers. Nick is an expert in the fields of search engine optimization, online advertising, and reputation management.
