Post-Panda Duplicate Content Detection and Attribution: Google at Its Best

I have been looking at a client’s site that had some position losses after the June 16 Panda rollout (v4, if I am not mistaken). The site only lost positions for one tight group of keywords that one specific page was targeting, while everything else remained stable. While testing different hypotheses, I checked what I normally check in such cases, among other things – duplicate content suspects. Searching for an exact match of a phrase from that page returned 8 results (1 omitted), out of which:

– #1 was a password-protected page – so not indexable at the moment, even if it used to be indexable at some point;

– #2-5 were different tag pages off a scraper blog that had stolen the text from the original site – mind you, that site’s hosting account has since been suspended and the pages have long been removed – so no actual content there anymore;

– #6 was a social bookmarking site of the breed that saves a copy of each bookmarked page rather than just the title, description and tags (a big pet peeve of mine – if your site’s platform does this, why not at least make such saved copies non-indexable and spare the site owners the dupe content pain? a sketch of how that could look follows this list) – however, Google itself has flagged this site as malware and displays a warning when you try to visit the page from their SERPs – one would think that should drop a site’s authority and trust, let alone its ability to outrank the original source of the content;

– #7 was yet another social bookmarking site where some genius of a linkbuilder had built a link to a different site but stolen the description from my client’s site – the page was still up and the content on it was alive (this was the omitted result);

– #8 was the actual original page from my client’s site.
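Not that any bookmarking platform asks for my advice, but keeping those saved copies out of the index is trivial. Here is a minimal sketch of what that could look like – the Flask route and the saved_copies lookup are made up for illustration; only the noindex mechanisms themselves are real:

```python
# Minimal sketch: serve a saved copy of a bookmarked page while telling
# crawlers not to index it. The route and saved_copies store are hypothetical.
from flask import Flask, abort, make_response

app = Flask(__name__)

# Hypothetical store of cached page HTML keyed by bookmark id.
saved_copies = {"42": "<html><body>...cached copy of the bookmarked page...</body></html>"}

@app.route("/saved/<bookmark_id>")
def saved_copy(bookmark_id):
    html = saved_copies.get(bookmark_id)
    if html is None:
        abort(404)
    resp = make_response(html)
    # An X-Robots-Tag: noindex header keeps the cached copy out of Google's
    # index; a <meta name="robots" content="noindex"> tag inside the HTML
    # itself would do the same job.
    resp.headers["X-Robots-Tag"] = "noindex"
    return resp
```

Either the header or the meta tag would let the platform keep offering its cached copies without competing against the original pages in the SERPs.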

So we get Caffeine back in 2009, which is supposed to make indexing of new and updated content much faster; then we get all the hype about a site’s loading speed and how important it is for rankings; then we get 4 incarnations of Panda, which are supposed to better handle content attribution in cases of duplicate content; and then we get this shit in the SERPs? A page loses rankings because of 5 instances of duplicate content that no longer even exist and a malware site not even directly accessible from the SERPs, yet – what a brilliant job on Google’s part! – the only instance of currently existing duplicate content gets correctly omitted, which still doesn’t help the original page get back its rankings. So much for all of Google’s PR (as in public relations, not PageRank)!
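For what it’s worth, if you want to double-check which of the supposed duplicates still actually serve your text rather than trusting stale SERP snippets, a quick script does the job. A rough sketch, assuming the requests library and with a placeholder phrase and URL list:

```python
# Rough sketch: check which suspected duplicate URLs still serve an exact
# phrase lifted from the original page. Phrase and URLs are placeholders.
import requests

PHRASE = "an exact-match phrase lifted from the original page"
SUSPECT_URLS = [
    "http://scraper-blog.example/tag/some-tag/",
    "http://bookmarking-site.example/saved/42",
]

for url in SUSPECT_URLS:
    try:
        resp = requests.get(url, timeout=10)
    except requests.RequestException as exc:
        print(f"{url}: unreachable ({exc})")
        continue
    if resp.status_code != 200:
        print(f"{url}: HTTP {resp.status_code} - page gone or blocked")
    elif PHRASE.lower() in resp.text.lower():
        print(f"{url}: still serving the duplicated text")
    else:
        print(f"{url}: live, but the phrase is no longer on the page")
```

In this case, such a check would have confirmed that only one of the eight results still carried the stolen text – which is exactly what makes the ranking loss so absurd.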

2 Comments.

  1. Panda in a nutshell. Puts scraper junk before the original quality content.

    Since you are an SEO, I guess you have Matt Cutts’ link for turning in scraper sites, and I hope you turned them in. Google needs some humans to fix Panda. My site has been ranking better ever since I turned in every page-1 scraper in my keyword space.

  2. Bummer to see such low-quality results despite the broad notion that the SERPs have improved and that Google’s current ranking algo is doing what it is supposed to do. Google SERPs = failure in some areas.