Is Panda Dragging You Down For Duplicate Pages You Are Not Aware Exist?

Ignorance of the Law Is No Excuse

By Andre van Wyk, with a foreword by Attorney Michael Ehline.

Panda Hangman

Did you get sentenced to death by Sheriff Panda for duplicate content and not even know it?

It is a maxim of English common law that “ignorance of the law is no excuse” (ignorantia juris non excusat). The rationale behind the doctrine is that if ignorance could excuse you from a crime such as murder, or, if civil, from a wrongful death lawsuit, then a person could simply claim, like Sergeant Schultz from Hogan’s Heroes, “I know nothing,” and defeat liability.

So the law says you’re liable because you knew or should have known.

Google Panda Is the Reeve of the Shire

Sheriffs trace their history back to the Bible. The Office of Sheriff in America and the UK comes from medieval England. People used to live in shires, or “domains” (sound familiar?). The people would elect protectors called the “reeve of the shire.” Eventually, they were called “Sheriff.” Here, in our online shire, Google is the Sheriff, and Panda is its duly appointed deputy.

Were You Cited by Deputy Panda in Absentia?

WordPress and Blogger, as well as other platforms, automatically generate tag, category, and archive pages, and the search engines index them. Google assumes you know this, since you are presumed to know as a reasonably trained webmaster. Now some naysayers claim: “Google is smart enough to index one page.” They are right. Google is also smart enough to know that you know better than to send it a signal to index more than one page, post, or excerpt/snippet.

So Google will assume you are trying to game its results by attempting to index more than one document with the same or substantially identical content. Remember, Panda is site targeted, not page targeted like Penguin. One bad document, or apparent over-optimization, can take your whole site down. So the Panda can simply act as a magistrate, assume you already had notice and an opportunity to be heard, and drag you down in absentia to the gallows, or to page 50. Worse, if you have links out to a friend, Penguin will drag the receiving sites down as well when it sees you have identical documents with different file extensions all linking back to your friend.

How to Deal With This and Get the Case Appealed and Overturned

Many sites have been hit with duplicate content penalties due to a simple issue that the site owners may not even be aware of. The issue arises because the search engines will index your archived content, tagged content, and even categorized content as a duplication of the original post. Even an excerpt or a snippet on the category page could be enough to trigger a penalty, as it could be read as “aggressive SEO.” Technically it is a logical occurrence in the “eyes” of a search engine: the bots cannot discern between the various categories, tags, or date-based archives as such, and will see the same content more than once – up to as many times as the content appears in an indexed tag, category, or dated archive.

Scenario

An example would be a post made in January 2012 entitled “Car Accident Lawyer”, with a URL of http://www.lawyerdomain.com/01/2012/car-accident-lawyer (as an example – this would depend on how the permalinks were set up). Once the post has been indexed and has aged, let’s say two or three months down the line, the post will have been ‘reindexed’ under a different URL – specifically, in this case, under date-based archives. As an example of the above you might have http://www.lawyerdomain.com/january-posts/car-accident-lawyer – this is very simplified, and again is just an example. Now the search engine has two different URLs with the same content on them, and hence duplicate content.

The same applies to category and tag based archives. For example, if you post some information and allocate 10 different tags to that post, the post may be indexed under the tag archive URLs an additional 10 times, and hence there will be 11 identical articles on the site. Using the same example as above, if the author posted with the tags car-accidents, car-accident-lawyer, car-accident-injury, and so on, the various indexed URLs could include:

  • http://www.lawyerdomain.com/tag/car-accidents/car-accident-lawyer
  • http://www.lawyerdomain.com/tag/car-accident-lawyer/car-accident-lawyer
  • http://www.lawyerdomain.com/tag/car-accident-injury/car-accident-lawyer

And yet all of that content will be identical and therefore duplicate.
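
One common way of telling the search engines which of these URLs is the “real” one is a canonical link element in the head of the duplicate pages, pointing back at the original post. The Yoast plugin discussed below exposes this per post, although the main fix walked through in this article is noindexing the archives. A minimal sketch, using the illustrative URLs above:

    <!-- In the <head> of a duplicate archive copy of the post (illustrative URL) -->
    <link rel="canonical" href="http://www.lawyerdomain.com/01/2012/car-accident-lawyer" />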

So How Do We Deal With This?

There are a number of ways this can be dealt with, from settings on the blog, domain, or site itself, to adding meta tags manually or via the robots.txt file. It also depends on the publishing platform the individual is using. Here we will break down the options for WordPress – more specifically using WordPress SEO by Yoast (I will explain why this is my first choice in terms of SEO plugins) – for Blogger or Blogspot, and then the meta tags / robots.txt option. Note: Click on any image to enlarge.

WORDPRESS

WordPress (and most other publishing platforms, for that matter) is geared to get you and your message out there on the web – unfortunately, sometimes to our detriment, as in the case of the duplicated content described above. If you are using WordPress as a publishing platform for your site or your blog, I really recommend WordPress SEO by Yoast (plugin: http://wordpress.org/extend/plugins/wordpress-seo/; Yoast’s site: http://yoast.com/). I would also recommend reading through his site or subscribing, as he is extremely knowledgeable and helpful in many areas of WP SEO and related optimization issues.

I prefer his plugin because of the various features it provides, including easily tying in your Google Webmaster Tools account, providing sitemaps, and providing a link to your Google+ business profile as a publisher, amongst other great options – but most importantly because of what is detailed below:

The WordPress SEO plugin provides a user-friendly (albeit tabbed) interface, as per image 1b below. Once installed, you will have to go to Titles & Metas, as per image 1a, which will provide the options shown in 1b:

 

Image 1a

Image 1b

Very important in image 1b is that under the Sitewide Meta settings you select “Noindex subpages of archives” – this will prevent the indexing of archive subpages, such as where you see /page-1/, /page-2/, /page-3/, etc. This happens often when a site links back to your site for an article published there, and it almost appears as a sitewide link – therefore everyone stands to get ‘punished’ by allowing the search engines to index these subpages.
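
What this setting boils down to is a robots meta tag on those archive subpages, telling the search engines not to index the page itself while still following its links – roughly along these lines (a sketch, not necessarily the plugin’s exact output):

    <!-- Rendered in the <head> of /page-2/, /page-3/ and similar archive subpages -->
    <meta name="robots" content="noindex,follow" />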

The noodp and noydir options will prevent the search engines from taking your description from DMOZ and the Yahoo Directory, respectively. Since the majority of attorneys and law firms do have DMOZ and Yahoo Directory listings, it may be wise to select these options, as those listings may contain outdated and poorly converting descriptions.
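
For reference, these two checkboxes correspond to the following robots meta directives (shown here as you would write them by hand; the plugin adds them for you):

    <meta name="robots" content="noodp,noydir" />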

The next option, detailed in image 2, deals with the Taxonomies of the WordPress site in question. Basically, these refer to your categories, tags, and post formats, if any.

Image 2

For all of these options you will want to select noindex, follow and save the settings.

Moving on to the “Other” section of the plugin options, image 3 shows what this area deals with:

Image 3

The same applies for the date and author archives, as per the Taxonomies. Select noindex, follow for both options and Save Changes.

Going back to the benefits of WordPress SEO by Yoast, these are illustrated again by the functionality it provides for creating XML sitemaps, which can be accessed as per image 4a, with the options shown in image 4b.

Image 4a

Image 4b

Here one has the option to exclude the taxonomies from the sitemap, which I personally recommend, as it will also help with the noindexing of the taxonomies. One does not have to use this option for sitemap generation, but considering it is available under one ‘roof’ it is good practice – as is the fact that the plugin creates separate posts and pages sitemaps, which can and should be submitted to your Google Webmaster Tools account for the domain. (Also remember to submit your feed as a sitemap to Webmaster Tools for additional crawling and indexing.)
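
To give a rough idea of what those separate sitemaps look like from the search engine’s side, a sitemap index that splits posts and pages is along these lines – the file names here are illustrative, and the plugin’s may differ by version:

    <?xml version="1.0" encoding="UTF-8"?>
    <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <sitemap><loc>http://www.lawyerdomain.com/post-sitemap.xml</loc></sitemap>
      <sitemap><loc>http://www.lawyerdomain.com/page-sitemap.xml</loc></sitemap>
    </sitemapindex>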

Using WordPress SEO on posts and pages

The plugin also provides great functionality in the option to override the sitewide settings on a per-post or per-page basis. This is ideal, for instance, when republishing content that has already been published elsewhere: you can select the noindex option in the plugin’s advanced options below the post or page content. Why would you want to do this? Because you may be adding value for your site reader or visitor, but the content may be a bit thin, or even duplicated, and therefore you may not want it indexed for fear of being penalized. See image 5.

Image 5

As can be seen, there are a number of options here, including Canonical URL, 301 redirects, including or excluding the item from the sitemap, and so on. Really powerful and easy-to-use stuff!
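
To spell out what the 301 redirect option amounts to: when that field is filled in, a request for the old URL is answered with a permanent redirect to the new one, which at the HTTP level looks roughly like this (URLs are illustrative):

    HTTP/1.1 301 Moved Permanently
    Location: http://www.lawyerdomain.com/car-accident-lawyer/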

BLOGGER

For those using Blogger / Blogspot, these indexing options can be edited under “Search Preferences,” located under Settings, which is accessible from your Blogger Dashboard. Please see images 6, 7, and 8.

 

Image 6

 

Image 7

Image 8

Especially important are the Archive and Search pages – you will want to, at the very least, noindex those. There are a lot of other options available too, but be careful that you do not block the search engines completely.
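
As a rough sketch of what a custom robots.txt for a Blogger blog can look like – Blogger serves its search and label pages under /search, so disallowing that path keeps those out of the crawl; treat this as an illustration and adapt it to your own blog rather than copying it blindly:

    User-agent: *
    # Keep Blogger search and label pages out of the crawl
    Disallow: /search
    Allow: /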

Manual, Meta Tags and Robots.txt

One may also use the manual way of preventing search engines from indexing the site, or even from accessing certain areas of it. This too can become somewhat complicated or confusing for the novice, so please be careful that you do not prevent the search engines from visiting or accessing your site in its entirety. (I have seen this done – and I have inadvertently done it myself.)

The robots meta tag – most often used within .html, .htm, or even .php pages – must be contained within the head section of the page, i.e. between the <head></head> tags. Image 9 shows the position, and Image 10 shows some of the meta tag options that can be used.

Image 9

Image 10
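
As a bare-bones illustration of the placement described above, together with a few of the directive values that can be combined in the content attribute:

    <html>
    <head>
      <title>Car Accident Lawyer</title>
      <!-- The robots meta tag must sit between <head> and </head> -->
      <meta name="robots" content="noindex,follow" />
      <!-- Other common values: index, noindex, follow, nofollow, noarchive, nosnippet -->
    </head>
    <body>
      ...
    </body>
    </html>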

There may be times when robots meta tags clash or are duplicated on a page; as far as I know, the most restrictive of the meta tags will be taken as the command by the search engine.
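
For example, if a page ends up carrying both of these tags, the expectation is that the more restrictive noindex will be honoured:

    <meta name="robots" content="index,follow" />
    <meta name="robots" content="noindex,follow" />
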
Using robots.txt is considered pretty outdated by many, and it also forms the basis of many debates. Nevertheless, it offers the option of restricting access to files, folders, and pages too. I have provided some screenshots of examples of using the robots.txt file – please note that this file (being a .txt file) must be uploaded to the root of the domain so that it can be viewed in the browser by appending /robots.txt to the domain (e.g. http://www.lawyerdomain.com/robots.txt). Please see Images 11 and 12 regarding the robots.txt options – where and what, respectively.

Image 11

Image 12

Credit for the above screenshots: http://www.robotstxt.org/
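
And by way of a plain-text illustration of the same idea on a self-hosted WordPress-style site – the paths below are examples only and depend entirely on your permalink structure, and a mistake here can block far more than you intend, so double-check before relying on it:

    User-agent: *
    # Keep tag and category archives out of the crawl (paths are illustrative)
    Disallow: /tag/
    Disallow: /category/
    # Only block dated archive paths if your post permalinks do not also contain dates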

These issues were discussed on the regular Friday Hangout at The Circle of Legal Trust on Google+. Circle us and join us every Friday morning!

Sources:
http://en.wikipedia.org/wiki/Ignorantia_juris_non_excusat

