How will Google and the Panda Update Value My Website Knowledgebase?
Posted on 04. Aug, 2011
I had a great question come in this week from a valued reader, Jaap, regarding the Panda update: “If I have a knowledge-base with over 50 articles or whitepapers from various sources, how will Google value this? Is it duplicate content? Will my rankings drop?” Thanks, Jaap, for the question.
A Little Panda Background
Google has been rolling out fairly regular updates that are collectively referred to as the Panda update. Originally, it was known as the Farmer update, to convey that it was meant to rid the index of content farms and link farms that add no real value to Google’s main customer: the searcher.
Now it is referred to as Panda, named after the brilliant individual at Google who developed a scalable machine learning algorithm.
The goal of Panda, and really of Google as a search engine, is to let websites viewed as “high quality” rise in the rankings while lower quality sites sink to pages that no one dares to visit, if they are not banned from the index completely.
A common tactic of lower quality sites is scraping higher quality sites for content to post on their own. Call it laziness or call it “gaming” the system; either way, low quality sites with scraped content would outrank authoritative sites just to make a buck or two from the ads on their pages. The consequences fell on the ideal Google customer (the searcher), who would get annoyed at having to click through layers and layers of pages and navigation just to find what they were looking for.
Let me illustrate with an example of search results pre-Panda and post-Panda. Last fall, if you were suffering from Cyberchondria and tried to diagnose yourself online, you would have been inundated with content scraped from authoritative sources and presented to you on sites that were not so trustworthy. These sites were heavy with ads, and the user experience was frustrating to say the least.
Now, that same search is more likely to return reputable websites, and more likely to return the sites that originated the content. These may include a government site, a well-known health care site, or another research-based or authoritative website. In my personal experience, this was an improvement because I used to always end up with Lupus whenever Google-ing my symptoms. It’s possible to get a more accurate self-diagnosis today. But, seriously, you should see a real doctor if you are having troubles.
The Panda update introduced machine learning that is succeeding in evaluating the quality of a website based on more than just the metrics that anyone can access and manipulate by “gaming” the system.
In doing this, Google put many on notice for duplicate content and even “thin” content problems on their websites. The term “duplicate content” is everywhere in relation to the Panda update, but the update really goes much deeper.
Many webmasters are working diligently to create original content on their sites to avoid penalties. But now, even original content is not enough in light of this rather sophisticated machine learning update called Panda.
Now to the original question
How may Google value a knowledgebase of republished articles and whitepapers on your site? Will Google devalue your site?
I cannot say for certain, since I do not work for Google and do not have inside information. However, after studying what Google has to say on the matter and what well-respected, research-based SEO sites like SEOmoz say, I have come to the following conclusions. If you are compiling a knowledgebase of articles and whitepapers from various sources and not handling this correctly, Google’s machine learning abilities will eventually de-value your site. Google will treat the material as duplicate content, and the rankings of your website as a whole will drop.
How do you handle this correctly?
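First, if an article or whitepaper has already been published on the originator’s website, from an ethical standpoint, you must give them proper credit. From a technical standpoint, the pages where those articles and whitepapers rest should have the “noindex” tag. This tells Google not to include the page in question in its index. Note that Google still has to be able to crawl the page in order to see the tag, so “noindex” keeps a page out of the search results rather than stopping it from being crawled.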
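For reference, the “noindex” tag is usually written as a robots meta tag in the head of the page, for example <meta name="robots" content="noindex">. If you have 50 or more republished pages, it is easy to miss one, so here is a minimal Python sketch for checking whether a page’s HTML carries the tag. The names RobotsMetaParser and has_noindex and the sample page are my own, made up for illustration. The sketch only looks at meta tags, not at HTTP headers, so treat it as a rough self-check rather than a full audit.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> (or "googlebot") tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so default to "".
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() in ("robots", "googlebot"):
            for directive in attr_map.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the page's HTML contains a robots meta noindex directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical republished page, used only to show the check in action.
republished_page = """<html><head>
<title>Whitepaper (republished with permission)</title>
<meta name="robots" content="noindex, follow">
</head><body>Full text of the whitepaper goes here.</body></html>"""

print(has_noindex(republished_page))
```

You would run has_noindex against the HTML of each knowledgebase page that republishes someone else’s article and fix any page where it comes back False.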
If you want to use this content to contribute to your audience’s user experience while still having a page indexed, the best way to handle it is to draft a completely unique, value-added abstract of the article or whitepaper and show your users where to find the original, which would ideally be on the originator’s website. Again, if the full text is on your website and you have permission to publish it, give proper credit but still add the “noindex” tag to that page.
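How do you know whether your abstract is really your own wording? One rough self-check is to measure how many short word sequences in the abstract also appear in the original. The Python sketch below does that with three-word sequences (often called shingles). To be clear, this is my own illustration and an assumption on my part: Google does not publish how it measures duplication, and the function names, the sequence length of three, and the sample sentences are all made up for the example.

```python
import re


def shingles(text: str, size: int = 3) -> set:
    """Return the set of consecutive word sequences of the given size."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def overlap(candidate: str, original: str, size: int = 3) -> float:
    """Fraction of the candidate's word sequences that also occur in the original."""
    candidate_shingles = shingles(candidate, size)
    if not candidate_shingles:
        return 0.0  # Candidate is too short to compare.
    shared = candidate_shingles & shingles(original, size)
    return len(shared) / len(candidate_shingles)


# Hypothetical texts, used only for illustration.
original = "Panda rewards sites that publish original research and clear analysis"
copied = "Panda rewards sites that publish original research and clear analysis"
abstract = "Our summary explains why this whitepaper matters for small retailers"

print(overlap(copied, original))
print(overlap(abstract, original))
```

A result close to 1 means the abstract is mostly lifted from the source, while a result close to 0 means the wording is your own. It says nothing about whether the abstract adds value, which is still a judgment you have to make for your audience.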
For more answers on the factors that Google considers when determining the quality of a website please see: http://googlewebmastercentral.blogspot.com/2011/05/more-guidance-on-building-high-quality.html
Basically, focus on creating the highest quality site you can. If your site is to be a resource for others, that’s great, but being an aggregator of articles and whitepapers is simply not enough as the web and search practices evolve. It is important to consider the design and user experience along with the typical metrics. What is it that will best serve your audience on your site?
If your goal is to build authority and ranking in search for a particular topic, it is important to build content and create a user experience that is congruent with what an authority site would look like for that topic.
Please leave your comments and feedback below on this topic. If you have other questions you would like answers to, send me a quick email or comment below.
Chris Darling of Darling SEO is the creator of the Be The Obvious Choice Online system that helps businesses not only get found online but to drive traffic and stand out from their competition while growing their customer base and business. For more information on how you can Be The Obvious Choice Online apply for a Discovery Session.











