Content Scraping Websites, Bad For Your SEO?

As a professional blogger, online marketer and webmaster, I come across blogs daily that scrape the content of others (and I periodically find scraper sites that scrape my own content). By ‘scrape’ I mean that these websites either a) copy and paste content into their website manually, or b) use an RSS scraping tool which automatically updates their site from your RSS feed. In both situations, I get incredibly frustrated and annoyed with the scraper site and its creator.
The reason I get frustrated is that I’ve manually created and built up my websites over the space of almost 6 years, and it has taken a lot of hard work and effort to get them into the position they are in now. Of course, it upsets me that a website can garner a solid PageRank (PR) from scraped content, but what bugs me the most is that a lot of webmasters have literally NO IDEA their content is elsewhere on the web.
Content scraping sites, bad for your SEO?
There are multiple forums and threads on the content scraping topic, but there are two main answers to the above question. The first one is this:
“If a site scrapes your content and leaves your internal links within the scraped content, you are getting backlinks for free. Who could complain about that?”
Well, I can, for one. You see, I do not write my content day in and day out, breaking a sweat some of the time, for somebody halfway across the world to benefit from it and sell it on Flippa 3 months down the line. Why should I put money in the pocket of a content scraper when I have done all the work? Well, I shouldn’t. And you shouldn’t either. The next conclusion to the question of content scraping is this:
“Websites that scrape your website are infringing copyright in most cases and plagiarizing your work. Especially if they are deleting your content’s internal links, the website owner should be handed a notice of withdrawal immediately.”
I agree with the above. If a website is actively scraping your content and deleting the internal links from the original post, it reflects badly on your SEO. Now, Google is pretty clever at working out who posted content first, so it shouldn’t penalize your website for this; however, it doesn’t do your website any good either. If something doesn’t do your website any good, don’t stand for it!
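If you want to check whether a scraped copy kept your internal links or stripped them out, a rough Python sketch like the one below may help. It only looks at the HTML you hand it (fetching the page is left to you), and the names `LinkCollector` and `links_back_to` are my own, made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def links_back_to(page_html, my_domain):
    """Return the links in page_html whose host is my_domain or a subdomain of it."""
    collector = LinkCollector()
    collector.feed(page_html)
    kept = []
    for href in collector.hrefs:
        # Relative links have no hostname, so they are skipped here.
        host = (urlparse(href).hostname or "").lower()
        if host == my_domain or host.endswith("." + my_domain):
            kept.append(href)
    return kept
```

An empty result for a page that is clearly your post suggests the scraper has removed your links, which is the second situation described above.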
How to get scraped content removed
This can be an extremely annoying, piss-take of a task. I always follow the same process when I come across a content scraping website, though:
- First, contact the website owner through their website
- If there is no reply, try to find them on social networking sites and contact them there
- If that fails, do a WHOIS lookup to find out where their nameservers point, then contact their host
- The host should direct you to a DMCA grievance form, which is free to fill out and submit
- Once the host acts on the notice, the content in question is taken down under the Digital Millennium Copyright Act, and you’ll be notified accordingly
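The WHOIS step above can be sketched in Python roughly as follows. This is a minimal sketch with assumptions: the default server `whois.verisign-grs.com` only answers for .com and .net domains, the `Registrar:` and `Name Server:` labels are what that server returns and other registries format their replies differently, and the function names are my own.

```python
import socket


def whois_query(domain, server="whois.verisign-grs.com", timeout=10):
    """Send a raw WHOIS query (TCP port 43) and return the reply as text."""
    with socket.create_connection((server, 43), timeout=timeout) as conn:
        conn.sendall(domain.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_whois(text):
    """Pick out the registrar and name server lines from a WHOIS reply."""
    info = {"registrar": None, "name_servers": []}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip().lower(), value.strip()
        if key == "registrar" and value and info["registrar"] is None:
            info["registrar"] = value
        elif key == "name server" and value:
            ns = value.lower()
            if ns not in info["name_servers"]:
                info["name_servers"].append(ns)
    return info
```

Calling `parse_whois(whois_query("example.com"))` would give you the registrar and the name servers in one go. The name servers usually, though not always, point to the hosting company you need to contact.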
Another method
Sometimes a scraper site will be hosted on its own servers, in which case the above process doesn’t apply. When a website is self-hosted, I either:
- Go into Google Webmaster Tools and follow the steps to notify Google of a duplicate content website through DMCA (Google will, or at least should, de-list the content from its search results in around 5 days. I know this because they have done it for me)
Or
- Contact the scraper site’s domain registrar to have their domain revoked. I have used this method once, for a network of sites that were selling my content and were listed on Flippa for sale.
Final thoughts
The easiest way for a WordPress webmaster to find out whether they have been scraped is to open up the comments page from their dashboard and filter the results by ‘pings’; in 90% of scraping circumstances, the scraping website owner is too thick to turn off the function that pings the blogs they link to.
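If you have a lot of pings to wade through, the same check can be roughed out in Python. This sketch assumes you have already exported your comments as a list of dictionaries carrying the `comment_type` and `comment_author_url` fields that WordPress stores for each comment; the export itself and the function name `ping_sources` are my own assumptions.

```python
from collections import Counter
from urllib.parse import urlparse


def ping_sources(comments, my_domain):
    """Count pingbacks and trackbacks per source domain, skipping your own site."""
    counts = Counter()
    for row in comments:
        if row.get("comment_type") not in ("pingback", "trackback"):
            continue
        host = (urlparse(row.get("comment_author_url") or "").hostname or "").lower()
        # Ignore pings with no usable URL and pings from your own internal links.
        if not host or host == my_domain or host.endswith("." + my_domain):
            continue
        counts[host] += 1
    return counts.most_common()
```

A domain that pings you every time you publish is a likely scraper and worth a closer look by hand.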
As of now, there are 5 websites that I know about actively scraping Technology Blogged, and once this post is published, there will be 5 versions of it on the web. Within the next 2 months, these websites will either be de-listed from Google, taken down through DMCA or have their domain revoked. Unfortunately, there will always be new content scraping sites after these; it is a never-ending cycle. The truth of the matter is that you can’t really do anything to prevent it, but the best advice I can give is to insert HTML links at the bottom of your RSS posts; 90% of the time they’ll be pulled on to the scraper site and show that the content is Copyright 2012 Your Blog.
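To show what such a footer could look like, here is a small Python sketch that appends a copyright line and a link back to the original post. It is only an illustration of the idea, not how WordPress or any feed plugin does it; the function name `add_feed_footer` and its parameters are made up for this example.

```python
from html import escape


def add_feed_footer(post_html, post_title, post_url, blog_name, year):
    """Append an attribution line with a link back to the original post."""
    footer = (
        '<p>Copyright {year} {blog}. Originally published as '
        '<a href="{url}">{title}</a>.</p>'
    ).format(
        year=int(year),
        blog=escape(blog_name),
        url=escape(post_url, quote=True),  # keep the href attribute well-formed
        title=escape(post_title),
    )
    return post_html + "\n" + footer
```

Because the footer is part of the feed item itself, an automated RSS scraper that republishes the item untouched will carry the link and the copyright line along with it.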



Great post Jakk,
I’ve had my content scraped a few times and had mixed results. In my experience, the author themselves rarely takes it down so you have to go over their head. Thanks, Julie
Thanks Julie. Truly annoying, isn’t it? It is the way of the world though, I suppose.
An excellent article, Jakk. As an SEO, I can completely relate to your feelings on content scraping and do indeed also see it as a true menace. It saddens me that people profit from it, too.
Thank you for your input