Nobody wants to see their work stolen, but on the internet it’s easy to be a thief. Some of our clients have discovered that a website called FindHealthClinics is taking their content to make a quick buck.
Our clients are understandably upset.
They’ve worked hard to implement a digital presence that helps them connect with the people they can help, only to have it shoplifted by a site they’ve never heard of until now.
It seems like this kind of blatant pilfering shouldn’t be allowed to happen. But Google isn’t perfect. Many sites still try to manipulate the search engine and cash in on other people’s efforts.
FindHealthClinics is one of them.
What is a Content Scraper Site?
Web scraping means nothing more than pulling data from websites, and, in general, it has good practical uses. The Internet Archive is an excellent time capsule of the web; Facebook’s API helps many independent businesses become successful; and without web crawling, SEO and search wouldn’t be possible. Of course, not all scraping is well-intentioned.
A specific kind of web scraping is content scraping: harvesting information from multiple websites and then reposting it elsewhere, in this case on FindHealthClinics. The thinking goes that if a content scraper can aggregate enough information, they’ll rank on Google and turn those rankings into cash.
Content scrapers have existed since the early days of search and will probably exist until the end days of search too.
Content being the cornerstone of any website, it sucks to see it so haphazardly lifted and reused by someone else. In an extreme case it seems content scraping could hurt your potential to draw in clients. After all, we know SEO requires an effective blog strategy to work, and duplicate content is a big no-no for SEO.
How FindHealthClinics Works
FindHealthClinics is a local directory website.
It uses bots to scrape local health-related businesses such as therapists and chiropractors (and even funeral homes), and then creates a listing for each business on its site. If you go to the homepage you’ll get results that are local to you.
Here’s the listing for Boulder, Colorado. It’s a bit messy.
FindHealthClinics’ tagline is “This website allows you to find clinics all over the world and follow them on social media.”
But the particular listings include no links to social media channels. Instead, they’re filled with lifted content—primarily social media posts from Facebook and Instagram, along with some blogs—and a disgusting number of ads. The “Visit Site” button at the top of the page is an ad. Clicking a listing pops up an ad. In the bottom right corner of most pages there’s an automatic video ad.
On top of all the ad revenue, FindHealthClinics wants you to “claim your Sponsored Listing,” a $19.99 monthly fee that puts your listing at the top of the local businesses page.
Don’t bother.
Admittedly there is a URL to the particular listing’s site about halfway down the page. However, this link is a “nofollow” link, which means it’s not giving any SEO juice. It’s a tease.
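If you want to check this for yourself, here is a minimal Python sketch that lists the links in a chunk of HTML and flags the ones marked “nofollow.” It uses only the standard library. The sample markup and URLs are made up for illustration and are not copied from FindHealthClinics.

```python
from html.parser import HTMLParser


class LinkRelParser(HTMLParser):
    """Collect (href, is_nofollow) pairs for every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        if href is None:
            return
        # rel can hold several space-separated tokens, e.g. "nofollow noopener"
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        self.links.append((href, "nofollow" in rel_tokens))


def find_links(html):
    """Return a list of (href, is_nofollow) tuples found in the HTML string."""
    parser = LinkRelParser()
    parser.feed(html)
    return parser.links


# Hypothetical listing markup, for illustration only
sample = (
    '<a href="https://example-therapy.test" rel="nofollow noopener">Website</a>'
    '<a href="https://example.test/about">About</a>'
)

for href, is_nofollow in find_links(sample):
    print(href, "nofollow" if is_nofollow else "followed")
```

To try it on a real listing, save the page source from your browser and pass its text to find_links.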
Do Content Scrapers Hurt My SEO?
In 2013 Google’s John Mueller discussed content scraping during a Webmaster Hangout:
“So from our point of view, other sites copying your content wouldn’t be something that would negatively affect your website. So that’s a very common situation, that sites copy content.
“…if you’re not seeing those copies showing up in search for the queries that you care about then it might not be the highest priority to focus on.”
This makes sense. The duplicated content is going to index after your site is already indexed on Google. Google’s bots should see that the content is duplicated and rank the duplicated content accordingly.
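For a rough sense of how much of your post a scraper page has copied, you can compare the two texts yourself. The sketch below breaks each text into overlapping five-word runs (often called shingles) and reports what share of your runs also appear in the suspect text. This is a common textbook technique and an assumption on our part; it is not a description of how Google detects duplicates. The function names and the five-word window are our own choices.

```python
import re


def shingles(text, size=5):
    """Return the set of overlapping word sequences ("shingles") in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Text shorter than one window: treat the whole thing as one shingle
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def copied_fraction(original, suspect, size=5):
    """Fraction of the original's shingles that also appear in the suspect text."""
    original_shingles = shingles(original, size)
    if not original_shingles:
        return 0.0
    shared = original_shingles & shingles(suspect, size)
    return len(shared) / len(original_shingles)


# Made-up example texts
my_post = "Therapy can help you manage stress and build healthier habits over time"
scraped = "Sponsored! Therapy can help you manage stress and build healthier habits over time. Click here"
print(round(copied_fraction(my_post, scraped), 2))
```

A value near 1 suggests the page reproduces most of your text; a value near 0 suggests little or no overlap.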
We decided to test Mueller’s words by spinning FindHealthClinics’ listings through our keyword tool to see if our clients’ stolen content was ranking on FindHealthClinics.
Answer?
It’s not. On any particular therapist’s listing we found that FindHealthClinics ranks for 0 organic keywords.
The ripped content on FindHealthClinics is ranking for snippets of phrases and nonsense keywords: the kind of searches that don’t register on keyword research tools because their volume is so low as to be almost nonexistent.
And when we plugged in some of the keywords the site is supposed to be ranking for on the first page of Google, we found that the rankings were too volatile to hold. Google’s latest core update seems to have affected directories, and it may be hurting scraper-directory sites like FindHealthClinics as a result.
FindHealthClinics’ strategy isn’t to rank for the keywords that help therapists connect with clients. Their goal is twofold:
- To get unsuspecting potential clients to click on advertisements.
- To scare therapists and business owners into buying a subscription to boost their listing, thinking it will help them attract more clients.
Neither of these should be a concern for therapists and local healthcare providers. So long as their own sites are structured properly and maintained, FindHealthClinics will not outpace them on Google.
Still, you may want to take a proactive approach, and there are ways to have your content removed from a content scraper site like FindHealthClinics.
What Can You Do if a Scraper Site Gets Hold of Your Content?
Contact the Owner
Contact the owner of the website and let them know you want your content removed. This does work—sometimes: photographers constantly contact domain owners to have their pictures removed (which is why you should always make sure you have the right permissions to use a photo).
If that doesn’t work, contact the host site.
Contact the Host Site
To find out any website’s host, go to ICANN’s lookup tool and type in the domain; the information will appear below.
FindHealthClinics’ host is Cloudflare.
Email Cloudflare at registrar-abuse@cloudflare.com to notify them of the lifted content, or contact them through their Contact Domain form.
(There is a Captcha involved, which prevents bots from scraping Cloudflare’s website.)
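A domain lookup returns a block of plain text, and the lines you need for a complaint are easy to miss. The Python sketch below pulls out the registrar, the abuse contact email, and the name servers. It assumes the common “Key: value” layout with the field names shown in the sample; real records vary, and the sample record here is invented for illustration.

```python
def parse_registration_record(record_text):
    """Extract registrar, abuse email, and name servers from lookup text."""
    wanted = {"registrar", "registrar abuse contact email", "name server"}
    found = {}
    for line in record_text.splitlines():
        # Split on the first colon only, so values containing colons survive
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key = key.strip().lower()
        value = value.strip()
        if key in wanted and value:
            found.setdefault(key, []).append(value)
    return found


# Invented record, for illustration only
sample_record = """\
Domain Name: EXAMPLE.TEST
Registrar: Example Registrar, Inc.
Registrar Abuse Contact Email: abuse@registrar.example
Name Server: NS1.EXAMPLE.TEST
Name Server: NS2.EXAMPLE.TEST
"""

for field, values in parse_registration_record(sample_record).items():
    print(field, "->", ", ".join(values))
```

Paste the text from your own lookup in place of sample_record to get the contact details in one place.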
Report the Site to Google AdSense
If a website is using shady tactics and has an AdSense account, it can be reported.
FindHealthClinics is using AdSense.
To report them, go to AdSense’s Violation Report page and enter the required information.
Use the Digital Millennium Copyright Act (DMCA)
The Digital Millennium Copyright Act is a 1998 US law that lets copyright owners demand that online services take down infringing copies of their work.
To report a content scraper site, go to Google’s DMCA dashboard, click on “create a new notice,” and fill in the required information.
If enough therapists report FindHealthClinics after reading this article, it may result in the site being deindexed. But there’s a “but.”
But FindHealthClinics is based in the UK
DMCA takedowns can work, but it can be a coin toss. Sometimes the domain host and web host are outside the United States or Europe (which has its own process) and are therefore not subject to the DMCA.
FindHealthClinics is based in the UK, and the DMCA does not apply to the UK.
While the European Union’s GDPR data protection law states that businesses need the express permission of users to take their data, the UK’s version of the GDPR (called the UK GDPR) only applies to people in the UK. So DMCA filings may be a dead end.
But it’s all going to be okay.
Content Scraping is Frustrating but Manageable
Yes, it’s very frustrating to find your hard-earned content stolen and used by less-than-reputable websites, but it’s important to remain calm. We found that therapists are highly unlikely to be hurt by content scrapers like FindHealthClinics.
Still, it’s important to remain vigilant.
The best way to beat scrapers is to focus on your own website. Follow a sound strategy that solidifies your digital presence so that you never have to worry about a FindHealthClinics. Do that, and the famous Oscar Wilde quote will come true for you: “Imitation is the sincerest form of flattery that mediocrity can pay to greatness.”