A few months ago, we had an interesting issue with another website stealing our content in an unusual way. Essentially they ran a proxy service on a similar domain (a .com.ar instead of .com) and then replicated our site, replacing our ads with their own. They also ran our content through a content replacement algorithm, removing certain pieces of text and also replacing our own domain with theirs. It is fairly easy to do.

We had a few concerns about this. Firstly, we were worried that unsuspecting users would enter their login information on this site. Also, they actually were showing up in Google search engine results, which both hurts our brand, ranking and perception by users (this site loaded much slower than ours obviously).

First Attempt To Stop The Proxy Spammers: IP Blocking

Determined to put a stop to it, the first method we tried was blocking their crawler IPs. However, they owned enough IPs in enough ranges that it was simply a cat and mouse game. We would block them, and then a day later they would be back up. After a few weeks of this it didn’t seem to be a viable long-term solution for blocking these spammers.

Next, We Used Their Mirroring Of Our Content Against Them

Next I thought I would use the fact that they were modifying our page content and disable their site that way.

I created a box in html/css which would trigger via JavaScript only when loading from their domain. But, you wouldn’t think it would be that easy would you? They replace our domain with theirs, modifying any mention of their own domain. So I used a quick hash function to create a unique identifier from the loaded domain and then matched against that.

Also, figuring they would just take out the HTML box I created, I also found it useful to display a hex-encoded version of it. You can encode text to the hex-escaped version for Javascript with the following command line:

echo -n "some text" | od -A n -t x1 |sed 's/ /\\x/g'

The final Javascript I started loading on our site (and therefore their site) is below:

jQuery(document).ready(function() {
  // From http://werxltd.com/wp/2010/05/13/javascript-implementation-of-javas-string-hashcode-method/
  String.prototype.hashCode = function(){
	var hash = 0;
	if (this.length == 0) return hash;
	for (i = 0; i < this.length; i++) {
		char = this.charCodeAt(i);
		hash = ((hash<<5)-hash)+char;
		hash = hash & hash; // Convert to 32bit integer
	}
	return hash;
  }
  // sets hash to current domain
  domainhash = document.domain.hashCode();
  // lists domains they are loading the site from (calculate hash of the attacker domain first)
  if (domainhash == '-1289333690' || domainhash == '208666227') {
    var overlay_orig = jQuery("<div style="position: fixed;top: 0;left: 0;width: 100%;height: 100%;background-color: #000;filter:alpha(opacity=50);-moz-opacity:0.5;-khtml-opacity: 0.5;opacity: 0.5;z-index: 999;text-align:middle;"></div><div style="position: fixed; top: 0px; width: 100%; z-index: 10000;"><div style="z-index: 10000; width: 400px; padding: 30px; margin: 200px auto; background-color: white; border: 1px solid black;color: black;"><h1 style="color:red">Warning: This Is A Scam Site</h1><p>Sorry for the interruption, but the site you are currently visiting is not the real one. This site scrapes our content and injects their own ads to make money./div></div>");
    // Hex encoded version of above, used to defeat content replacement
    var overlay = jQuery("\x3c\x64\x69\x76\x20\x73\x74\x79\x6c\x65\x3d\x22\x70\x6f\x73\x69\x74\x69\x6f\x6e\x3a\x20\x66\x69\x78\x65\x64\x3b\x74\x6f\x70\x3a\x20\x30\x3b\x6c\x65\x66\x74\x3a\x20\x30\x3b\x77\x69\x64\x74\x68\x3a\x20\x31\x30\x30\x25\x3b\x68\x65\x69\x67\x68\x74\x3a\x20\x31\x30\x30\x25\x3b\x62\x61\x63\x6b\x67\x72\x6f\x75\x6e\x64\x2d\x63\x6f\x6c\x6f\x72\x3a\x20\x23\x30\x30\x30\x3b\x66\x69\x6c\x74\x65\x72\x3a\x61\x6c\x70\x68\x61\x28\x6f\x70\x61\x63\x69\x74\x79\x3d\x35\x30\x29\x3b\x2d\x6d\x6f\x7a\x2d\x6f\x70\x61\x63\x69\x74\x79\x3a\x30\x2e\x35\x3b\x2d\x6b\x68\x74\x6d\x6c\x2d\x6f\x70\x61\x63\x69\x74\x79\x3a\x20\x30\x2e\x35\x3b\x6f\x70\x61\x63\x69\x74\x79\x3a\x20\x30\x2e\x35\x3b\x7a\x2d\x69\x6e\x64\x65\x78\x3a\x20\x39\x39\x39\x3b\x74\x65\x78\x74\x2d\x61\x6c\x69\x67\x6e\x3a\x6d\x69\x64\x64\x6c\x65\x3b\x22\x3e\x3c\x2f\x64\x69\x76\x3e\x3c\x64\x69\x76\x20\x73\x74\x79\x6c\x65\x3d\x22\x70\x6f\x73\x69\x74\x69\x6f\x6e\x3a\x20\x66\x69\x78\x65\x64\x3b\x20\x74\x6f\x70\x3a\x20\x30\x70\x78\x3b\x20\x77\x69\x64\x74\x68\x3a\x20\x31\x30\x30\x25\x3b\x20\x7a\x2d\x69\x6e\x64\x65\x78\x3a\x20\x31\x30\x30\x30\x30\x3b\x22\x3e\x3c\x64\x69\x76\x20\x73\x74\x79\x6c\x65\x3d\x22\x7a\x2d\x69\x6e\x64\x65\x78\x3a\x20\x31\x30\x30\x30\x30\x3b\x20\x77\x69\x64\x74\x68\x3a\x20\x34\x30\x30\x70\x78\x3b\x20\x70\x61\x64\x64\x69\x6e\x67\x3a\x20\x33\x30\x70\x78\x3b\x20\x6d\x61\x72\x67\x69\x6e\x3a\x20\x32\x30\x30\x70\x78\x20\x61\x75\x74\x6f\x3b\x20\x62\x61\x63\x6b\x67\x72\x6f\x75\x6e\x64\x2d\x63\x6f\x6c\x6f\x72\x3a\x20\x77\x68\x69\x74\x65\x3b\x20\x62\x6f\x72\x64\x65\x72\x3a\x20\x31\x70\x78\x20\x73\x6f\x6c\x69\x64\x20\x62\x6c\x61\x63\x6b\x3b\x63\x6f\x6c\x6f\x72\x3a\x20\x62\x6c\x61\x63\x6b\x3b\x22\x3e\x3c\x68\x31\x20\x73\x74\x79\x6c\x65\x3d\x22\x63\x6f\x6c\x6f\x72\x3a\x72\x65\x64\x22\x3e\x57\x61\x72\x6e\x69\x6e\x67\x3a\x20\x54\x68\x69\x73\x20\x49\x73\x20\x41\x20\x53\x63\x61\x6d\x20\x53\x69\x74\x65\x3c\x2f\x68\x31\x3e\x3c\x70\x3e\x53\x6f\x72\x72\x79\x20\x66\x6f\x72\x20\x74\x68\x65\x20\x69\x6e\x74\x65\x72\x72\x75\x70\x74\x69\x6f\x6e\x2c\x20\x62\x75\x74\x20\x74\x68\x65\x20\x73\x69\x74\x65\x20\x79\x6f\x75\x20\x61\x72\x65\x20\x63\x75\x72\x72\x65\x6e\x74\x6c\x79\x20\x76\x69\x73\x69\x74\x69\x6e\x67\x20\x69\x73\x20\x6e\x6f\x74\x20\x74\x68\x65\x20\x72\x65\x61\x6c\x20\x6f\x6e\x65\x2e\x20\x54\x68\x69\x73\x20\x73\x69\x74\x65\x20\x73\x63\x72\x61\x70\x65\x73\x20\x6f\x75\x72\x20\x63\x6f\x6e\x74\x65\x6e\x74\x20\x61\x6e\x64\x20\x69\x6e\x6a\x65\x63\x74\x73\x20\x74\x68\x65\x69\x72\x20\x6f\x77\x6e\x20\x61\x64\x73\x20\x74\x6f\x20\x6d\x61\x6b\x65\x20\x6d\x6f\x6e\x65\x79\x2e\x2f\x64\x69\x76\x3e\x3c\x2f\x64\x69\x76\x3e
");
    overlay.appendTo(document.body);
  }
});

After thoroughly testing and then putting up the code, we launched it live. It was very satisfying knowing that end-users were getting the warning message when visiting their site, and the actually reported their experience back to us. After about 12 hours, they figured out our play out and turned off their proxy. Success!

...

Not quite. The next morning I woke up to see their site back up. Looking into their code, they had removed the JS completely. I changed our code, embedded it in a JS file, and used other creative means to get it back up - but in the end they were just able to disable the Javascript entirely and defeat this attack. It was just another cat-and-mouse game.

We also tried DMCA/Abuse Contacts

We also tried to send notices to their hosting company to get them to take down the site. They were hosted in Argentina, so DMCA is not applicable. Their abuse contacts were non-responsive.

Originally we thought we could go after their domain registration since it violated our trademark - but this is a long and involved process involving lots of paperwork and time, and for another $10 they could just register another domain name. We didn't think this was a viable option.

Going to the Source: Google

Google is involved in this scam in a number of ways. First, they were indexing and serving his site in their search results. Secondly the scammers were replacing our ads with their own Google AdSense ads, which I am sure is a ToS violation.

While we were attempting to defeat their site from a technical perspective, we also began looking at Google to see what we could do from there.

Google has a process for submitting DMCA requests. The issue in this case is that they make you submit one batch at a time, and with millions of pages on our site and indexed in Google, it just doesn't make sense to list out urls line by line. It worked to remove those submitted results from search engine results, but it was like cutting grass with scissors. Finally, I attempted to contact Matt Cutts via Twitter:


Thanks. So much for that venue. I know @mattcutts is the public face of Web Spam at Google so I'm sure that he gets lots of @'s with dumb questions, but we were already way beyond this.

Finally, What Worked

Members of our site had a few personal contacts at Google both in an AdSense representative and otherwise. One of our contacts with Google was able to bring this issue up with the right people and they finally took the offending domain out of the SERPs permanently. We also reported their AdSense account - but we don't know what happened with that. Without them showing up in SERPs, it was a moot point because they won't be getting many visitors any more.

The Problem With Google

Google has become so large that it is almost impossible to get a situation like this taken care of without knowing someone who works for them. They have many automated systems in place, but scammers will continue to utilize loopholes for their own profit. Google enables this type of scam, yet they also profit from it.

I wish Google would have some sort of Ombudsman or review system set up so that someone like us, who is having our content ripped off by others using Google's own tools (and with Google taking a percentage of profit from these people), has a way to efficiently deal with them without resorting to personal contacts. We spent much time on this, time that could have been put to better use.

Or maybe personal contacts are the only real way to deal with a situation like this?

Anyway - I am welcoming comments and any other ideas for dealing with these Proxy Hijackers and how to keep them offline. I'm curious how widespread this type of incident is, we know of only one other site that was having the same issue from the same scammer.

After all, they can always get another domain for $10.

2 comments
  1. Thanks Christopher. I hadn’t really heard of anything like this before so I thought I would share in case anyone else has this problem. 

Comments are closed.

You May Also Like

Greasemonkey + Duggmirror Script = Awesome

Don’t you hate it when you try to visit a digg story…

How to Install SNMP on Tomato Router Firmware and Graph Traffic with Cacti

You’ve flashed your old WRT54G or other vanilla router with the Tomato…

Find Your Oldest Messages in Gmail

At some point in the past, Google removed the Oldest » link…

Presentation on Hybrid Stealthy Networks – Wireless Ad Hoc Networks

I presented this paper to my class on March 17th, 2009. Hybrid…