How We Defeated a Proxy Jacker (Google Web Spam Syndrome)

A few months ago, we had an interesting problem: another website was stealing our content in an unusual way. Essentially, they ran a proxy service on a look-alike domain (.com.ar instead of .com) that replicated our site and replaced our ads with their own. They also ran our content through a content replacement algorithm that removed certain pieces of text and replaced our domain with theirs. This is fairly easy to do.
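To make the mechanism concrete, here is a minimal sketch of the kind of rewriting such a proxy might apply to every page it relays. It assumes a plain global string substitution (their actual algorithm is unknown to us), and `example.com` / `example.com.ar` are hypothetical placeholder domains. Note the side effect: because the look-alike domain contains ours, any literal mention of *their* domain in our own code gets mangled too.

```javascript
// Hypothetical placeholder domains: ours, and the proxy's look-alike.
const OUR_DOMAIN = "example.com";
const PROXY_DOMAIN = "example.com.ar";

// Naive global substitution of every occurrence of `from` with `to`.
// This is an assumed stand-in for the scraper's replacement step.
function rewriteForProxy(body, from, to) {
  return body.split(from).join(to);
}

// A literal domain check in our own script, before and after the proxy.
const ourScript = 'if (document.domain === "example.com.ar") { showWarning(); }';
const relayed = rewriteForProxy(ourScript, OUR_DOMAIN, PROXY_DOMAIN);
console.log(relayed);
// prints if (document.domain === "example.com.ar.ar") { showWarning(); }
```

The check that was meant to match their domain now compares against a domain that does not exist, so it never fires.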

We had a few concerns about this. First, we were worried that unsuspecting users would enter their login information on this site. Also, their copy was actually showing up in Google search results, which hurt our brand, our ranking and users' perception of us (their site, obviously, loaded much slower than ours).

First Attempt To Stop The Proxy Spammers: IP Blocking

Determined to put a stop to it, we first tried blocking their crawler IPs. However, they owned enough IPs across enough ranges that it simply became a game of cat and mouse: we would block them, and a day later they would be back up. After a few weeks of this, it was clear this was not a viable long-term way to block these spammers.
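For reference, here is a minimal sketch of the range check behind this kind of blocking, written for a Node.js server and assuming IPv4 only. The entries in `BLOCKED_RANGES` are hypothetical (reserved documentation blocks from RFC 5737, not the scraper's real ranges), and in practice this is often done in a firewall or web-server config instead. The weakness shows up in the data itself: every new range the scraper moves to has to be added by hand.

```javascript
// Hypothetical blocklist: RFC 5737 documentation ranges standing in for
// the proxy's real crawler ranges.
const BLOCKED_RANGES = ["203.0.113.0/24", "198.51.100.0/25"];

// Convert a dotted-quad IPv4 address to an unsigned 32-bit integer.
function ipv4ToInt(ip) {
  const parts = ip.split(".");
  if (parts.length !== 4 || !parts.every(p => /^\d{1,3}$/.test(p) && Number(p) <= 255)) {
    throw new Error("invalid IPv4 address: " + ip);
  }
  const [a, b, c, d] = parts.map(Number);
  return ((a << 24) | (b << 16) | (c << 8) | d) >>> 0;
}

// True if `ip` lies inside the CIDR block `cidr`, e.g. "203.0.113.0/24".
function inCidr(ip, cidr) {
  const [base, bitsStr] = cidr.split("/");
  const bits = Number(bitsStr);
  if (!Number.isInteger(bits) || bits < 0 || bits > 32) {
    throw new Error("invalid prefix length: " + cidr);
  }
  // A /0 mask is special-cased because JS shift counts are taken mod 32.
  const mask = bits === 0 ? 0 : (~0 << (32 - bits)) >>> 0;
  return ((ipv4ToInt(ip) & mask) >>> 0) === ((ipv4ToInt(base) & mask) >>> 0);
}

// True if the client IP falls in any blocked range.
function isBlocked(ip) {
  return BLOCKED_RANGES.some(range => inCidr(ip, range));
}

console.log(isBlocked("203.0.113.42")); // true
console.log(isBlocked("192.0.2.1"));    // false
```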

Next, We Used Their Mirroring Of Our Content Against Them

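Next, I decided to turn their mirroring against them: since they were serving our pages, scripts included, any JavaScript we added to our site would also run on theirs, and we could use it to disable their copy.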

I created a warning box in HTML/CSS that JavaScript would display only when the page was loaded from their domain. But you wouldn't think it would be that easy, would you? They replaced our domain with theirs and also rewrote any mention of their own domain, so a plain string comparison against their domain name would not survive the trip through their proxy. Instead, I used a quick hash function to compute an identifier from the domain the page was loaded from and compared it against precomputed hashes of their domains.
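Because the page script compares numbers rather than domain names, the hashes of the attacker's domains have to be computed ahead of time. Below is a minimal Node.js sketch using the same Java-style `hashCode` algorithm as the script below (`h = 31*h + c`, truncated to a signed 32-bit integer); the domains in the list are hypothetical placeholders. Keep in mind this only hides the domain from an automated replacement filter, not from a person reading the code.

```javascript
// Java-style String.hashCode: h = 31*h + charCode, kept as a signed 32-bit int.
function hashCode(str) {
  let hash = 0;
  for (let i = 0; i < str.length; i++) {
    hash = ((hash << 5) - hash) + str.charCodeAt(i); // hash * 31 + code unit
    hash |= 0; // truncate to a signed 32-bit integer
  }
  return hash;
}

// Hypothetical proxy hostnames; each hostname variant needs its own hash.
for (const domain of ["example.com.ar", "www.example.com.ar"]) {
  console.log(domain, hashCode(domain));
}
```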

Figuring they would simply strip out the HTML box I created, I embedded a hex-encoded version of it instead. You can convert text to JavaScript hex escapes with the following command line:

printf '%s' "some text" | od -A n -v -t x1 | tr -d ' \n' | sed 's/../\\x&/g'
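The `od` pipeline escapes raw bytes, so it is only reliable for ASCII text: a non-ASCII character becomes several UTF-8 bytes, and each `\xNN` escape would then decode to a separate JavaScript character. Here is a minimal JavaScript sketch of the same conversion that works on UTF-16 code units instead, falling back to `\uNNNN` for anything above `0xFF` (`toHexEscapes` is a hypothetical helper name):

```javascript
// Escape every UTF-16 code unit so the result can sit inside a JS string literal.
function toHexEscapes(text) {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    out += code < 0x100
      ? "\\x" + code.toString(16).padStart(2, "0")  // \xNN for Latin-1 range
      : "\\u" + code.toString(16).padStart(4, "0"); // \uNNNN for everything else
  }
  return out;
}

console.log(toHexEscapes("some text"));
// prints \x73\x6f\x6d\x65\x20\x74\x65\x78\x74
```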

The final JavaScript I loaded on our site (and therefore on theirs) is below:

<script type="text/javascript">
jQuery(document).ready(function() {
  // Java's String.hashCode() ported to JavaScript, from
  // http://werxltd.com/wp/2010/05/13/javascript-implementation-of-javas-string-hashcode-method/
  function hashCode(str) {
    var hash = 0;
    for (var i = 0; i < str.length; i++) {
      var chr = str.charCodeAt(i);
      hash = ((hash << 5) - hash) + chr; // hash * 31 + chr
      hash |= 0; // Convert to a signed 32-bit integer
    }
    return hash;
  }
  // Hash of the domain the page is currently being served from
  var domainhash = hashCode(document.domain);
  // Hashes of the domains they load the site from (calculate the hash of each attacker domain first)
  if (domainhash === -1289333690 || domainhash === 208666227) {
    // Readable copy of the overlay markup, kept for reference only (not used below)
    var overlay_orig = '<div style="position: fixed;top: 0;left: 0;width: 100%;height: 100%;background-color: #000;filter:alpha(opacity=50);-moz-opacity:0.5;-khtml-opacity: 0.5;opacity: 0.5;z-index: 999;text-align:center;"></div><div style="position: fixed; top: 0px; width: 100%; z-index: 10000;"><div style="z-index: 10000; width: 400px; padding: 30px; margin: 200px auto; background-color: white; border: 1px solid black;color: black;"><h1 style="color:red">Warning: This Is A Scam Site</h1><p>Sorry for the interruption, but the site you are currently visiting is not the real one. This site scrapes our content and injects their own ads to make money.</p></div></div>';
    // Hex-encoded version of the same markup, used to defeat their content replacement
    var overlay = jQuery("\x3c\x64\x69\x76\x20\x73\x74\x79\x6c\x65\x3d\x22\x70\x6f\x73\x69\x74\x69\x6f\x6e\x3a\x20\x66\x69\x78\x65\x64\x3b\x74\x6f\x70\x3a\x20\x30\x3b\x6c\x65\x66\x74\x3a\x20\x30\x3b\x77\x69\x64\x74\x68\x3a\x20\x31\x30\x30\x25\x3b\x68\x65\x69\x67\x68\x74\x3a\x20\x31\x30\x30\x25\x3b\x62\x61\x63\x6b\x67\x72\x6f\x75\x6e\x64\x2d\x63\x6f\x6c\x6f\x72\x3a\x20\x23\x30\x30\x30\x3b\x66\x69\x6c\x74\x65\x72\x3a\x61\x6c\x70\x68\x61\x28\x6f\x70\x61\x63\x69\x74\x79\x3d\x35\x30\x29\x3b\x2d\x6d\x6f\x7a\x2d\x6f\x70\x61\x63\x69\x74\x79\x3a\x30\x2e\x35\x3b\x2d\x6b\x68\x74\x6d\x6c\x2d\x6f\x70\x61\x63\x69\x74\x79\x3a\x20\x30\x2e\x35\x3b\x6f\x70\x61\x63\x69\x74\x79\x3a\x20\x30\x2e\x35\x3b\x7a\x2d\x69\x6e\x64\x65\x78\x3a\x20\x39\x39\x39\x3b\x74\x65\x78\x74\x2d\x61\x6c\x69\x67\x6e\x3a\x63\x65\x6e\x74\x65\x72\x3b\x22\x3e\x3c\x2f\x64\x69\x76\x3e\x3c\x64\x69\x76\x20\x73\x74\x79\x6c\x65\x3d\x22\x70\x6f\x73\x69\x74\x69\x6f\x6e\x3a\x20\x66\x69\x78\x65\x64\x3b\x20\x74\x6f\x70\x3a\x20\x30\x70\x78\x3b\x20\x77\x69\x64\x74\x68\x3a\x20\x31\x30\x30\x25\x3b\x20\x7a\x2d\x69\x6e\x64\x65\x78\x3a\x20\x31\x30\x30\x30\x30\x3b\x22\x3e\x3c\x64\x69\x76\x20\x73\x74\x79\x6c\x65\x3d\x22\x7a\x2d\x69\x6e\x64\x65\x78\x3a\x20\x31\x30\x30\x30\x30\x3b\x20\x77\x69\x64\x74\x68\x3a\x20\x34\x30\x30\x70\x78\x3b\x20\x70\x61\x64\x64\x69\x6e\x67\x3a\x20\x33\x30\x70\x78\x3b\x20\x6d\x61\x72\x67\x69\x6e\x3a\x20\x32\x30\x30\x70\x78\x20\x61\x75\x74\x6f\x3b\x20\x62\x61\x63\x6b\x67\x72\x6f\x75\x6e\x64\x2d\x63\x6f\x6c\x6f\x72\x3a\x20\x77\x68\x69\x74\x65\x3b\x20\x62\x6f\x72\x64\x65\x72\x3a\x20\x31\x70\x78\x20\x73\x6f\x6c\x69\x64\x20\x62\x6c\x61\x63\x6b\x3b\x63\x6f\x6c\x6f\x72\x3a\x20\x62\x6c\x61\x63\x6b\x3b\x22\x3e\x3c\x68\x31\x20\x73\x74\x79\x6c\x65\x3d\x22\x63\x6f\x6c\x6f\x72\x3a\x72\x65\x64\x22\x3e\x57\x61\x72\x6e\x69\x6e\x67\x3a\x20\x54\x68\x69\x73\x20\x49\x73\x20\x41\x20\x53\x63\x61\x6d\x20\x53\x69\x74\x65\x3c\x2f\x68\x31\x3e\x3c\x70\x3e\x53\x6f\x72\x72\x79\x20\x66\x6f\x72\x20\x74\x68\x65\x20\x69\x6e\x74\x65\x72\x72\x75\x70\x74\x69\x6f\x6e\x2c\x20\x62\x75\x74\x20\x74\x68\x65\x20\x73\x69\x74\x65\x20\x79\x6f\x75\x20\x61\x72\x65\x20\x63\x75\x72\x72\x65\x6e\x74\x6c\x79\x20\x76\x69\x73\x69\x74\x69\x6e\x67\x20\x69\x73\x20\x6e\x6f\x74\x20\x74\x68\x65\x20\x72\x65\x61\x6c\x20\x6f\x6e\x65\x2e\x20\x54\x68\x69\x73\x20\x73\x69\x74\x65\x20\x73\x63\x72\x61\x70\x65\x73\x20\x6f\x75\x72\x20\x63\x6f\x6e\x74\x65\x6e\x74\x20\x61\x6e\x64\x20\x69\x6e\x6a\x65\x63\x74\x73\x20\x74\x68\x65\x69\x72\x20\x6f\x77\x6e\x20\x61\x64\x73\x20\x74\x6f\x20\x6d\x61\x6b\x65\x20\x6d\x6f\x6e\x65\x79\x2e\x3c\x2f\x70\x3e\x3c\x2f\x64\x69\x76\x3e\x3c\x2f\x64\x69\x76\x3e");
    overlay.appendTo(document.body);
  }
});
</script>

After thoroughly testing the code, we launched it live. It was very satisfying to know that end users were getting the warning message when visiting their site, and they actually reported their experience back to us. After about 12 hours, the scammers figured out what we were doing and turned off their proxy. Success!

Not quite. The next morning I woke up to find their site back up. When I looked at their copy of our pages, they had removed the JS completely. I changed our code, moved it into a separate JS file, and tried other creative ways to get it running again, but in the end they could simply strip our JavaScript entirely and defeat this approach. It was just another cat-and-mouse game.

We Also Tried DMCA/Abuse Contacts

We also sent notices to their hosting company asking them to take the site down. They were hosted in Argentina, so the DMCA, a US law, did not apply, and their abuse contacts were unresponsive.

Originally we thought we could go after their domain registration, since it violated our trademark, but that is a long process requiring lots of paperwork and time, and for another $10 they could just register a new domain name. We didn't think this was a viable option.

Going to the Source: Google

Google is involved in this scam in two ways. First, it was indexing the scammers' site and serving it in its search results. Second, the scammers were replacing our ads with their own Google AdSense ads, which I am sure violates the AdSense Terms of Service.

While we were attempting to defeat their site from a technical perspective, we also began looking at Google to see what we could do from there.

Google has a process for submitting DMCA requests. The problem in this case is that you have to submit URLs one batch at a time, and with millions of our pages indexed in Google, listing them out line by line just doesn't make sense. It did remove the submitted URLs from the search results, but it was like cutting grass with scissors. Out of options, I tried contacting Matt Cutts on Twitter.


Thanks. So much for that avenue. I know @mattcutts is the public face of web spam at Google, so I'm sure he gets lots of @-replies with dumb questions, but we were already way beyond that.

Finally, What Worked

Members of our site had a few personal contacts at Google, including an AdSense representative. One of them was able to raise the issue with the right people, and Google finally removed the offending domain from its search results (SERPs) permanently. We also reported their AdSense account, but we don't know what happened with that. With the site gone from the SERPs, the point was moot: they would no longer get many visitors.

The Problem With Google

Google has become so large that it is almost impossible to get a situation like this resolved without knowing someone who works there. They have many automated systems in place, but scammers will keep exploiting loopholes for their own profit. Google's own tools enable this type of scam, and Google profits from it as well.

I wish Google had some sort of ombudsman or review process so that people like us, who are having our content ripped off by others using Google's own tools (with Google taking a cut of their profits), could deal with it efficiently without resorting to personal contacts. We spent a lot of time on this, time that could have been put to better use.

Or maybe personal contacts are the only real way to deal with a situation like this?

Anyway, I welcome comments and any other ideas for dealing with these proxy hijackers and keeping them offline. I'm curious how widespread this type of incident is; we know of only one other site that was having the same issue with the same scammer.

After all, they can always get another domain for $10.

Google Adds Two-Factor Authentication To Google Apps (For Real, This Time)

I’m not trying to say I had anything to do with Google adding two-factor authentication to Google Apps. I’m really not. But on September 9th, MakeUseOf published my article titled “How To Secure Your Google Apps Account with Two Factor Authentication.” In that article, I wrote:

All of this brings up the question: why doesn’t Google enable a direct way to use two factor authentication with their Gmail, Calendar and other services? Many folks such as myself use Google services for all too many things in their lives, and that login is potentially the most important one of their online life. I would suggest that Google gets onto the security boat and enables this as an option for everyday folks.

Today, 11 days later, Google released its own two-factor authentication scheme for Google Apps accounts (Premier, Education and Government editions). An example of accurate prognostication? Or just dumb luck? Either way, great job, Google!

If you are a Google Apps user, your administrator will need to enable the feature for your account. Standard edition users will get the feature shortly. I highly recommend it for password and data security if you store your data in the Google cloud.

Find Your Oldest Messages in Gmail

At some point, Google removed the “Oldest »” link from everywhere except your inbox and labels. This makes it hard to find the “first” or an early email of any kind when a search returns a lot of results. I wanted to see the first of a type of email I had received tens of thousands of. Fortunately, there is a workaround!

Updated 4/11/2010!

There is a simple URL you can visit to jump to the last page of all your messages. It shows the first message you ever received in Gmail, which also tells you when you signed up. Simply visit:

https://mail.google.com/mail/#search//p99999

This will return an error, then send you to the last page of all of your messages, inbox and archived.

Thanks to commenter Josh for this tip!

1. Perform your search.

To search all archived messages and not just the inbox, do a blank search.

2. In the address bar, add /p9999 (or any other number large enough to go past the last page of results) to the end of the URL.

Gmail will show you an error and not display any messages.

3. Click “Refresh”; you will end up on the last page of the search results.
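If you do this often, the URL can be built programmatically. This is a minimal sketch, assuming the fragment layout `#search/<query>/p<page>` suggested by the blank-search URL above, and assuming Gmail accepts a percent-encoded query in that position; `gmailSearchPageUrl` is a hypothetical helper name.

```javascript
// Build a Gmail search URL that jumps straight to page `page` of the results.
// An empty query corresponds to the blank search (all mail).
function gmailSearchPageUrl(query, page = 99999) {
  return "https://mail.google.com/mail/#search/" +
    encodeURIComponent(query) + "/p" + page;
}

console.log(gmailSearchPageUrl(""));
// prints https://mail.google.com/mail/#search//p99999
```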

If anyone knows an easier way to do this, please let me know in the comments! I hope Google will add a sorting option to Gmail that brings the oldest message to the top, but I guess there is not much demand for that.

Tip from the Google Help forum.

Official Google Chrome Themes Coming?

I just restarted Google Chrome after clearing out my cache and noticed a link to a themes tab in my tabs box.


I thought: wait a second, Chrome has themes?! I hadn't heard of that one. Excited, I clicked the link. However, it took me to a 404 Not Found page:

https://tools.google.com/chrome/intl/themes/

It looks like official Chrome themes are coming soon. I am running dev channel version 3.0.195.4.

Edit: Looks like CNET has an article about Chrome theming!