Monday, February 18, 2008

Who would cause Google network partner sites to be disabled?

I would love access to the data on which blogs have been removed from Google's network, how long they had been with Google, and what exactly merited their removal. Certainly Google would not make such information available, but wouldn't it be interesting to look into this data and see whether a number of cases share similar root causes? I would also love to see whether the data supports claims that perhaps someone else is causing accounts to be shut down. For example, see this latest post on the subject and the comments relating to it (http://www.websitebabble.com/adsense-other-ad-serving-programs/1477-banned-click-fraud.html).

Tuesday, February 12, 2008

How can we detect all types of click fraud?

It seems that there are several approaches out there to committing click fraud. The methods that are now out in the open (meaning either academia discovered them first or they have been in use for some time) are the following:
  • Using a click ring
  • Using a single click bot, possibly through proxies
  • Using a botnet, or network of robots
  • Hiding clicks in Flash or JavaScript

The first approach employs a number of humans, each paid peanuts, to click on advertisements. One public example of a website that uses this as part of its business model is Click Monkeys, which states specifically that they are not in the US and are not subject to US law. Now, if you are a business based in the US and abide by the laws of this land, then I would hope you would not contract out your dirty work to a business that is not "restrained" by those laws.
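One heuristic for spotting a click ring, sketched below on purely hypothetical data, is to look for individual visitors who click ads across an unusually large number of distinct publisher sites. The visitor IDs, site names, and cutoff here are illustrative assumptions, not anything Google has disclosed:

```python
from collections import defaultdict

# Hypothetical (visitor_id, publisher_site) click pairs. A paid ring
# member clicks ads on many different member sites, while an ordinary
# visitor rarely clicks ads on more than a handful of sites.
clicks = [
    ("v1", "site-a"), ("v1", "site-b"), ("v1", "site-c"), ("v1", "site-d"),
    ("v2", "site-a"),
    ("v3", "site-b"), ("v3", "site-c"),
]

def ring_suspects(clicks, max_distinct_sites=3):
    """Flag visitors who clicked ads on more distinct sites than the
    (assumed) cutoff allows."""
    sites = defaultdict(set)
    for visitor, site in clicks:
        sites[visitor].add(site)
    return [v for v, s in sites.items() if len(s) > max_distinct_sites]

print(ring_suspects(clicks))  # -> ['v1']
```

Of course, a real ring could rotate members across sites to stay under any fixed cutoff, which is exactly why single-signal heuristics like this are only a starting point.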

Next we have the click bot. This is perhaps a dying breed. I found that Clickzilla no longer has a functioning website. Of course, click fraud was not their sales pitch, but there was no doubt the software could be used for such a purpose. Several proxy lists exist online, not to mention Tor, which allow a single machine to make requests through other machines on the internet. From what I understand, Google has blacklisted a number of these proxies. Naively, I thought this approach could be noticed easily by checking attributes such as the referrer or user agent for associations too strong to appear in normal traffic. An explanation of such a robot can be found at this link. A robot could randomly choose these attributes from a list and randomly choose periods of silence. However, how does one make sure to download the entire landing page (links and all) and execute JavaScript? Common robots do not interpret JavaScript (which Google uses to notice characteristics of click fraud).
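To illustrate the kind of association check I had in mind, here is a minimal sketch on hypothetical log records, with an assumed threshold, that flags traffic where a single user-agent string accounts for an implausibly large share of clicks:

```python
from collections import Counter

# Hypothetical click records: (ip, user_agent, referrer). In practice
# these would come from the ad server's logs; the 0.6 threshold is an
# illustrative assumption, not a published figure.
clicks = [
    ("10.0.0.1", "Mozilla/4.0", "http://example.com/a"),
    ("10.0.0.2", "Mozilla/4.0", "http://example.com/a"),
    ("10.0.0.3", "Mozilla/4.0", "http://example.com/a"),
    ("10.0.0.4", "Mozilla/5.0", "http://example.com/b"),
]

def suspicious_concentration(clicks, field_index, threshold=0.6):
    """Return (value, share) when one value of a field accounts for too
    large a share of the clicks, i.e. an association too strong for
    normal traffic; otherwise return None."""
    counts = Counter(click[field_index] for click in clicks)
    value, count = counts.most_common(1)[0]
    share = count / len(clicks)
    return (value, share) if share > threshold else None

print(suspicious_concentration(clicks, 1))  # -> ('Mozilla/4.0', 0.75)
```

A bot that randomizes its user agent and referrer from a long list defeats exactly this check, which is the point of the paragraph above.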

The next approach involves delving into the dark side of hacking. There exist large networks of zombies: computers that have been compromised in one way or another for use by some central authority. There are even botnets for rent out there. For a large sum of money, one could take possession of, say, 100,000 machines and launch an attack undetectable by IP alone. Clickbot.A was an example of this type of attack, but Google was evidently able to decipher its signals, or they would not have released its source code and a complete description.
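A toy example makes clear why IP volume alone fails against a botnet: the same number of clicks, spread one per zombie, never trips a per-IP threshold. The addresses and cutoff below are made up for illustration:

```python
from collections import Counter

def ips_over_threshold(click_ips, max_clicks_per_ip=10):
    """Flag IPs that produced more clicks than an (assumed) per-IP cap."""
    counts = Counter(click_ips)
    return [ip for ip, n in counts.items() if n > max_clicks_per_ip]

# Single-machine attack: 1,000 clicks from one IP -> easily caught.
single = ["198.51.100.7"] * 1000

# Botnet attack: the same 1,000 clicks spread over 1,000 zombies,
# one click each -> invisible to any per-IP volume check.
botnet = [f"10.{i // 256}.{i % 256}.1" for i in range(1000)]

print(ips_over_threshold(single))  # -> ['198.51.100.7']
print(ips_over_threshold(botnet))  # -> []
```

This is why botnet detection has to lean on other signals, such as the shared software fingerprint that apparently gave Clickbot.A away, rather than per-address counts.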

Lastly, one can potentially use hidden frames or Flash advertisements to create clicks on advertisements from users who either visit a site cooperating with another via the hidden frame or who merely view an advertisement. In this way, the IPs and visitor locations are as dispersed and normal-looking as the traffic to the site or sites on which these cheats appear. The attributes of this traffic would, in most cases, be completely indistinguishable from normal traffic.
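One conceivable counter-signal, sketched here with invented numbers, is post-click engagement: visitors who never intended to click rarely do anything on the advertiser's landing page, so a publisher whose clicks show almost no follow-on activity stands out even when every other attribute looks normal. The site names, counts, and thresholds are all assumptions:

```python
# Hypothetical per-publisher stats: total ad clicks, and how many of
# those clicks showed any post-click activity on the advertiser's
# landing page (a second page view, a conversion, etc.). Forced clicks
# from hidden frames come from real, dispersed visitors, but those
# visitors never meant to click, so engagement collapses.
publishers = {
    "honest-blog.example":  {"clicks": 500, "engaged": 180},
    "hidden-frame.example": {"clicks": 480, "engaged": 4},
}

def low_engagement_publishers(stats, min_clicks=100, min_rate=0.05):
    """Flag publishers with enough clicks to judge, whose post-click
    engagement rate falls below an (assumed) floor."""
    flagged = []
    for site, s in stats.items():
        if s["clicks"] >= min_clicks and s["engaged"] / s["clicks"] < min_rate:
            flagged.append(site)
    return flagged

print(low_engagement_publishers(publishers))  # -> ['hidden-frame.example']
```

The catch is that this requires the ad network to see what happens after the click, which only works when the advertiser shares conversion or landing-page data.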

My question is: with all of these known tricks, plus the yet-to-be-discovered tricks that fraudsters use, how can one go about detecting all of these different types of fraud? Certainly there are many sources, such as Google or any third-party web traffic auditing company, that would suggest they are pretty good at detecting fraud. However, I really wonder how much click fraud is getting through even the best filters out there. In reality, I would like to see online advertising continue to flourish, since it leads to an ever-increasing number of profitable websites paid for by advertisers instead of users. I would prefer to keep so many good services free of charge. What can we do?