Oct 1

I’ve started using Zend Framework for a project I’m undertaking here at TradeDoubler. I’m building a new part of Searchware that is essentially standalone, so I figured that this is a great opportunity to push for framework support. Better form validation, less scope for creating errors in trivial donkey work coding because it’s already done for you, and ultimately a better experience for the user.

The problem I soon discovered with ZF is that the documentation is not as good as I would have hoped. Their introductory videos are absolutely amazing, but when it comes to getting a real project started, they leave you feeling a bit left out in the cold.

Multi Page Forms

A good example of this, and something I’ve just been working on, is multi page forms. Zend Form is a great start and will go places, but right now, I think it’s not quite there. I discovered that subforms are the recommended way to implement multi page forms, but the example in the documentation again doesn’t quite explain how to do it; it just points you in a direction and expects you to figure the rest out for yourself. All very good, but some of us are fairly busy and would rather just read a comprehensive example.

A comprehensive example :)

This is how I decided to make a multi page form based on Zend Form subforms. I don’t know if this is the best way of doing it, and I am a complete newbie to ZF, but since I couldn’t find any other examples, and this does work, I’ll just have to presume it is until I’m corrected by one of you kind readers :p. This example will show you how to set up the required classes, build a simple form, validate, and then store the information and make it available to subsequent forms for decision making.

Note: This is not a beginner’s guide to Zend Framework or MVC. If you’re not quite sure how ZF works, or what MVC is, please check out the introductory vids on the Zend site. They are very good.

So, to kick things off we’re going to need to load up all the classes required for our forms. To do this, add the following lines to your bootstrap file.


define('APPLICATION_PATH','/data/web/yourApplication');

Zend_Loader::loadClass("Zend_Form");
Zend_Loader::loadClass("Zend_Session");
Zend_Loader::loadClass("Zend_Session_Namespace");

// And any validation classes you will be using, for example
Zend_Loader::loadClass("Zend_Validate_NotEmpty");

This will set up our bootstrap file with everything we need to build a form, so the next job is editing your controller class. Add the following methods into your controller. They are used to store and read validated form values, but more on that later.


	private function storeFormValues(Zend_Form $form)
	{
		$formSession = new Zend_Session_Namespace('yourAppForm');

		foreach ($form->getValues() as $key => $value)
		{
			$formSession->$key = $value;
		}
	}

	private function getFormValues()
	{
		$formSession = new Zend_Session_Namespace('yourAppForm');

		$data = array();
		foreach ($formSession->getIterator() as $key => $value)
		{
			$data[$key] = $value;
		}
		return $data;
	}

You will also need to add the following method in your controller class.


	 protected function getForm($formName)
	 {
	 	// you will need to edit this later, but leave it for now.
	 	require_once APPLICATION_PATH . '/forms/parentForm.php';

	 	$mainForm = new Form_ParentForm($this->getFormValues());

	 	if ($formName == 'main')
	 	{
	 		$form = $mainForm;
	 	}
	 	else
	 	{
	 		$form = $mainForm->getSubForm($formName);
	 		$form->addElement('hidden','currentFormStage',array('value' => $formName));
	 	}

	 	return $form;
	 }

So, I’ll take a little time to explain that one since it’s not instantly obvious.

First off

require_once APPLICATION_PATH . '/forms/parentForm.php';

is the path to your form classes. I’ll explain how to create those later but for now, decide where you will want to store your forms, and point this there. Remember the constant APPLICATION_PATH was set in the bootstrap file.

The next line is

$mainForm = new Form_ParentForm($this->getFormValues());

This instantiates our parent form and passes to it any form data we have in our session.

The next part is

if ($formName == 'main')
{
	$form = $mainForm;
}

This is used later on in the controller to check whether the entire form (i.e. all of its sub pages) is valid. The controller normally asks for a sub form by name, but if the name is ‘main’, then the parent form is sent back instead.
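One thing to watch: getForm() trusts whatever stage name it is given, and getSubForm() returns null for a name it doesn’t know, so a tampered currentFormStage value would end in a fatal error on the addElement() call. Below is a minimal sketch of a guard. The helper name resolveStage() and the fall-back-to-first-page behaviour are my own assumptions, not part of ZF, and the stage names are the ones used in this example.

```php
<?php
// Hypothetical helper, not part of Zend Framework: returns the requested
// stage name if it is one we know about, otherwise falls back to the first.
// Assumes $knownStages is a non-empty list.
function resolveStage($requested, array $knownStages)
{
    if (is_string($requested) && in_array($requested, $knownStages, true)) {
        return $requested;
    }
    // Unknown or missing stage: start again from the first page.
    return $knownStages[0];
}

// The names the sub forms are registered under in this example.
$stages = array('pageOne', 'pageTwo');

echo resolveStage('pageTwo', $stages), "\n"; // pageTwo
echo resolveStage('bogus', $stages), "\n";   // pageOne
```

You would run the posted value through something like this before handing it to getForm().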

Creating our forms
So far we’ve built the required scaffolding for our multi page form that will be used by the controller. The next step is to create the forms themselves. As I’ve already mentioned the overall multi page form consists of a parent container form, and a collection of sub forms. For the sake of making it easy to read, I’m going to use VERY crude examples of forms, but please consult the Zend Form docs for more details about creating various form elements. That part of things is fairly well documented.

The entire parent class looks like this…


class Form_ParentForm extends Zend_Form
{
	private $formValues;

	public function __construct($formValues)
	{
		$this->formValues = $formValues;
		parent::__construct();
	}

	public function getFormValue($name)
	{

		if (isset($this->formValues[$name]))
		{
			return $this->formValues[$name];
		}
		else
		{
			return null;
		}

	}

	public function init()
	{

		$this->setAction('index');
		$this->setMethod('post');

		require_once APPLICATION_PATH . '/forms/SubFormPageOne.php';
		require_once APPLICATION_PATH . '/forms/SubFormPageTwo.php';

		$pageOne = new Form_SubFormPageOne($this);
		$pageTwo = new Form_SubFormPageTwo($this);

		$this->addSubForm($pageOne,'pageOne');
		$this->addSubForm($pageTwo,'pageTwo');

	}

}

The only bit you need to be concerned about editing here is the init() method. Change setAction() and setMethod() as you see fit, but they will probably be ok as they are in most cases.
The next bit, the requires, is important. Remember in the controller class we edited the getForm() method. There was an include path in there that pointed to the parent form. You need to make sure that, obviously, this parent form is saved to the same place. You could put the subforms in other directories, but I don’t see any benefit of doing so, so I’d recommend you keep them all bundled together in the same directory.

Once you have included them, you instantiate the subforms (actually, they are instances of Zend_Form and not SubForm, but that is ok), and then pass them to addSubForm(). Hopefully this is quite easy to follow so I’m not going to explain it any further. If you get stuck, please feel free to ask me a question.

So then, our final step in building the forms is to create the sub forms. The subform class looks like this.


class Form_SubFormPageOne extends Zend_Form
{
	private $parentForm;

	public function __construct(Zend_Form $parentForm)
	{
		$this->parentForm = $parentForm;
		parent::__construct();
	}

	public function init()
	{

		// engine dropdown
		$engineSelect = $this->createElement('select','engine');

		$engineSelect->addMultiOption('','Please Choose...');
		$engineSelect->addMultiOption('google','Google');
		$engineSelect->addMultiOption('yahoo','Yahoo');
		$engineSelect->addMultiOption('msn','MSN');

		$engineSelect->setRequired(true);

		$this->addElement($engineSelect);

		// create submit button
		$this->addElement('submit', 'btnNext', array( 'label' => 'Next')); 

	}

}

Apart from changing the class name to suit your needs, the only other thing you should need to edit is the init() method. In here you create the form elements, apply validation, decorators and so on. This works in exactly the same way as a single page form, so please consult one of the many Zend Form examples for details on adding elements. As you can see our example form simply gives a dropdown list of search engines and a submit button.
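The parent form is handed to each subform so that a later page can look at answers from an earlier one via getFormValue(). As a rough sketch of that decision making, assume a hypothetical page two that asks for an account name and wants to mention the engine chosen on page one. The helper name and the label text are made up for illustration.

```php
<?php
// Hypothetical helper: builds the label for a page two field from the
// engine stored by page one. $engine is whatever getFormValue('engine')
// returned, so it may be null if page one has not been completed.
function accountLabelFor($engine)
{
    // Same keys as the options in the page one dropdown.
    $names = array('google' => 'Google', 'yahoo' => 'Yahoo', 'msn' => 'MSN');

    if (is_string($engine) && isset($names[$engine])) {
        return $names[$engine] . ' account name';
    }
    // Nothing stored yet, or an unexpected value: use a generic label.
    return 'Account name';
}

echo accountLabelFor('google'), "\n"; // Google account name
echo accountLabelFor(null), "\n";     // Account name
```

Inside the init() method of a page two subform you would call it as accountLabelFor($this->parentForm->getFormValue('engine')).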

Plugging it all together
So we’ve got our scaffolding, and we’ve got our forms. The only thing left to do now is to stick it all together, and this happens in the ‘action’ method of the controller. In most cases, and certainly this one, it will be the index action.

This method is a bit longer than the others so rather than me blabbering on here, I’ll let the comments do the talking. :)


public function indexAction()
{
	$request = $this->getRequest();

	// is this a post back, i.e. was the form submitted, or is it a first visit?
	if ($request->isPost())
	{
		/*
		Get an instance of the current form.
		Remember currentFormStage was appended
		to the form as a hidden field in the getForm method.
		*/
		$form = $this->getForm($_POST['currentFormStage']);

		// does it pass validation?
		if ($form->isValid($_POST))
		{
			// yes, so save the values to our session.
			$this->storeFormValues($form);

			/*
			So, we've just checked a subform and it was valid.
			Does this now make our entire form collection valid?
			Let's check by getting an instance of the parent form.
			*/
			if ($this->getForm('main')->isValid($this->getFormValues()))
			{
				/*
				The form is complete, so redirect to the
				finish action (you will need to create this)
				*/
				$this->_redirect("index/finish");
			}

			/*
			A crude but workable method of choosing which form to go to next.
			*/
			switch ($_POST['currentFormStage'])
			{
				case 'pageOne':
					$newForm = 'pageTwo';
					break;
				default:
					$newForm = 'pageOne';
					break;
			}
			/* get an instance of our new form.
			having passed page one, this would be now page two.
			*/
			$form = $this->getForm($newForm);

		}

	}
	else
	{
		/*
		If this is the first time the page is loaded
		i.e. no forms submitted, let's make sure the session is
		empty.
		*/
		$formSession = new Zend_Session_Namespace('yourAppForm');
		$formSession->unsetAll();

		// and then load the first form page.
		$form = $this->getForm('pageOne');
	}

	$this->view->printForm = $form;

}

And there it is, you’re done. You have a working multi page form in the Zend framework. One final note: the last line, $this->view->printForm = $form;, is simply there to pass the form to the view. The view file for this controller/action would contain <?php echo $this->printForm; ?>
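The switch in indexAction() is fine for two pages, but it grows a case for every page you add. One possible alternative, sketched here with a hypothetical nextStage() helper of my own, is to keep the stage names in an ordered array and look up whatever follows the current one. Like the switch’s default branch, it goes back to the first page for anything it doesn’t recognise.

```php
<?php
// Hypothetical helper: given the ordered list of stage names and the stage
// that was just validated, return the stage to show next. Unknown stages,
// and the last stage, send the user back to the first page, mirroring the
// default branch of the switch. Assumes $order is a non-empty list.
function nextStage(array $order, $current)
{
    $index = array_search($current, $order, true);

    if ($index === false || !isset($order[$index + 1])) {
        return $order[0];
    }
    return $order[$index + 1];
}

$order = array('pageOne', 'pageTwo');

echo nextStage($order, 'pageOne'), "\n"; // pageTwo
echo nextStage($order, 'pageTwo'), "\n"; // pageOne
```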

I hope that helped clear things up, and if anybody has any questions, please feel free to post them.


Sep 9

I was out on the monthly schnitzel night last night, a periodical gathering of former IMW employees, where we drink beer, eat pork and talk about, amongst other things, the search industry.

JP Jones, former CTO at buy.at Leads, handed me his well used iPhone (he got his hands on it seemingly before Steve Jobs managed to!). On the screen was an email, a press release from Affiliate Future detailing a miraculous but somewhat secret technique of tracking users without cookies.

I find it hard to avoid a challenge, especially one implying other people in search thought of something before me, so I made the promise that I would have this figured out by the end of the following day. I have.

As we all know cookies are evil, so tracking users without them is a good thing, right? Well, not really. Not at all, in fact. For a start, cookies are not in the slightest bit evil. Yes, they track users, but when you actually think about it, that’s pretty essential. Anti spyware applications block cookies in the name of your ‘privacy’, but this is just utter nonsense they peddle in order to generate a faux “need” for their products. Ok, don’t take that as me saying spyware is not real. It is, and it’s bad, but cookies are not spyware, they are not a violation of your privacy and they do make the internet a much better place to work and play.

So, how is Affiliate Future’s unique and indeed patent pending (which by the way will NEVER stick) tracking system better? In short, it isn’t. It’s worse.

What AF have done is very clever, but it’s just as “intrusive” as a cookie. It still tracks the user across the internet in exactly the same fashion as the common garden cookie, but they do it by employing a devious, although admittedly clever, hack. Unfortunately, it’s the same sort of hackery and bending of standards that real spyware writers employ.

Entity Tags, the new cookie?

Busy websites MUST employ some sort of caching system. They need a way to identify if a user has already downloaded a certain file, and then tell them to use that already downloaded version rather than use up bandwidth fetching the exact same content again. A header image or javascript file would be a perfect example of data you would want to be downloaded as little as possible. For very large websites this can save a fortune in bandwidth bills and server/admin requirements.

The “old” way of doing this was by issuing an expiry date for the content (file), and if that date had passed, then the browser would request a new version. There are some problems with this method and so the powers that be came up with Entity Tags, or ETags.

Avoiding the essentially unimportant technical implementation, ETags are small chunks of text that uniquely identify a particular file, not by its creation date but by its content. Something like an MD5 hash would be employed to create a unique reference to the file content.

Upon the first visit, the user’s browser has no ETag for the file it’s requesting, and so the web server sends it the file, along with the ETag. The user’s browser then saves this ETag on the local computer, just like a cookie. The next time the user visits the webpage, the browser recognises that it has an ETag for that page, and so when requesting the page it says ‘here is my ETag, is that valid?’ The web server compares the unique identifier supplied by the user’s browser to the current version of the file existing on the server. If the ETag matches, the server simply says ‘you already have that content, use your cached version’. If the ETag does not match then the server lets the browser know and sends the new content, along with the new ETag.
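For the programmers reading, that exchange can be sketched in a few lines of PHP. This is a simplified illustration rather than how any particular web server does it: the function name is invented, the MD5 choice simply follows the suggestion above, and in a real script the incoming value would come from the If-None-Match request header and the results would go out via header().

```php
<?php
// Decide how to answer a request for $content, given the If-None-Match
// value the browser sent (null on a first visit).
// Returns the status code, the ETag to send, and the body (empty on a 304).
function respondWithEtag($content, $ifNoneMatch)
{
    // A quoted MD5 of the content serves as the entity tag.
    $etag = '"' . md5($content) . '"';

    if ($ifNoneMatch !== null && trim($ifNoneMatch) === $etag) {
        // Browser already has this exact content: 304 Not Modified.
        return array('status' => 304, 'etag' => $etag, 'body' => '');
    }
    // First visit, or the content has changed: send everything.
    return array('status' => 200, 'etag' => $etag, 'body' => $content);
}

$first  = respondWithEtag('body { color: red; }', null);
$second = respondWithEtag('body { color: red; }', $first['etag']);
$third  = respondWithEtag('body { color: blue; }', $first['etag']);

echo $first['status'], ' ', $second['status'], ' ', $third['status'], "\n"; // 200 304 200
```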

Now, it is possible, as with all HTTP headers (which is what a cookie is), to manipulate (read and write) the data sent. So, instead of sending a unique identifier for a file, Affiliate Future are sending a unique identifier for that particular user. JUST LIKE A COOKIE.

When the user revisits that site (or any other that includes that ‘trigger’ file) AF intercept the ETag, which instead of being used properly to optimise caching operations, now tells them who the user is.
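To make the trick concrete, here is a rough sketch of how an ETag could be turned into a visitor ID. This is only an illustration of the idea as described in the press release, not Affiliate Future’s actual implementation, and the function name and ID format are invented.

```php
<?php
// Illustration only: treat the ETag as a visitor ID rather than a content
// fingerprint. $ifNoneMatch is the header value sent by the browser, or
// null if it has never fetched the 'trigger' file before.
function identifyVisitor($ifNoneMatch)
{
    if ($ifNoneMatch !== null && $ifNoneMatch !== '') {
        // Returning visitor: the browser hands back the ID issued earlier.
        // Answering 304 keeps the cached copy (and the ID) in place.
        return array('status' => 304, 'etag' => $ifNoneMatch, 'new' => false);
    }
    // First visit: issue a random ID dressed up as an ETag.
    $id = '"' . md5(uniqid((string) mt_rand(), true)) . '"';
    return array('status' => 200, 'etag' => $id, 'new' => true);
}

$firstVisit  = identifyVisitor(null);
$secondVisit = identifyVisitor($firstVisit['etag']);

var_dump($secondVisit['etag'] === $firstVisit['etag']); // bool(true)
```

Notice that nothing here looks at the file content at all, which is exactly why it works as a cookie substitute and not as a cache validator.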

While this is clever and I really do have to respect their outside of the box thinking here (bravo chaps!), it’s absolutely no better for the privacy-conscious user. It still tracks them in just the same way, but anti spyware application users and users with cookies turned off will be tracked even though they blatantly don’t want to be. This is a bit of a middle finger to consumers who are, albeit naively, concerned about internet tracking.

Great news for affiliates then, right? They get more tracked sales, brilliant!

Perhaps - in the short term. ETag is not supported in anything but the newest of browsers and now this “technology” has been made public by AF, privacy advocates and anti spyware vendors everywhere will be very quick to jump into action and create ETag filtering plugins for browsers. This might be a route to slightly more tracked sales, but it is without doubt a temporary one.

Essentially what this hack has shown is that ETags can be abused, and if this means people start turning them off (if the option becomes available) then the bandwidth bill for large websites is going to rise, and they’re going to have to pass that cost along to us, the consumers.


Aug 20

I know a lot about click fraud. I won’t claim to be a “pioneer” of today’s scene (if that’s a suitable term), but once upon a time I certainly was breaking ground - amongst other things. Does that mean I agree with it? Well, no is the simple answer, but in some cases the answer is perhaps a bit more gray - but that’s a topic for another time.

What is click fraud?

Well, I somehow doubt you found this post without knowing, so I’ll keep this paragraph very brief and here purely for the benefit of those very few who don’t know. Click fraud is the act of clicking on Pay Per Click links for personal gain, without any intention of buying, or interest in, an advertiser’s product or service. It’s as simple as that.

Why fraudulently click links?

There are three reasons people would want to do this. There may be other petty reasons but these are the important ones.

1) To make themselves money. With schemes like Google’s AdSense around, clicking on your own links is a profitable venture.

2) To cost competitors money. Smaller businesses are going to be most affected by this since big search spenders would hardly notice your average click fraud campaign.

3) Tactics. Again, this only really works in the arena of smaller business, but if for example I wanted to make the most of my budget, I could reduce my CPC by targeting my competitors on Friday evening, depleting their budget and thus not having any PPC competition over the weekend. This leaves me to bid essentially the minimum amount and get the top result.

How?

Clicking by hand doesn’t work. If you as a wannabe click fraudster sit clicking endlessly on an advert, you’ll achieve nothing. It’s quite trivial for Google and all the other engines to tell that the source of all these clicks is a single person and they will mark the clicks as fraudulent. If you’re doing this for reason number 1 (as stated above), then expect to lose your AdSense account.

Ok, so YOU clicking by hand doesn’t work, but a farm of cheap labour in another country, all clicking from different locations, does. To a point, and a very poor point at that.

Bot nets are a good choice for the potential fraudster. In this day and age where people are still silly enough to open random email attachments, and Microsoft can’t plug the holes in IE quick enough, there are more than a few viruses (or viri, or worms) floating around. Once upon a time a virus was a simple creature, whose sole purpose in life was to damage people’s computers or data for the heavenly goal of entertaining its creator. Not that the creator ever saw any of the damage unless his little beasty got in the news. These days however, they lead far more sinister lives. A modern day virus doesn’t eat your files, or destroy your data, or do anything to give away its presence, it just sits on your computer quietly. Waiting for orders. People who control these bot nets have great power in click fraud terms. They have a bunch of real computers, on a diverse collection of IP addresses. Thousands of them, and they can make mincemeat of your budget.

Proxy servers. Not every aspiring click fraudster has access to a bot net. The very act of obtaining control over the computers in a net is illegal and at best if caught you would face a seriously large fine. That is if you have a talented lawyer. You’re probably going to jail otherwise. So, probably the most prolific way to “fake” a load of different click sources, by your average click fraudster at least, is to use proxy servers. These are servers littered around the internet that simply allow website requests to pass through them. If I were a person or program using a proxy server, the process would be like this…

I ask a proxy for google.com, the proxy gets the page, Google logs the proxy server’s IP address and not mine, then the proxy gives the content back to me. I remain anonymous (in most cases), so using a list of proxies, all with different IPs, allows a person or program to keep clicking and clicking, and clicking.

If you have enough proxies this is a feasible method for a fraudster to use, but the trouble, or rather the blessing for us advertisers, is that Google aren’t stupid and are aware of most of the publicly available proxies. Collecting a large enough list of private proxy servers is a difficult and time consuming process.

Method x. There is another way for a seasoned click fraudster with a little capital behind them to simulate a multitude of clicks. This way is so devastatingly indistinguishable from ‘real’ clicks that I am reluctant to disclose it, but rest assured, there is a fifth method, and as far as I know, not one that is often (if ever anymore) employed. Be thankful.

Stopping click fraud

The biggest lie about click fraud is that the search engines (Google, Yahoo, MSN) don’t try to stop it because they make money from it. Every undetected fraudulent click is money in their pocket and out of the advertiser’s. I can understand why people would think this, but as evil as Google can be, this is utter rubbish. Google’s entire business model is based on Pay Per Click. That’s BILLIONS of dollars for providing this advertising platform. If they for one moment neglect their commitment to quality of service to the advertiser, they will crumble. It’s absolutely in Google’s interest to stop click fraud, so don’t believe they don’t try.

Click fraud comes in two flavours; that which you can prove, and that which you can’t. You can always detect all but the most subtle & gentle (ergo harmless) click attacks by the fact that your ROI drops, or plummets in some cases. ROI peaks and troughs, but if you are consistently spending more and earning less, then you are probably a victim of fraud (or a bad agency :) )
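If you want to put a number on “consistently”, something like the following would do as a starting point. It is only a sketch: it assumes the common definition of ROI as (revenue - spend) / spend, a spend that is never zero, and function names and figures that I have made up purely for illustration.

```php
<?php
// ROI for one period, assuming the common definition (revenue - spend) / spend.
// Assumes $spend is greater than zero.
function roi($revenue, $spend)
{
    return ($revenue - $spend) / $spend;
}

// True when ROI has dropped in every one of the most recent $runLength
// period-to-period comparisons. $periods is a list of array(revenue, spend)
// pairs, oldest first.
function roiConsistentlyFalling(array $periods, $runLength)
{
    if ($runLength < 1 || count($periods) < $runLength + 1) {
        return false; // not enough history to say anything
    }
    $recent = array_slice($periods, -($runLength + 1));

    for ($i = 1; $i < count($recent); $i++) {
        $previous = roi($recent[$i - 1][0], $recent[$i - 1][1]);
        $current  = roi($recent[$i][0], $recent[$i][1]);
        if ($current >= $previous) {
            return false; // ROI held or improved, so the run is broken
        }
    }
    return true;
}

// Made-up figures: spend creeps up while revenue slides.
$weeks = array(array(500, 100), array(480, 110), array(450, 120), array(400, 130));

var_dump(roiConsistentlyFalling($weeks, 3)); // bool(true)
```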

Your ROI dropping is not going to be good enough evidence for a refund however. Fair enough really, why should the engines believe you, and even with more compelling evidence, you’ll still be lucky. Nope, you’re on your own here, you will need to attack this problem yourself.

Firstly, you need to detect it. There are tools available (which I have no experience of) that claim to offer this service. I am skeptical.

Automated bot attacks can be detected because they leave patterns in your logs. You can see by digging through your site analytics that things are out of place. Traffic peaked when you don’t normally see it do so, or you suddenly got 10% more people visiting with the same kind of browser. Things like that give away the presence of a click fraudster, and things like this mean the kind of products I’ve just mentioned CAN work, but what if we have a clued up fraudster on our hands?
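As an example of the sort of crude pattern check I mean, here is a minimal sketch that flags hours where clicks are well above the average for the period. The function name, the threshold and the numbers are all invented for illustration. A real tool would compare against your own historical baseline rather than a simple average.

```php
<?php
// Flag hours whose click count is well above the average for the period.
// $hourlyClicks maps an hour label to a click count; $factor is how many
// times the average a count must exceed to be flagged (an arbitrary
// threshold chosen for illustration).
function flagSpikes(array $hourlyClicks, $factor = 2.0)
{
    if (count($hourlyClicks) === 0) {
        return array();
    }
    $average = array_sum($hourlyClicks) / count($hourlyClicks);

    $flagged = array();
    foreach ($hourlyClicks as $hour => $clicks) {
        if ($clicks > $average * $factor) {
            $flagged[] = $hour;
        }
    }
    return $flagged;
}

// Made-up numbers purely to show the call: a burst at 3am stands out.
$clicks = array('09:00' => 40, '10:00' => 42, '11:00' => 38, '03:00' => 200);

print_r(flagSpikes($clicks)); // flags '03:00' only
```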

What if this person has done the research on their target? What if they have devised a program that copies browser usage patterns and fakes them in an accurate balance across all the clicks, IE being x percent of the traffic and Firefox being y, in an accurate figure based on widely available stats?

Indeed, what if this person is aware of when your vertical sees traffic peaks (holiday searches in January, around lunchtime, for example)? What if they copy this pattern, and what if they slowly amplify it over a period?

What if this person has a reliable way of generating all this from REAL sources? Not bot nets, not proxies, not click farms.

There’s nothing that can detect this kind of fraudster. The only way you could perhaps tell it’s happening is by a drop in ROI, but you still won’t know where it’s coming from, or who is doing it. Thankfully, most click fraudsters aren’t capable of this, so we can combat it.

So, the conclusion we (or at least I) have come to is that click fraud is unstoppable! Does that mean we should surrender to it? Absolutely not. Do all you can to fight these useless clicks, they are wasting YOUR money, but ultimately you have to accept that it can and does happen.

Treat fraudulent activity as part of account management. As long as your ROI is on target, does it really matter, beyond being frustrating, that x percent of your traffic is fraudulent? Probably not. If however you’re below target and know that click fraud is a substantial part of the reason, then as an account manager you should absolutely invest your time in detecting and stopping it. After all, your targets are at stake if you don’t.


Aug 20

There, I said it.

Drupal has a huge following and a very impressive community of contributors. It seems to be growing in popularity every week, which is made apparent by the number of contracts available requiring Drupal experience. Surely it must be good in that case, right?

Because of this I decided I should make some effort to run after the bandwagon, waving and shouting in the hope they’ll let me hop on for the ride. I have been completely focused on back end PHP development for a couple of years now and I was beginning to feel left behind by the industry. Don’t get me wrong here, it’s not PHP that I’m left behind with - far from it in fact - I have probably been pushing PHP well ahead of the industry if anything. I’ve been porting techniques from Java, and really pushing object oriented development in very enterprise level applications. I’ve been focused on real gritty backend systems development and the web 2.0 movement has got all fired up and started running around like a dog at dinner time while I’ve been sat in my dark room obsessing about making scalable & efficient budget strategy management systems for my employer, TradeDoubler.

Granted, it’s been exciting stuff to work with, but I’ve decided it’s time to hedge my bets on the direction of PHP. If I remain a solely backend developer then I’m putting absolute faith in the direction PHP as a language is going to take, and this strikes me as foolish, so once again I’m opening the dark room door and stepping in to the blinding light of web development.

So, what’s all this got to do with Drupal being shit? Well, Drupal was my first port of call on the journey to web 2.0 acceptance. I installed it on this very website, to run this very blog, but gave up in frustration. Sure, I got the blog working, but actually managing the content of a Drupal website via the admin area was tedious at best. It’s unintuitive, and if I’m going to be using a Content Management System, then I jolly well expect managing content to be the top priority when it comes to making things quick and easy.

Then there’s the code behind it all :| Maybe I’ve spent too long in my dark room surrounded by elegant design patterns and silky smooth OO, but I took one look at the Drupal source and was instantly reduced to a quivering wreck.

So, I installed Wordpress, which straight out of the box is much easier to actually get a working site up and running and much, much more usable and intuitive. Also, contrary to popular opinion, Wordpress is not just a piece of blogging software, it’s a fully fledged CMS.

But Drupal is much more powerful than Wordpress! I can hear the echoes of Drupal fans already and I’m not even going to try and disagree with them, I shall take it for granted: if you’re building “complex” websites then Drupal is more suited to the job than Wordpress. Fine, I’ll accept that without a fight, but I will say this; if you’re building “complex” websites, then a solid MVC framework is far, far more powerful, flexible and scalable than Drupal ever will be.

If you’re building a complicated website, then go with Symfony, because as soon as you want to do anything more than manage content, it is a much better choice for absolute control over your application, and if you just want to manage content on a fairly simple website, well, I’d choose Wordpress all day long… if only the community support was up there with the behemoth of the Drupal developer collective!