WhatTheyClaimed.com – a lesson in crowdsourcing

Dafydd Vaughan on 19 June 2009

Yesterday, Richard Pope and I launched WhatTheyClaimed.com, a site aimed at digitising and collating all of the data from MPs' expenses.

The website is based on a system I built a few weeks ago to monitor our own expenses at Consumer Focus Labs. It was designed to match the processes at Consumer Focus, but when Richard and I heard that MPs' expenses were being published, we realised we could reuse the same codebase with a few minor changes. I should note at this point that the website was a personal project and was not supported by Consumer Focus.

We rushed through these changes on Wednesday afternoon, sorted out some hosting and put the site up. Our initial plan was to get a few interested people to help us convert the data and start loading the information into the site bit by bit.

When the data was published yesterday morning, I realised how much of a mammoth task it was going to be. After a quick discussion, we decided to set up a generic username and password for the admin system so that lots of people could help convert the data.

Once MySociety posted the site on Twitter, everything went crazy. It very quickly became clear that the basic back end was a data entry nightmare and that things needed to change. Some changes were quickly scoped out and implemented in between untimely distractions such as this and real work. We removed a number of fields from the forms, including location, since all location details had been completely redacted from the PDFs. I hacked some of the code to make things work and broke the site a few times in the process.

By 5.30, when I left the office to head home, I was absolutely shattered. I decided it was best to take a step back and take stock before making any further changes. I’d like to thank everyone for their messages of support through the day, and also everyone who sent in useful feedback. I haven’t been able to implement all of the suggestions yet, but hopefully a few more will be added over the next few days. I’d also like to thank the Guardian and Financial Times for mentioning the work we’ve done.

So, what lessons can be learned from this exercise?

Firstly, and most importantly, plan! We threw the site together assuming the overall format of the data would be similar to the expenses claims made at Consumer Focus. In fact, it was much more complicated and much more fragmented, with some (almost all!) crucial details missing. We also should have put much more effort into the data entry part of the site. While a basic system might be OK for internal use among a small number of people, it really isn’t appropriate for a crowdsourcing site, particularly when so many things can go wrong.

It is also worth considering that speed isn’t everything. We launched the site at 8.30am on Thursday. The Guardian waited until 3.30pm to launch their site and it was still successful. In hindsight, we probably could have waited a while and got things right the first time.

Where next for WhatTheyClaimed.com?

I’m going to try to make some more changes over the next few days, in the hope of making things work a bit better. But in the long term, I don’t have the time to commit to the site. So if anyone is interested in taking over the project and giving it the attention it deserves (and needs!), then get in touch. I’ve already had messages from a few people interested in working on the site.