ScraperWiki: Hacks and Hackers day, Manchester.

If you’re not familiar with ScraperWiki, it’s “all the tools you need for Screen Scraping, Data Mining & visualisation”.

These guys are working really hard at convincing journos that data is their friend by staging a steady stream of events that bring journos and programmers together to see what happens.

So I landed at NWVM’s offices, amid what seemed like a mountain of laptops, fried food, coke and biscuits, to be one of the judges of their latest hacks and hackers day in Manchester (#hhhmcr). I was expecting some interesting stuff. I wasn’t disappointed.

The winners

We had to pick three prize winners from the six or so projects started that day, and here’s what we (Tom Dobson, Julian Tait and me) ended up with.

The three winners, in reverse order:

Third was Quarternote: a website that would ‘scrape’ MySpace for band information. The idea was that you could put a location and style of music into the system and it would compile a line-up of bands.

A great idea (although more hacker than hack) and if I were a dragon I would consider investing. These guys also won the ScraperWiki ‘cup’ award for actually being brave enough to have a go at scraping data from MySpace. Apparently MySpace content has less structure than custard! The collective gasps from the geeks in the room when the team said that was what they wanted to do underlined that.

Second was Preston’s summer of spend. Local councils are supposed to make details of any invoice over 500 pounds available, and many have. But many don’t make the data very usable. Preston City Council is no exception. PDFs!

With a little help from ScraperWiki the data was scraped, tidied, put in a spreadsheet and then organised. It threw up some fun stuff – 1000 pounds to The Bikini Beach Band! And some really interesting areas for exploration – like a single payment of over 80,000 to one person (why?) – and I’m sure we’ll see more from this as the data gets a good running through. A really good example of how a journo and a hacker can work together.
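For the curious, the "tidy and organise" part of a job like this can be sketched in a few lines of Python. This is only an illustration, not the team's actual code: it assumes the PDF text has already been extracted into CSV, and the column names, supplier names and amounts below are made-up placeholders rather than Preston's real figures.

```python
import csv
import io

# Hypothetical sample rows: supplier names and amounts are placeholders,
# not figures from the council's actual invoices.
SAMPLE_CSV = """supplier,amount,description
Supplier A,"1,250.00",Entertainment
Supplier B,"£82,000.50",Consultancy
Supplier A,750,Equipment hire
Supplier C,499.99,Stationery
"""


def parse_amount(text):
    """Turn a messy amount string such as '£1,250.00' into a float."""
    return float(text.replace("£", "").replace(",", "").strip())


def load_payments(csv_text):
    """Read CSV text into a list of dicts with numeric amounts."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        rows.append({
            "supplier": row["supplier"].strip(),
            "amount": parse_amount(row["amount"]),
            "description": row["description"].strip(),
        })
    return rows


def largest_payments(payments, minimum=500.0):
    """Return payments over the threshold, biggest first."""
    over = [p for p in payments if p["amount"] > minimum]
    return sorted(over, key=lambda p: p["amount"], reverse=True)


def totals_by_supplier(payments):
    """Add up everything paid to each supplier."""
    totals = {}
    for p in payments:
        totals[p["supplier"]] = totals.get(p["supplier"], 0.0) + p["amount"]
    return totals


payments = load_payments(SAMPLE_CSV)
for p in largest_payments(payments):
    print(f"{p['supplier']}: £{p['amount']:,.2f} ({p['description']})")
```

Sorting the payments biggest first is what makes the odd ones jump out, and the per-supplier totals are a quick way to spot who is being paid a lot in small chunks.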

The winner was one of a number of projects that took the tweets from the GMP 24hr tweet experiment; what one group titled ‘Genetically modified police’ tweeting :). Enrico Zini and Yuwei Lin built a searchable GMP24 tweet database (and produced a great write-up of the process) which allowed searching by location, keyword, all kinds of things. It was a great use of the data and the working prototype was impressive given the time they had.

Credit should go to Michael Brunton-Spall of the Guardian for turning the tweets into a usable dataset, which saved a lot of work for those groups using the tweets as the raw data for their projects.

Other projects included mapping deprivation in Manchester and a legal website that, if it comes off, will really be one to watch. All brilliant stuff.

Hacks and hackers we need you

Given the increasing amount of raw data that organisations are pumping out, journalists will find themselves vital in making sure that those organisations stay accountable. But I said in an earlier post that good journalists don’t need to know how to do everything, they just need to know who to ask.

The day proved to me and, I think, to lots of people there that asking a hacker to help sort data out is really worth it.

I’m sure there will be more blogs etc about the day appearing over the next few days.

Thanks to everyone concerned for asking me along.