How open is Data Journalism?

Where does data journalism get its data?

  1. In a brown envelope or mysterious memory stick
  2. From freedom of information requests
  3. From free (open) government data sources
  4. From their own collection, often via the crowd

It’s the brown envelope stories that invariably get the headlines (and the data-J love) and, although I think it gets the best bang for its money, there aren’t that many doing crowd-sourced collection that well. A look at the majority of stories that ‘use’ data shows that it’s FOI that does the heavy lifting. But when you look beyond the text at infographics, interactives and visual data journalism, it’s open government data that provides the backbeat to most data-driven content.

In the UK, the ONS are the mainstay of much of the casual data journalism you see out there. Data from the census, for example, underpins a good deal of the comparative work of data journalism, and the Food Standards Agency’s hygiene ratings do a roaring trade in local newspapers. What’s surprising is how little of that data, and the subsequent data developed from it, is shared, let alone open.

The Open Government Licence (OGL), which covers much of the open government data out there, doesn’t require anything other than that you:

acknowledge the source of the Information in your product or application by including or linking to any attribution statement specified by the Information Provider(s) and, where possible, provide a link to this licence;

Data from FOI requests does not carry any copyright restrictions (unless the original data carries copyright) so there shouldn’t be too many barriers to sharing that directly – being more open and transparent.

Journalism is pretty good at attribution – we know where its data comes from. But we very rarely get to handle the data. We see it, in charts and interactive pieces, but the days when news orgs happily gave us a link to a spreadsheet seem to be long gone (even data champions the Guardian stopped updating their list of datasets in 2013). It would be great to see newsrooms, as users of open data, be a little more open with the data they collect. But in the first instance it would be great to see newsrooms simply acknowledge sources for data more fully and more accessibly – working links, licence details etc.

Some examples:

Politicians claim £500,000 expenses for low-profile meetings abroad – This Press Association piece, carried by the Guardian, is one of those neat stories where FOI gets in the gaps and opens things up. But why not link to a spreadsheet of all the expenses?

UK has 2.3m children living in poverty, government says – This BBC story uses ONS data ‘helpfully’ linked in a PDF. They’ve clearly released the data in the PDF, so why not push that data out in a more open way?

Outrage as Bristol City Council credit card bill revealed: UGG boots, cinema trips and iTunes songs – Great council story from The Bristol Post. To their credit they even have a ‘text’ version of the data at the end of the article. But the original FOI data could have been linked in.

These are all great stories – this isn’t a comment on the quality of the journalism.

I’m increasingly hearing the refrain that data isn’t enough; we need the context, the stories that go with it. This, and the tonne of other DJ content out there, shows how far we’ve come with this stuff. But the flip side is that if we have the stories then we can’t lose the data – the facts behind the story. This is really important when that data is often second-hand. A number of stories I looked at had data from third parties, e.g. charities. It’s not that the data is hokey, but making it available would make it transparent.

I’m not sure that we are getting the best out of this data if it remains locked inside organisations – regardless of it being a government org or a private org. Taking a bit more of an open mindset would start to make the journalism even more usable.

Image by Kate Ter Haar via Flickr

Open data and rent-seeking economies

Yesterday, I was introduced to a concept I hadn’t heard of before – rent-seeking behaviour. It’s an economics term that essentially means that instead of investing time developing things that you can sell, you spend all your money making it really hard for the other guy to sell their stuff. The best analogy I’ve seen for it so far uses pirates – don’t all the best analogies have pirates?

As someone who works in journalism it was nice to finally have a way to describe the approach of commercial news organisations to the BBC. But I digress. I came across it in the context of open data. Chris Taggart, co-founder and CEO of OpenCorporates, used it, saying it was the kind of behaviour that making data open discouraged.

It makes sense. In an information economy we tend not to make stuff other than information, and increasingly we build economies around protecting that information. It’s collected and combined (and I’m not discounting the value of that process) but it’s not ‘ours’. As consumers we are becoming more aware, through our own understanding or by others’ efforts to lift the stones, of the value of data. So anywhere there’s a low level of transparency there’s a risk of rent-seeking that directly impacts us. You can see how open data would ‘bust’ that, except that it relies on two things:

  1. it assumes that those that benefit from the rent-seeking in the first place would change their ways. Yes, the logic of diminishing profits is compelling but in a world where we trade at micro-second speed, who’s in it for the long game?
  2. it assumes that the open data ecosystem isn’t in danger of rent-seeking itself.

At the moment, much of the open data economy may not be rent-seeking but it does seem to do a lot of sub-letting*. It borrows data from others to make its own products. It often adds a lot of understanding (and value) but often not very much new data.

Advocates of open data would perhaps point at the government as the biggest rent-seeker in the data market. But open data is now as much a business as it is a movement for transparency and accountability. Instead of the lobbying and legislation of traditional rent-seeking, it’s licensing that seems to be the means of control. So maybe I shouldn’t be surprised at the amount of ‘open-washing’ I see in the open data community, but it would be a shame if the lobbying took over from the core business of making more data more open.

* I know, the kind of rent in rent-seeking is not the same as housing rent but

Image from Peter on Flickr

#HLDJ Conference 6th and 7th November 2015


This isn’t meant to be a conference about Hyperlocal data journalism (but it could be!)

It’s a conference about the way that Hyperlocal AND/OR data AND/OR journalism might work together. I think that there are three distinct but related areas where there are new opportunities. I’ve got some basic definitions of the three areas below for you to agree or disagree with (have at ’em in the comments)

Each of the areas has its own distinct ideological and practical approaches.

  • they operate as an identity
  • they operate as a service
  • they all have a role to play in social, economic and political innovation

This would be a free conference in Preston and I’d like your help in what it should cover. My idea is to have one day of insight and examples (not an academic conference). But because I know that some people think there’s too much talking and not enough doing,  I’d like to have another day that’s more workshop based – practical how-to sessions. Time to think and time to do. Best of both worlds.

If you think you’d like to come then you’d really be helping me out by filling in the survey below.



The NESTA Here and Now report broadly defines hyperlocal as:

“Online news or content services pertaining to a town, village, single postcode or other small, geographically defined community.”

Open Data

The Open Knowledge Foundation has produced a widely accepted definition of open data:

“Open means anyone can freely access, use, modify, and share for any purpose (subject, at most, to requirements that preserve provenance and openness).”

Put most succinctly:

“Open data and content can be freely used, modified, and shared by anyone for any purpose.”


A definition of journalism that all can agree on is less straightforward. The Wikipedia definition, calling on a well-respected journalism textbook, says that:

“Journalism is gathering, processing, and dissemination of news and information related to the news to an audience. The word applies to both the method of inquiring for news and the literary style which is used to disseminate it.”



Pulling policing data into google sheets

Last week I had a regional newspaper group in for a few days exploring lots of different things. One area that popped up was data and we did a quick breakout looking at some data from an FOI on 101 calls.  It was a whirlwind overview – you can see the much more considered and in-depth take on it here.

The sessions are always a great motivator for me to explore and one of the things that came up was mapping areas.

It turns out that data.police.uk publishes KML files for all the police force boundaries and the neighbourhood policing boundaries – great stuff. That means you can import a KML file straight into Google Maps and get something like this:

Except that when I downloaded the boundary files for Lancashire police, instead of files named after the areas, I got cryptic numbers like D2.kml. Matching the numbers to the area names ended up being a slog through finding the area and then fishing the code out of the URL. Not great.

The code is in there somewhere

What you can do is query the API (the thing that drives part of the data.police.uk site).

Try firing this into your browser:

Depending on your browser you’ll get something like:
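If the call in question is a force’s neighbourhoods list, the response is a JSON array of id/name pairs. The structure below matches the published police API; the names are invented for illustration:

```json
[
  {"id": "A1", "name": "A town-centre neighbourhood"},
  {"id": "D2", "name": "Another neighbourhood"}
]
```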

The raw output of an API call


Now you could try Ctrl+F and do a quick search… hmm.

Importing data into a spreadsheet

You can see the problem. Unless you’re doing it all programmatically, matching the codes to the areas so you can get some simple mapping or analysis done is a bit of a pain. So I decided to explore whether I could use the API to pull it all together into a spreadsheet as a kind of look-up table:

The spreadsheet contains links to API calls to pull in boundary data, as well as other stuff. It uses a script called ImportJSON, which means you can query the police API directly and have the results appear neatly in a spreadsheet. More on that in this article, but I’d recommend a play with it: once you have the script in, and now that we know things like the neighbourhood ID, we can query the API directly and pull all sorts of data into a spreadsheet.
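To give a flavour of the look-up idea outside the spreadsheet, here’s a rough sketch in Python. The `/api/forces` and `/api/{force}/neighbourhoods` endpoints are part of the published police API; the function names and the Lancashire example are my own framing, not anything from the original spreadsheet.

```python
import json
import urllib.request

API = "https://data.police.uk/api"


def fetch(path):
    """GET an endpoint on the police API and parse the JSON response."""
    with urllib.request.urlopen(f"{API}/{path}") as resp:
        return json.load(resp)


def build_lookup(neighbourhoods):
    """Map cryptic neighbourhood codes (e.g. 'D2') to readable names."""
    return {n["id"]: n["name"] for n in neighbourhoods}


# Example (requires network access):
#   lookup = build_lookup(fetch("lancashire/neighbourhoods"))
#   print(lookup.get("D2"))
```

With a table like that in hand, matching D2.kml and friends back to real place names becomes a simple lookup rather than a slog through URLs.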

This example pulls in street-level crime based on location.
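If you’d rather script that call than spreadsheet it, a minimal sketch might look like this. The `crimes-street/all-crime` endpoint and its `lat`/`lng` parameters are from the published police API; the tally function and the example coordinates (roughly Preston) are my own illustration.

```python
import json
import urllib.request
from collections import Counter


def street_crimes(lat, lng):
    """Fetch street-level crimes near a point from the police API."""
    url = f"https://data.police.uk/api/crimes-street/all-crime?lat={lat}&lng={lng}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def tally_categories(crimes):
    """Count crimes by category, ready to drop into a sheet or chart."""
    return Counter(crime["category"] for crime in crimes)


# Example (requires network access):
#   for category, count in tally_categories(street_crimes(53.76, -2.70)).most_common():
#       print(category, count)
```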

So we can start using the API in a semi-automated way to quickly pull in data to process.

Let me know what you think.

Icons of the dead: visualising the deaths of migrant workers in Qatar.

Update: The data from the WaPo story was challenged by the Qatar government, resulting in an amendment stating that the data covered deaths of all migrants, not just those working on World Cup venues. Something, in an uncharacteristic bit of comment baiting, Roy Greenslade felt the need to point out.

The Washington Post is getting a lot of (social media) interest with a visualisation of The human toll of FIFA’s corruption. It links the recent FIFA arrests and alleged bribery in Qatar’s bid to get the World Cup to the miserable plight of foreign workers building the stadiums for the 2022 event. It’s a compelling graphic and one that makes effective use of scrolling.

It’s not a new idea – it calls to mind a visualisation from last year by the recently closed data journalism site Ampp3d:


Breaking down the numbers of dead against the teams and then giving you the remainder is a powerful visual storytelling device, as many commented at the time, especially on mobile where the smaller screen meant more scrolling, emphasising the scale of the deaths. It was something Buzzfeed proved later that year with a visualisation of how many soldiers died in World War I.

The staggering number of deaths of migrant workers means that any visualisation is going to have an impact. The difference in scale in this chart from London Loves Business in 2014 (charting similar data to the WaPo piece) gives you the same scrolling impact:


The sheer height of any comparative data has an interesting interplay with architecture, something that spontaneous architects Axel and Sylvain Stampa Macaux thought about. At first glance their visualisation exploits that construction metaphor.


But it’s actually a plan for an imagined piece of real architecture:

Compelling stuff.

All of this gives me pause for thought.

On the one hand it makes me think of innovation in journalism. Many have noted how innovative Ampp3d was in data journalism. A mainstream news org putting that form of storytelling out there, doing journalism with graphics and, more importantly, emotion, has given others permission to try.  I’m not saying the Wapo copied ampp3d.  The journalism industry is one that thrives on monitoring and imitation for innovation – look at the use of listicles and quizzes.  It’s interesting to see how this helps make new forms of storytelling acceptable and how it filters through.

UPDATE: As this video from Channel Four suggests

On the other, and more fundamentally, it reminds me about the challenges of making data from the human experience.

The WaPo piece opens with the line “In the end, it only took a $150 million scandal to make Americans care about soccer.” That’s care about football, not care about the deaths of thousands of migrant workers; something that’s been reported on for a few years now.

How do we keep the idea that this is a human being not a data point?

I’m not having a pop at the Washington Post here. It’s really a fundamental challenge in journalism – how do you get people to care about what’s happening to people far, far away who, to all intents and purposes, are nothing like them? In the economics of news agendas, proximity is a factor. You need more deaths the further away from market you are if you want to work your way up a running order. Paul Bradshaw wrote about that a while back, pondering the scale at which we can lose the human in the statistic. The Ampp3d piece plays on this nicely – 23 players for 23 lives; 23 people you can relate to for 23 you can’t. It adds context for the audience.

The challenge is to convert the ‘impact of the scroll’ into concern for the people that whizz by. As I say, it’s not a new challenge, but it’s perhaps more of one when you’re competing for eyes and for the kind of content consumption that creates opportunities in the scroll in the first place.

The socially shareable nature of graphics like the WaPo one shouldn’t be underestimated. Already people are using it to lobby sponsors on Twitter. So it’ll be interesting to see what impact the WaPo version of the chart (and the story) has this time around with the added FIFA element for context.


Hyperlocal. open data. journalism conference: If you’ll come, I’ll build it.

Update: I’ve decided on dates. 6th and 7th of November 2015 in Preston. If you’re interested you can tell me what you’d like to see through a quick survey

As part of my current research activity into hyperlocal and open data, I’m finding myself at events talking to a lot of people in open data circles and a lot of people in Hyperlocal circles. But more often than not they aren’t the same events.

I know there are lots of moves to get hyperlocals interested in data journalism (especially with the election fresh in people’s minds). Likewise I know that a lot of open data people are also committed to (or ideologically disposed to) the transparency and accountability aspects of journalism.

So, finding myself with some resources (that does mean money), I thought it would be fun to get some people in the same room.

So if you’re a hyperlocal, open (government) data person, journalist or local government person involved in data, would you come round to my place for a mini-conference on Making hyperlocal data journalism?

I have some ideas for what we could do…

  • Some open training sessions in data for hyperlocals
  • Some awareness raising from government people about what’s happening at a local level in terms of data
  • Business models for hyperlocal data
  • Best practice for working together to build data communities at a local level.
  • can (and should) government tell stories with data

…but I know loads of people are doing some or all of these things already so if nothing else,  it may just serve as a chance to get together and share this stuff with a hyperlocal focus.

At this stage I’d love to know if you’d be interested. What would you like to see? What format should it take? Who would you like to see speak or be in the room?

Initially I was thinking about a day or two at the end of August (maybe beginning of September but don’t want to clash with this event in Cardiff). But it could be later if people thought that was better. It would be in Preston.

Let me know in the comments below what, who and when to get the ball rolling.

Credit: Sofa picture Creative commons by net_efekt via Flickr

The connected middle class: Ofcom and civic internet use

OFCOM have released their Adults’ media use and attitudes report for 2015. It’s a report that is always worth a read through. This is the ten year anniversary edition with a good deal of the content around the release reflecting changes since 2005.

As you may have guessed from recent posts, I’ve got my head in open data stuff at the moment.  My focus is on hyperlocal use and the use of, for want of a better term, open government data. So that’s focused my first glance read through.

A few general things struck me. One was how media and internet mean the same thing in this report; 10 mentions of newspaper compared to 119 of social media. The lack of any mention of LocalTV also struck me as odd. I know it’s not strictly what the report was about but given the role of OFCOM in this and the apparent purpose of Local TV I’d have thought it would have been worth putting it in context.

Anyway, predictably, it’s the data on platform use, mobile in particular, that is getting lots of attention. But, given my current focus, the bit that really piqued my interest was section 5.9, Accessing public or civic services. Here’s the intro:

Internet users are more likely than in 2013 to have ever gone online for all public/ civic activities, and a higher proportion have completed government processes in the last three months

Of the thirty-two individual online activities that internet users were asked about, six can be grouped under the heading of public or civic services. These are:

1. Find information about public services provided by local or national government

2. Complete government processes online – such as register for tax credits, renew driving licence, car tax or passport, complete tax return

3. Look at websites/ apps for news about or events in the local area/ the local community

4. Look at political/ campaign/ issues websites

5. Sign an online petition

6. Contact a local councillor or your MP online

Number 3 bodes well for hyperlocal, apparently 69% of those asked used websites/ apps for news about or events in the local area/ the local community; the biggest percentage point rise in any of the activities listed. But in general, everything is on the rise.


Where things get a little less inspiring is when that usage is broken down by age and demographic group.


The significant differences for the 16-24 and 65+ groups make for disturbing reading when it comes to engaging online. As do the lower socio-economic group figures.



Given the new government’s view on moving public services online and their approach to supporting those without connectivity, the trends worry me. I’m really sensing a ‘digital divide’ here, especially given that Ofcom note that of the 14% of non-users of the internet (a figure unchanged since 2013), six in ten are aged 65+ and half are from DE households.

It’s not that people aren’t using the services but I don’t think I’m guilty of any conflation when I say the level of engagement of the middle-class connected makes it likely that they are the ones who will be most engaged with.

Time will tell.

Main image Mobile futures ©NYC Media Lab via Flickr CC BY-SA

Open data: What can we expect from the Conservatives?

We have a Conservative government in the UK for the next five years. I’ve been looking at open data a lot for some research that I’m doing, so I wanted to cast a proper eye over their manifesto to see what was on the cards.

I’m not the first to do this. The ODI, among others, did a good job of collating the open-data-related promises in the party manifestos. Charles Arthur at the Guardian covered similar ground in his ‘technology’ reading of each party’s manifesto.

But now we know who we are dealing with over the next few years, I wanted to take a look and get my thoughts down. First, though, it’s worth noting what open data means. You can take two definitions and get the gist.

The Open Knowledge Foundation has a pretty precise definition of what open data is:

“Open data and content can be freely used, modified, and shared by anyone for any purpose.”

In 2013 the G8 made explicit a commitment to open data in a charter which stated that open data sits at the heart of a revolution in communication technology that has the:

enormous potential to create more accountable, efficient, responsive, and effective governments and businesses, and to spur economic growth.

Reading between those definitions and their extended narrative you get a sense of two distinct themes  – accountability and innovation.  The two don’t always play well together but often go hand-in-hand e.g. better access to government spending means more transparency. Using that data for improving public services means innovation.

It really comes down to an issue of who is using it and for what.

So, with that in mind (and now you know what’s informing my thinking) I worked my way, as others have done,  through the manifesto looking for keywords ‘data’, ‘open data’ and ‘publish’.

Let’s start with what they will publish:

We will require companies with more than 250 employees to publish the difference between the average pay of their male and female employees (p19)

Interesting data but already a requirement that was part of the Small Business Bill put through in March this year. Ironically it was an amendment by the Lib Dems.

We will publish more earnings and destination data for Further Education courses, and require more accreditation of courses by employers.(p35)

It’s tricky to unpick this one. There have been consultations on more data for adult learners in Further Education, and the Skills Funding Agency (SFA) had already made reporting of more data around final destinations mandatory, although not for all learners. It will be interesting to see how this develops.

It’s interesting to see this on the same page as a pledge to require universities to make

more data to be openly available to potential students so that they can make decisions informed by the career paths of past graduates

There’s already a good deal of data kicking around on students’ perceptions of universities (see the NSS). It feels like a lighter touch than the Further Education demands, but that’s not a surprise given the more commercial footing the Higher Education sector is taking (and its big international market).

We will publish standards, performance data and a ranking system for the security of smartphones and tablets, as well as online financial and retail services. (p59)

This is an interesting one, and seems best read in the context of information security. I guess if we are going to be more data-driven, we need to be more savvy about what is happening to our data. It’ll be interesting to see how this one pans out. Could we see a security rating on phones like the Euro NCAP for cars?

Of course information security concerns are often cyber crime concerns.  It’s already been touted that the Conservatives see the majority as a chance to push through the ‘Snoopers Charter’.  It’s the other side of the data coin in their manifesto:

We will keep up to date the ability of the police and security services to access communications data – the ‘who, where, when and how’ of a communication, but not its content. Our new communications data legislation will strengthen our ability to disrupt terrorist plots, criminal networks and organised child grooming gangs, even as technology develops (p63)

Don’t worry though. If you live in the SW then there’s a promise of investment to support the cyber security industry there (p11). And if you’re a criminal, they would let the police keep more of your assets, which is how they may fund ‘Cyber Specials’ to police it all (p59).

That’s it in terms of direct statements to ‘publish’. That’s not to say there aren’t plans for more data and related areas.

We will boost transparency even further, ensuring you can access full information about the safety record of your hospital and other NHS or independent providers, and give patients greater choice over where and how they receive care.

Not a commitment to publish but a commitment to make full information accessible. I’m sure negotiation on what this will mean  and what form it takes will be massively political.  This clearly falls into the service reform agenda of open data (and government) but it could be rich pickings for those looking for a boom in the kind of health consumer apps that Nigel Shadbolt talked about in his closing keynote to the ODI summit last year.

What level of control you have over that – the private vs public vs open debate – isn’t dealt with in any depth. It’s going to be opt-in, and a question of who it’s shared with…

We will give you full access to your own electronic health records, while retaining your right to opt-out of your records being shared electronically (p38)

Of course, the private provider they pick to manage the infrastructure will have had the cyber security standards checked and published.

Moving online is seen as a big part of the process of reducing government spending. But what about those people not online? (Later: which, given what Ofcom say in their media use report, should still be a concern.) Don’t worry:

We will ensure digital assistance is always available for those who are not online, while rolling out cross-government technology platforms to cut costs and improve productivity – such as GOV.UK (p49)

That doesn’t mean more Barclays Digital Eagles:

We will help public libraries to support local communities by providing free wifi. And we will assist them in embracing the digital age by working with them (p41)

Good job too, as all the books will be going digital:

We will assist them in embracing the digital age by working with them to ensure remote access to e-books, without charge and with appropriate compensation for authors that enhances the Public Lending Right scheme. (p42)

So more digital access – and an interesting (re)negotiation with publishers which, like the gender pay data, seems to have happened already.

But there’s an expectation that libraries will need to rely on volunteers to run, as Ed Vaizey told library professionals in a statement:

Many libraries have also been able to attract large numbers of volunteers who are helping to run and provide services to users. It is precisely this sort of collaboration and innovation that libraries need to be considering as they look to attract more visitors and remain relevant.


Of course the solution is to make sure everyone has better access to the web (of course it is) so the government is pledging that 95% of the country will have superfast broadband access by 2017.

And who will pay for that? You will, through the licence fee:

we will continue to ‘topslice’ the licence fee for digital infrastructure to support superfast broadband across the country.

But they want more broadband power:

We will also release more spectrum from public sector use to allow greater private sector access. And we have set an ambition that ultrafast broadband should be available to nearly all UK premises as soon as practicable (p15)


Ultrafast! All good news, and the release of the spectrum would seem to underpin the aim to make the UK “a world leader in the development of 5G”. But making the spectrum more commercial without any nod to the pricing of data (key to access for many), other than the implicit assumption that the market will drive prices down, seems naive.


There are lots of other things in there that are related but this is what stood out for me.

  • Commitments to publish data were minimal or had already been met.
  • Commitments to data sit in the service reform rather than the transparency agenda.
  • Data is a market-driven proposition.

So there’s an underpinning of the infrastructure and economic environment that will mean open data and data economies will have plenty to go at. But as a citizen, looking at a neo-liberal market approach to data for the next five years, I’m feeling in an odd place.

Years ago, when the Free Our Data campaign asked government to give up what we had already paid for, it made sense. Now I see an economy slowly building around open data, and more specifically open government data, and I’m wondering whether I should be looking at those companies, through Catapults and the like, and asking a similar question.


Open data overload: yorkopendata and #localdata15

For the last week it feels like my life has been all about open data.

My immersion into the world of open data, which started last Friday with my trip up to Leeds for the DataDive, continued with two events driven by open data.

Launching York Open Data

Monday, and I was in sunny York (a very nice place to set up office for the day) for the launch of York City Council’s YorkOpenData portal. York, and the local TV station Hello York, are part of the Media Mill project I’m a research partner for.

Ian Cunningham, Group Manager of the Council’s Shared Intelligence Bureau, introduced the session, with an invite for everyone to pitch in and help them understand what data would be useful to ‘open’. It came with an honest (and open) assessment of the realities of dealing with open data – the phrases ‘we can’t do it all’ and ‘we can’t be all things’ were common. But regardless of the cautiousness, events like this and honest intentions are important in starting to build the networks you need to make open data work.



The next day, I heard that message echoed around the conference room at the Museum of Science and Industry in Manchester. I was there for the Local open data: Reaping the benefits event.

The event, organised by opendata platform people Swirrl (a round up of tips from the event is on their blog), saw a range of local government themed presentations sharing best practice and ideas around building value in open data.

If there was a key theme that emerged from the event, it was transparency. More specifically the problems that the focus on the transparency agenda causes open data.

Part of it is practical – servicing the central government demand for key service performance indicators is resource-heavy, and open data is seen as a time-consuming extra when you’ve got a council to run. It’s an interpretation of transparency that also seems to hoist a lot of ideological baggage onto open data – transparency and open data are often synonymous in much of the rhetoric. So the emphasis here was on changing the message to one of value to service reform and improvement.

It was a tension that Lucy Knight from Devon County Council unpicked in her presentation ‘gently lampooning’ attitudes to open data in councils. She stressed the value of understanding the user need for open data (you can help her generate use cases) rather than getting caught up in the “cargo cult” of data dashboards.

Thinking about how transparency can drive practice, it was interesting to hear from a number of projects from Scotland, including hack events and the Scottish Cities Alliance, about their approaches to open data.

Ian Watt from Aberdeen’s codethecity project noted that Scotland’s councils don’t face the same pressures and demands for data transparency. It means that open data in Scotland doesn’t necessarily have the same ideological issues. I think that will make for some tantalising opportunities for researchers looking at comparative models for open data.

Still early days 

A lot of people in the room were at the start of their journey in open data. One delegate I spoke to was on the second day of his job as an open data lead and he already felt he was playing catch-up.

It speaks volumes of a phenomenon I see a lot at events like this; people always assume the area is more established than it is. The truth is that the diverse nature of local government means this stuff is just starting to trickle down and make a mark. As Mark Braggins (formerly of Hampshire’s data hub and now one of the organisers of Open Data Camp), reaching for an agricultural metaphor, reflected: ‘perhaps we can say we are moving from the hand reaping of data to a steam age’.

So it was no surprise then that the most animated sessions (and the people most cornered at lunch!) were those with success stories.

Jamie Whyte talks about open data and his failed attempt to get a dogpoobinselfie meme started

Jamie Whyte from Trafford Council talked about how the Trafford Innovation and Intelligence Lab started and has developed. It was a great presentation that showed how innovation can grow within an organisation and reflected something of a coup for Jamie, who seems to have built something robust and innovative inside a council – something many people would dismiss as impossible.

Mark Braggins talks about the steady growth in local government data stores


In his presentation, Mark Braggins talked about the broader issues around open data, drawing on his experience helping start Hampshire’s data hub. The Hampshire hub seems to be popping up a lot at local government data events, and it shows how important engagement with communities outside the council is – something Mark thought was vital. The range of initiatives ‘seeded’ by the hub shows how well that’s worked.

When I talked to Mark, he stressed how much effort went into understanding the motivations of those within the council when speccing the original project (you can see the full business case on their site). By keeping on message and keeping costs down it became an easy sell – engagement was the cherry on the cake.

Comparing the experiences of the Trafford Innovation and Intelligence Lab and the Hampshire hub, I was struck by the direction of travel each was taking: Hampshire going open by default, and Trafford building internal capacity and then reaching out. Both are open approaches and both are working.

For a single day, it was a really rich collection of presentations and good conversation.


Looking back over what is essentially a week of nothing but open data, I think I feel a little more confident in my understanding of the reality of the approaches (and problems) of selling open data as a value proposition rather than an ideological standpoint. My impression is that it’s essentially a pragmatic one – which makes sense.

But there’s part of me that worries about the democratic affordances of this stuff. As the transparency agenda takes a back seat (or is sidelined) there is a danger that accountability suffers. For the pragmatism that pervades local government to work, you need equally pragmatic citizens. Part of the promise of open data is that it can help individuals become that. That’s great for those who can (and have the resources) but for those that can’t we need an accountable, deliberative democracy. For me that’s more about reaching out and being more transparent to the community, telling better stories about what you do, rather than asking people to adjust their expectations.

Reflecting on the Leeds #Datadive

Last Friday night, I found myself in a sun-filled loft workshop in Leeds. All the people in the room seemed to be in one corner, but that’s where the (free) bar was. Tables were set out in rows – solid wood and rubber-topped refugees from the re-fit of Birmingham library – already filled with laptops.

This loft space belongs to the ODI’s node in Leeds. The laptops belong to data scientists, but the people are a mix of the data savvy and local and national charities. All here for the first, it’s hoped, of many DataDives.

The event was organised by DatakindUK, a chapter of the US group Datakind who “create teams of pro bono data scientists” to work with organisations to solve problems. Local charities are invited to pitch requests for help. If selected, they provide data which, in the run-up to the event, is cleaned up by data heroes, ready to be pitched at the start of the weekend. Local organisations also pitched in data. Leeds City Council and their DataMill, for example, had offered up data to use.

So, after beer and chat, the three charities pitched their problems.


  • Volition, representing a large network of mental health organisations in Leeds, had a common problem: lots of information about the organisations and their work (literally a database of the stuff), but no way to link it with data about mental health issues in Leeds.
  • Voluntary Action Leeds had stacks of interviews with young people, exploring the issue of being a NEET (not in employment, education or training). They wanted a way to sift the text to look for common themes, and also wondered if there was a way of detecting unknown NEETs in existing demographic data.
  • The Young Foundation (who also co-sponsored the event) have recently set up a new project in Leeds gathering information around financial exclusion. The project, part of a broader range of projects Leeds are running, looks at the growth of loans, payday lenders etc. They wanted to surface data around the issue.

The rest of the evening was a kind of slow-speed-date where the volunteers in the room pitched themselves and their skills and were wooed by the charities, eventually splitting into teams to get to work on the Saturday and Sunday.


Datakind are an interesting organisation and a new one on me. They are clearly very much at the altruistic end of the hackathon/datalab movement. Their founder is Jake Porway, who used to work for the R&D lab at The New York Times (it seems you’re never far away from journalism!). He told Wired that he wanted more from the data boom that was happening around him: “the things that people would do with it seemed so frivolous — they would build apps to help them park their car or find a local bar. I just thought, ‘This is crazy, we need to do something more.'”

That ‘more’ isn’t just the pro-bono aspect – free data scientists. The Datakind people in the room are also there to pass on skills to the organisations.

It was great to see the charities getting excited about the possibilities of everything from simple tools like Wordle to more complex text analysis software and maps.

Sunday afternoon and it was time to show and tell.

The end results were a real mix, from the complex – synthetic personality types for identifying the financially excluded – to simple infographics. But the data had a real impact on the people in the room, perhaps best exemplified by the debate and discussion generated by an extra mapping project that sprang up during the weekend.

They simply took the datasets each group were finding or generating and mapped them. Technically, not that much of a challenge (except for a tricky issue with local government boundaries), but the insights were immediate.


Where is the value?

When I spoke to representatives of the charities, there was a general feeling that data was important. They all recognised that the third sector is fast becoming data-driven. But beyond the process of writing reports or bids, the real value of data was still to be explored and understood. It just feels important.

The complexities of the third-sector ecosystem don’t help when it comes to raising awareness of events like this, though – even when free help and experience are on hand.

When I asked people how they found themselves at the event, it revealed a complex web of umbrella groups, agencies and initiatives – understanding that would need a datadive in itself! The organisers were similarly challenged; pulling the event together had proved a slower and more complicated process compared to their London datadives.

Good people. Good work.

After the ODI summit last year, I found myself reflecting on the difficult line between the power to do good and the power to do business that data provides, and after this event I found myself chatting through similar issues with Paul Connell, one of the founders of the ODI Leeds node. He was pragmatic about the challenges; balancing the urge to do good with the urge to create the new Uber. A tension that often makes hack events tricky spaces. So, with my research hat on, it’s tempting to try to unpick the motivations of activities like this beyond the desire to give those people involved “the warm fuzzies”, as Datakind put it on their homepage.

But the vibe at the Leeds Datadive event really did make it feel impervious to scepticism. The results, rough around the edges as they were, felt ‘useful’.

As an example: one of the teams, analysing data around NEETs, looked at sanctions imposed on young jobseekers (the stop in benefits that’s imposed if you don’t toe the line with your employment service). Sanctions vary, but you can get a four-week ‘ban’ for missing an appointment. Mapping the data seemed to make a compelling point – the most sanctions were applied to people who live furthest away from the job centres. That piqued a fair bit of interest from journos in my feed (even on a Sunday morning).

Whether further analysis proves that or, more likely, reveals the finer detail, is moot. In a short space of time, simple but no less surprising truths about the experiences of people in Leeds were revealed.

DatakindUK hope this is going to be the first of many events outside of London, and I’ll make a point of tracking them down next time.