Two fundamentals that define good data journalism

Defining data journalism is a hostage to fortune, but as I start teaching a data journalism module I’ve boiled it down to two things: visible methodology and data.

I’m teaching a module on Data Journalism to second year undergraduates this year. It’s not the first time we’ve done that at the university. A few years ago three colleagues of mine, Francois Nel, Megan Knight and Mark Porter ran a data journalism module which worked in partnership with the local paper. I’ve also been tormenting the students with elements of data journalism and computational journalism across all four years of our journalism courses.

There are a couple of things I wanted to do specifically with this data journalism module (over and above the required aims and outcomes). The first thing was, right from the start, to frame data journalism as very much a ‘live conversation’. It’s exciting, and rare these days, that students can dive into an area of journalism and not feel they are treading on the toes of an existing conversation. The second thing was to try and get them thinking about the ideological underpinnings of data journalism.

Data journalism as a discourse borrows most heavily and liberally from the vocational underpinnings of journalism: the demand of journalism to serve the public and hold power to account that John Snow and others have talked about. But it also draws on the rigour of science, the discipline of code, design thinking, narrative and social change; anything to bring shape, structure and identity. This is often a good thing, especially for journalism, where new ideas are few and far between and it takes a lot to challenge the orthodoxy. Perhaps that’s why data journalism is seen as an indicator of prosperous media companies. But it’s also a bad thing when it’s done uncritically. I’ve written lots, for example, about how I think data journalism borrows the concept of ‘open’ for its own purposes. Often much of the value of data journalism seems implied.

The fluid nature of data journalism discussion makes it difficult to identify “schools” of data journalism thought (I don’t think there’s a Bloomsbury Group of data journalism yet!*), but there are attempts to codify it. Perhaps the most recent (and best) is Paul Bradshaw’s look at 10 principles for data journalism in its second decade. It’s a set of principles I can get behind 100% and it’s a great starting point for the ideological discussion I want the students to have.

That said, and pondering this as I put together teaching materials, I think things could be a little simpler — especially as we begin to identify and analyse good data journalism. So if there was a digitaldickinson school of data journalism I think there would be a simple defining idea…

If you can’t see, understand and, ideally, interact with the method and the data in the piece, it may be good journalism but it’s not good data journalism.

When good journalism becomes good data journalism

Here are two examples to make the point.

The Guardian published a piece that uses Home Office data to reveal that asylum seekers are being housed by some of the poorest councils in the UK. It’s a story that rightly caught the eye of Government and campaigners alike. Exceptional journalism. Poor data journalism.

An exceptional piece of investigation, great journalism but this would score low as a piece of data journalism

The problem with the piece is that, although it relies heavily on the data used, it is light on the method and even lighter on the underpinning data. The data it uses is all public (there is no FOI mentioned here) and there isn’t even a link to the source, let alone the source data.

Contrast that with a piece from the BBC looking at the dominance of male acts at festivals. 

The BBC’s piece might be seen as frivolous, but it is no less a piece of journalism.

An introduction to the method ticks the boxes for me.

It’s a fascinating piece, but the key bit for me is at the end, where there is a link to find out how the story was put together. That’s the thing that makes this great data journalism. The link takes you to a GitHub repository for the story which includes more about the method, unpublished extras and, importantly, the raw data.

The BBC England Data Unit GitHub page is a good example of how to add value to data journalism stories.

The BBC take is a full-service, all-bases-covered example of good data journalism; it’s the Blu-ray with special features version of the article. To be fair to the Guardian piece, they do talk a little about the ‘how’, but not on the level of the BBC piece. I also recognise that in these days of tight resources, not every newsroom needs to create this level of detail. But using GitHub to store the data, or even just linking to the data direct from the article, is a step in the right direction. It’s often what the journalists would have done anyway as part of the process of putting the article together.

Making a point

I’ve picked the Guardian and BBC stories here as examples of data-driven journalism. These are two stories that put data analysis front and centre in the story. But I recognise that I’m the one calling them ‘data journalism’. I’m making a comparison to prove a point of course, but my ‘method’ aside, the point, I think, stands. Beyond the motivations, aims and underpinning critical reasons, when the audience accesses the piece without the method and the data, can we really say it’s data journalism?

I want my data journalism students to really think about why we see data journalism as a thing that is worthy of study not just practice. Not in a fussy academic way but in a very live way. It isn’t enough to judge what is produced by the standards of journalism alone (I’m guessing the Guardian piece would tick the ‘proper journalism’ box for many). But it isn’t ‘just journalism’ and it isn’t just a process. If the underlying principles and process aren’t obvious in the content that the readers engage with, then it’s just an internal conversation. It has to be more than that.

For me, right now, outside of the conversation, good data journalism starts with a visible method and data.

*I guess if there was one, they would vehemently deny it.