Is Data Journalism any more open?

Last year I wrote about how the 2016 Data Journalism Awards illustrated that journalism hasn’t quite got to grips with the full meaning of open data. So I thought I’d take a look at this year’s crop and see if things had improved.

This was last year’s definition for the open data category:

Open data award [2016] Using freedom of information and/or other levers to make crucial databases open and accessible for re-use and for creating data-based stories.

This year’s was the same, save for an addition at the end (my emphasis):

Open data award [2017] Using freedom of information and/or other levers to make crucial datasets open and accessible for re-use and for creating data-driven journalism projects and stories. Publishing the data behind your project is a plus.

A plus! The Open Data Handbook definition would suggest it’s a bit more than a plus…

Open data is data that can be freely used, re-used and redistributed by anyone — subject only, at most, to the requirement to attribute and sharealike

…if you want people to re-use and redistribute the data, then they need to have it.

Let’s take a look at this year’s shortlisted entries and see how they measure up against the open data definition.

So, in the order they appear on the shortlist…

Analyzing 8 million data from public speed limit detectors radars, El Confidencial, Spain

This project made use of Spain’s (relatively) new FOI laws to create “an unique PostgreSQL database” of traffic sanctions for exceeding speed limits. There was a lot of work behind the scenes to analyse the results, and a range of fascinating stories came off the back of it. It’s a great way to kick the tyres of the legislation, and they’ve made good use of it.

Most of the reporting takes the same form: the story is broken down into sections, each accompanied by a chart. The charts are a mix of images and interactives. The interactive charts are delivered using a number of platforms, including Quartz’s Atlas tool, but the majority use Datawrapper. That means the data behind each chart is usually available for download. Most of the heavy lifting that lets users search for their area is done using Tableau Public, which means that data is also available for download. The interactive maps, made with Carto, are less open, as there is no way to get at the data behind them.

Verdict: Open(ish). This makes good use of open government legislation to create the data, but is that really open data? The data in the stories is there for people to download, but only the data used in the visualisations. That’s not the whole dataset. There also isn’t any indication of what you can do with the data. Is it free for you to use?

Database of Assets of Serbian Politicians, Crime and Corruption Reporting Network — KRIK, Serbia (this site won the award)

For their entry, independent investigative journalism site KRIK created “the most comprehensive online database of assets of Serbian politicians, which currently consists of property cards of all ministers of Serbian government and all Serbian presidential candidates running in 2017 Elections.” Reading the submission, it’s clear this is a substantial and impressive bit of work, pulling in sources as diverse as Lexis and the Facebook Graph. They even brought in a certified real estate agency, “which calculated the market values of every flat, house or piece of land owned by these politicians”. Amazing stuff, done in a difficult environment for journalism.

Verdict: Closed. This is a phenomenal act of data journalism and would, in my view, have been a deserving winner in any of the categories. But the data, whilst searchable, accessible and certainly available, isn’t open in the strict sense.

#MineAlert, Oxpeckers Investigative Environmental Journalism, South Africa

Using information access legislation and good old journalistic legwork, Oxpeckers Centre for Investigative Environmental Journalism pulled together a dataset of mine closure information that revealed the impact of a chaotic mining sector in South Africa. The data highlighted the number of derelict mines that hadn’t been officially closed and were now being illegally and dangerously mined. There’s a nice multimedia presentation of the story, and the data is presented as an embedded Excel spreadsheet.

The project has been developed and supported by a number of organisations, including Code for Africa. It’s no surprise, then, that the code behind parts of the project is available via GitHub. The data itself is also available through the OpenAfrica data portal, where the licence for reuse is clear.

Verdict: Open. The use of GitHub and the OpenAfrica data portal adds to the availability of the data, which is also clearly accessible in the piece itself.

Pajhwok Afghan News, Afghanistan

Independent news agency Pajhwok Afghan News have created a data journalism ‘sub-site’ that aims to “use data to measure the causes, impact and solutions driving news on elections, security, health, reconstruction, economic development, social issues and government in Afghanistan.”

The site itself offers a range of stories and a mix of tools. Infogr.am plays a big part in the example offered in the submission, but other stories make use of Carto and Tableau Public. The story “Afghan women have more say in money that they earned themselves than property in marriage” uses Tableau a lot, and that means the data is easy to download, including the maps. That’s handy, as the report the piece is based on (which is linked) is only available as a PDF.

Verdict: Open(ish). The use of Infogr.am as the main driver for visualisation does limit the availability of the data, but the use of Tableau and Carto lowers the barriers a little.

ProPublica Data Store, ProPublica, United States

The not-for-profit investigative journalism giant ProPublica have submitted a whole site: a portal for the data behind the stories they create. Interestingly, ProPublica also see this project as a “potential way to defray the costs of our data work by serving a market for commercial licenses.” That means that, as a journalist, you could pay $200 or more to access some of the data.

Verdict: Open. Purists might argue that the paywall isn’t open, and ideally it would be nice to see more of the data freely available, with the service and analysis layered on top, rather than whole datasets being tied up. That said, it’s not as if ProPublica aren’t doing good work with the money.

Researchers bet on mass medication to wipe out malaria in L Victoria Region, Nation Media Group, Kenya

This piece, published by The Business Daily, looks at plans to enact a malaria eradication plan in the Lake Victoria region. It takes data from the 2015 Kenya Malaria Indicator Survey, amongst other sources, to assess the impact of plans to try to eradicate the disease.

Verdict: Closed. The work done to get the data out of the reports (lots of PDFs) and visualise it is great, and it’s a massively important topic. But the data isn’t really available beyond the visualisations.

What’s open?

Like last year, it’s a patchy affair when it comes to surfacing data. Only two of the entries make their data open in a way that sits comfortably within the definition of open data. For the majority, the focus is on using open government mechanisms to generate data, and that’s not open data.

As noted last year, what open data journalism should be is really about where you put the pipe:

  • open | data journalism: data journalism done in an open way.
  • open data | journalism: journalism done with open data.

By either definition, this year’s crop is more representative of open data use but falls short of the ‘open’ ethos that sits at the heart of open data.

Does it matter?

I asked the same question last year: in the end, does the fact that the data isn’t available make the journalism bad? Of course not. The winner, KRIK, is an outstanding piece of journalism, and there’s loads to learn from the process and thinking behind all the projects. But I do think the quality of the journalism could be reinforced by making the data available. After all, isn’t that the modern reading of data journalism? Doesn’t making our working-out and raw data more visible build trust as well as meaning?

Ironically, perhaps, ProPublica highlight the problem in the submission for their data store project:

“Across the industry, the data we create as an input into our journalism has always been of great value, but after publication it typically remained locked up on the hard drives of our data journalists — of no use either to other journalists, or to anybody else who might find value in it.”

Publishing the data behind your project is what makes it open.

If you think I’m being picky, I’d point out that I’m not picking these at random. This is the shortlist for the open data category. These are what the judges (and the applicants) say are representative of open data. I think they could go further.

As I’ve noted before, if the practice of data journalism is to deliver on transparency and openness, then it needs to be part of that process. It needs to be open too. For my part, I’d like to see “Publishing the data behind your project is a plus” changed next year to an essential criterion.

Hyperlocal. open data. journalism conference: If you’ll come, I’ll build it.

Update: I’ve decided on dates: the 6th and 7th of November 2015, in Preston. If you’re interested, you can tell me what you’d like to see through a quick survey.

As part of my current research activity into hyperlocal and open data, I’m finding myself at events talking to a lot of people in open data circles and a lot of people in Hyperlocal circles. But more often than not they aren’t the same events.

I know there are lots of moves to get hyperlocals interested in data journalism (especially with the election fresh in people’s minds). Likewise, I know that a lot of open data people are also committed (or ideologically disposed) to the transparency and accountability aspects of journalism.

So, finding myself with some resources (that does mean money), I thought it would be fun to get some people in the same room.

So if you’re a hyperlocal, open (government) data person, journalist or local government person involved in data, would you come round to my place for a mini-conference on Making hyperlocal data journalism?

I have some ideas for what we could do…

  • Some open training sessions in data for hyperlocals
  • Some awareness raising from government people about what’s happening at a local level in terms of data
  • Business models for hyperlocal data
  • Best practice for working together to build data communities at a local level
  • Can (and should) government tell stories with data?

…but I know loads of people are doing some or all of these things already, so if nothing else, it may just serve as a chance to get together and share this stuff with a hyperlocal focus.

At this stage I’d love to know if you’d be interested. What would you like to see? What format should it take? Who would you like to see speak or be in the room?

Initially I was thinking about a day or two at the end of August (maybe beginning of September but don’t want to clash with this event in Cardiff). But it could be later if people thought that was better. It would be in Preston.

Let me know in the comments below what, who and when to get the ball rolling.

Credit: Sofa picture Creative commons by net_efekt via Flickr

When is data journalism not data journalism?

When it’s data driven journalism….

I’m doing a lit review at the moment (this might sound academic, but it essentially consists of me yellow-highlighter-penning-the-feck out of papers and journal articles), and, thanks to Wikipedia, I came across a little loop in defining data journalism that got me thinking.

Look at Wikipedia’s definition of data journalism and, before you even begin, you’re told:

Not to be confused with Data driven journalism

Look at data driven journalism and you’re told:

Not to be confused with Data journalism

Oh and don’t even think about confusing either of them for database journalism.

Reading the definitions, there’s a hint of why. Data driven journalism is one process within the broader practice of data journalism. Data journalism reaches outside of journalism to encompass data science and design.

Does that mean that, if I come from the school of thought that wants to play down (or distance myself from) the idea that data journalism is about output (visualization), I can say that I do data driven journalism? Does the difference speak to a philosophical or professional position?

Just get on with it?

In one sense I don’t have a problem with the distinction; it makes a kind of sense. I’m also sure many others won’t, dismissing it with the weary sigh that prefixes ‘what does it matter what we call it, let’s just do it’.

As an observation, I have to say it’s stuff like this that really needs nailing down if data journalism (or whatever you call it) wants to be left alone just to get on with it.

One of the research papers I’ve read (it’s a great paper, btw) suggests that “at least part of what is considered as forming the contemporary trend of data journalism mainly operates in the realm of discourse”. In other words, the idea of data journalism is not fixed.

One reading of that is that it’s a developing field, and as such there is bound to be an element of evolution (in the Darwinian sense). Look at the Wikipedia page for Computer assisted reporting:

It has been suggested that this article be merged with Data driven journalism. (Discuss) Proposed since October 2011.

You could argue that conceptually (in the minds of those just doing it) this has already happened. The CAR page, like many others on Wikipedia, will serve as much as an archive for the term, reflecting that, at one point, it was considered coherent enough to warrant its own page. Useful for me as an academic, but redundant going forward.

But you could also read it as making it up as we go along, and that’s not very precise, is it?

 

Doing data in a journalism course

It’s a subject that isn’t going away, and it’s also one that generates a huge amount of debate: data journalism. If ever there was a perfect hook to hang all of journalism’s best and worst on, it’s data journalism! But a recent flurry of tweets and a nice ‘there’s no reason not to try this stuff’ post from Matt Waite focussed on one part of the debate: how we should be doing more of this in our j-courses, and who should be doing it.

It was something that Matt kicked off with a tweet:

Quite a few people pitched in (an assortment of tweets below):


There is an interesting point in there about adjunct courses (essentially, but not exclusively, online courses), which I think is fair. There’s no better way to put journalists (and students) off than combining maths and computers!

As I said in my response, we do ‘data’ across all of our courses, and I thought I’d share an example of the kind of intro practical stuff we are doing with first years (year one of a three-year degree). It’s done in the context of a broader intro to data and journalism, and it’s developed and expanded throughout the three years, including in a dedicated data journalism module (more so as we are shifting things around in the courses).

My take at this stage is that data journalism is worth considering as part of a more structured approach to journalism. The students are no doubt fed up with my ‘process into content’ mantra.

Anyway. Of the two slideshows below, one is an intro/context lecture and the other is the related workshop. And, yes, I know there is a fair bit of visualization in there (charts and maps), which some data people can get quite sniffy about. We are careful to make the point that not all data is visual, but I do think a visual output can be a quick win for capturing people’s interest. It’s just the start.

Again, these are just the slides, there is the usual amount of narrative and discussion that goes with this. They are presented as is:


 
Let me know what you think if you get a chance.

Data journalism: Making it real

The following is an edited version of a chapter I contributed to a new book, Data Journalism: Mapping the Future, published by Abramis Academic Publishing. The fact that I’m in it aside, I can heartily recommend it as a great mix of practical and contextual information. Go and buy one. Go on!

During the 2008 Summer Olympics, the Beijing Air Track project used a team of photographers from Associated Press to smuggle hand-held pollution sensors into Beijing. Using their press access to the Olympic venues, they gathered pollution readings to test the Chinese government’s claim that a series of extreme emergency measures put in place in the run-up to the games had improved the city’s notoriously poor air quality. They were not the only organisation to use sensors in this way. The BBC’s Beijing office also used a hand-held sensor to test air pollution, gathering data that appeared in a number of reports during the games.

AP’s interactive report visualised the level of air pollution

Clean air. Clean data
The Air Track project and AP’s interactive report are now cited as a:

 “prime example of how sensors, data journalism, and old-fashioned, on-the-ground reporting can be combined to shine a new level of accountability on official reports”.

In contrast to the Chinese data, the transparency of the way this data was collected vividly illustrates how sensors can play a part in reinforcing data journalism’s role in the process of accountability.

Testing the context, provenance and ownership of our data (where it comes from and why) is a fundamental part of the data journalism process. If we are not critical of the data we use (and those that provide it), perhaps becoming over-reliant on data press releases, we risk undermining our credibility with data-churnalism or, worse still, data-porn! As data journalism practice evolves, the basic critical skills will remain fundamental, but it would seem logical to explore ways to reduce our dependency on other sources altogether. The Beijing project, with its use of sensors, offers a compelling solution. As Javaun Moradi, product manager for NPR Digital, succinctly put it:

“If stage 1 of data journalism was ‘find and scrape data.’, then stage 2 was ‘ask government agencies to release data’ in easy to use formats. Stage 3 is going to be ‘make your own data’”

Crowdsensing data
The three stages that Moradi identifies are not mutually exclusive. Many data journalism projects already include an element of gathering new data, often done using traditional forms of crowdsourcing: questionnaires or polls. As much as involving the audience has its benefits, it is notoriously unpredictable and time-consuming. But as individuals we already make a huge amount of data. That isn’t just data about us collected by others through a swipe of a loyalty card or by submitting a tax return online. It’s also data we collect about ourselves and the world around us.

An increasing number of us strap sensors to ourselves that track our health and exercise, and the “internet of things” is creating a growing source of data from the buildings and objects around us. The sensors used by the AP team were specialist air pollution sensors that cost in excess of $400, an expensive way for cash-strapped newsrooms to counter dodgy data. Since 2008, however, the price has dropped, and the growing availability of cheap computing devices such as Raspberry Pi and Arduino, along with the collaborative and open source ethic of the hacker and maker communities, has lowered the barriers to entry. Now sensors, and the crowd they attract, are a serious option for developing data driven reporting.

Hunting for (real) bugs with data
In 2013, New York braced itself for an invasion. Every 17 years a giant swarm of cicadas descends on the East Coast. The problem is that exactly when in the year the insects will appear is less predictable. The best indicator of the emergence of the mega-swarm (as many as a billion cicadas in a square mile) seems to be when the temperature eight inches below the ground reaches 64 degrees Fahrenheit (about 18C). So when John Keefe, WNYC’s senior editor for data news and journalism technology, met with news teams to look at ways to cover the story, he thought of the tinkering he had done with Arduinos and Raspberry Pis. He thought of sensors.

Keefe could not find a source for the data that offered any level of local detail across the whole of New York. He took the problem of how to collect the data to a local hackathon, organised by the station’s popular science show Radiolab, which helped create a “recipe” for an affordable, easy-to-make temperature sensor that listeners could build, sending their results back to a website where they would be mapped.
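The chapter doesn’t describe how WNYC processed the readings listeners sent in, but the trigger it relies on (soil eight inches down reaching 64 degrees Fahrenheit) is simple enough to sketch. What follows is a minimal, hypothetical illustration in Python, not WNYC’s code: the `Reading` type, the `locations_ready` helper and the sample locations are invented for this example, and it assumes each sensor reports in Celsius and that readings arrive in chronological order.

```python
from dataclasses import dataclass

# Threshold quoted in the text: soil about eight inches down reaching 64F (about 18C)
EMERGENCE_THRESHOLD_F = 64.0


@dataclass
class Reading:
    """One hypothetical listener-submitted soil temperature reading."""
    location: str        # e.g. a neighbourhood name (illustrative only)
    soil_temp_c: float   # temperature in Celsius, as a cheap sensor might report


def celsius_to_fahrenheit(temp_c: float) -> float:
    """Convert Celsius to Fahrenheit."""
    return temp_c * 9 / 5 + 32


def locations_ready(readings: list[Reading]) -> list[str]:
    """Return (sorted) locations whose latest reading meets the threshold.

    Assumes readings are in chronological order, so a later reading for a
    location overwrites an earlier one.
    """
    latest: dict[str, float] = {}
    for r in readings:
        latest[r.location] = r.soil_temp_c
    return sorted(
        loc for loc, temp_c in latest.items()
        if celsius_to_fahrenheit(temp_c) >= EMERGENCE_THRESHOLD_F
    )


if __name__ == "__main__":
    sample = [
        Reading("Staten Island", 16.5),
        Reading("Queens", 17.0),         # 62.6F, still below the threshold
        Reading("Staten Island", 18.2),  # 64.76F, at or above the threshold
    ]
    print(locations_ready(sample))  # prints ['Staten Island']
```

Keeping only the latest reading per location is a deliberate simplification; a real crowdsourced map would probably also want to average across nearby sensors and discard obviously faulty readings.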

Developing collaboration
Whilst sensors play an enabling role in both examples, underpinning both the Beijing Air Track and Cicada projects is the idea of collaboration. The Beijing project was originally developed by a team from the Spatial Information Lab at Columbia University. Combining the access of the media with the academic process and expertise of the lab gave the project a much bigger reach and authority. It’s a form of institutional collaboration echoed, in a small way, in more recent projects such as The Guardian’s 2012 Reading the Riots project. The Cicada project, on the other hand, offers an insight into a kind of community-driven collaboration that reflects the broader trend of online networks and the dynamic way groups form.

Safecast and the Fukushima nuclear crisis
On 11 March 2011, Joichi Ito was in Cambridge, Massachusetts. He had travelled from Japan for an interview to become head of MIT’s prestigious Media Lab. The same day, a massive underwater earthquake off the coast of Japan caused a devastating tsunami and triggered a meltdown at the Fukushima Dai-ichi nuclear plant, starting the worst nuclear crisis since Chernobyl in 1986. Ito, like many others, turned to the web and social media to find out if family and friends were safe and to gather as much information as he could about the risk from radiation.

At the same time as Ito was searching for news about his family, US web developer Marcelino Alvarez was in Portland scouring the web for information about the possible impact of the radiation on the US West Coast. He decided to channel his “paranoia”, and within 72 hours his company had created RDTN.org, a website aggregating and mapping information about the level of radiation.

For Alvarez and Ito, the hunt for information soon developed into an effort to source Geiger counters to send to Japan. Within a week of the disaster, the two had been introduced and RDTN.org became part of a project that would become Safecast.org. As demand outstripped supply, their efforts to buy Geiger counters quickly transformed into a community-driven effort to design and build cheap, accurate sensors that could be deployed quickly to gather up-to-date information.

SIDENOTE: It will be interesting to see how the experiences of Beijing and Safecast might come together in the coverage of the 2020 Olympics in Japan.

Solving problems: Useful data and Purposed conversations
Examples such as WNYC’s cicada project show how a strong base of community engagement can help enable data-driven projects. But the Safecast network was not planned; it grew

“from purposed conversations among friends to full time organization gradually over a period of time”

There was no news conference to decide when and how it would respond, or which contributors to target. It was a complex, self-selecting mix of different motivations and passions that coalesced into a coherent response to solve a problem. It’s a level of responsiveness and scale of coverage that news organisations would struggle to match on their own. In that context, Moradi believes that journalism has a different role to play:

Whether they know it or not, they do need an objective third party to validate their work and give it authenticity. News organisations are uniquely positioned to serve as ethical overseers, moderators between antagonistic parties, or facilitators of open public dialogue

Building bridges
Taking a position as a “bridge” between those with data and resources and the public, who desperately want to understand and access the data but need help, is a new reading of what many would recognise as a traditional part of journalism’s process and identity. The alignment of data journalism with the core principles of accountability, and with the purpose of investigative journalism in particular, makes for a near-perfect meeting point for a dynamic mix of like-minded hacks, academics and hackers, motivated not just by transparency and accountability but also by a desire to go beyond highlighting issues and begin to put solutions in place. This mix of ideologies, as the WikiLeaks story shows, can be explosive, but the output has proved invaluable in helping (re)establish the role of journalism in the digital space. Whether it is a catalyst to bring groups together, a way to engage and amplify the work of others or a way, as Moradi puts it, to “advance the cause of journalism by means other than reporting”, sensor journalism seems to be an effective gateway to exploring these new opportunities.

The digital divide
The rapid growth of data journalism has played a part in directing attention, and large sums of money, to projects that take abstract concepts like open government and make them tangible, relevant and useful to real live humans in our communities. It’s no surprise, then, that many of them take advantage of sensors and their associated communities to help build their resources, whether through innovative uses of smartphones, co-opting the internet of things or crowdfunded sensor projects like the Air Quality Egg. But a majority of the successful data projects funded by organisations such as the Knight Foundation have outputs that are almost exclusively digital: apps or data dashboards. As much as they rely on the physical to gather data, the results remain resolutely trapped in the digital space.

As far back as 2009, the UK government’s Digital Britain report warned:

“We are at a tipping point in relation to the on-line world. It is moving from conferring advantage on those who are in it to conferring active disadvantage on those who are without”

The solution to this digital divide is to focus on getting those who are not online connected. As positive as this is, it’s a predictably technologically deterministic solution, one that critics say conflates digital inclusion with social inclusion. For journalism, and data journalism in particular, it raises an interesting challenge to claims of “combating information asymmetry” and of increasing the data literacy of readers on a mass scale.

Insight journalism: Journalism as data
In the same year the Digital Britain report appeared, the Bespoke project dived into the digital divide by exploring ways to create real objects that could act as interfaces to the online world. The project took residents from the Callon and Fishwick areas of Preston, Lancashire, recognised as some of the most deprived areas in the UK, and trained them as community journalists who contributed to a “hyperlocal” newspaper distributed around the estate. The paper also served as a way of collecting “data” for designers, who developed digitally connected objects aimed at solving problems identified by the journalists: a process the team dubbed insight journalism.

Wayfinder at St Matthew’s church (Image copyright Garry Cook)

One example, the Wayfinder, was a digital display with a moving arrow that users could text to point to events happening in the local area.

Bespoke’s Viewpoint Contour Homes’ office in Callon, Preston (c) Garry Cook

Another, Viewpoint, was a kiosk placed in local shops that allowed users to vote on questions from other residents, the council and other interested parties. The questioner had to agree to act on the responses they got, a promise that was scrutinised by the journalists.

The idea was taken further during the 2012 Unbox festival in India, when a group of designers and journalists applied the model of insight journalism to the issue of sexual harassment on the streets of New Delhi. The solution, built on reports and information gathered by journalists, was a device that would sit on top of one of the many telegraph poles that clutter the streets, attracting thousands of birds. The designers created a bird table fitted with a bell. When a woman felt threatened or was subjected to unwanted attention, she could use Twitter to “tweet” the nearest bird table and a bell would ring. The ringing bell would scatter any roosting birds, giving a visible sign of a problem in the area. The solution was as poetic as it was practical, highlighting not just the impact of the physical but the power of journalism as data to help solve a problem.

Stage four: Make data real
Despite its successes, sensor journalism is still a developing area, and it is not yet clear whether it will grow beyond the environmental issues that drive many of the examples presented here. Like data journalism, much of the discussion around the field focuses on the new opportunities it presents. These often intersect with equally nascent but seductive ideas such as drone journalism. More often than not, though, they bring the discussion back to the more familiar ground of the challenges of social media, managing communities and engagement.

As journalism follows the mechanisms of the institutions it is meant to hold to account into the digital space, it is perhaps a chance to think about how data journalism can move beyond simply building capacity within the industry, providing useful case studies. Perhaps it is a way to help journalism re-connect to the minority of those in society who, by choice or by circumstance, are left disconnected.

Thinking about ways to make the data we find, and the data journalism we create, physical closes a loop on a process that starts with real people in the real world. It begins to raise important questions about what journalism’s role should be: not just capturing problems and raising awareness, but also creating solutions. In an industry struggling to re-connect, it may also start to address the problem of placing journalism back in the community and making it sustainable. Researchers reflecting on the Bespoke project noted that:

“elements of the journalism process put in place to inform the design process have continued to operate in the community and have proven to be more sustainable as an intervention than the designs themselves”

If stage three is to make our own data, perhaps it is time to start thinking about stage four of data journalism and make data real.

Refs:

Alba, Davey (2013) Sensors: John Keefe and Matt Waite on the current possibilities, Tow Center for Digital Journalism, 5 June. Available online at http://towcenter.org/blog/sensors-john-keefe-and-matt-waite-on-the-current-possibilities/, accessed on 12 August 2013
Alvarez, Marcelino (2011) 72 Hours from concept to launch: RDTN.org, Uncorked Words, 21 March. Available online at http://uncorkedstudios.com/2011/03/21/72-hours-from-concept-to-launch-rdtn-org/, accessed on 12 August 2013
Ashton, Kevin (2009) That “Internet of Things” thing, RFiD Journal 22 pp 97-114. Available online at http://www.rfidjournal.com/articles/view?4986, accessed on 25 September 2013
Department for Business, Innovation and Skills (2009) Digital Britain: Final Report, Stationery Office
BBC (2008) In pictures: Beijing pollution-watch, BBC News website, 24 August. Available online at http://news.bbc.co.uk/sport1/hi/front_page/6934955.stm, accessed on 12 August 2013
Blum-Ross, Alicia, Mills, John, Egglestone, Paul and Frohlich, David (2013) Community media and design: Insight journalism as a method for innovation, Journal of Media Practice, Vol. 14, No 3, 1 September pp 171-192
Bradshaw, Paul. and Brightwell, Andy. (2012) Crowdsourcing investigative journalism: Help me Investigate: A case study, Siapera, Eugenia and Veglis, Andreas (eds) The Handbook of Global Online Journalism, London: John Wiley & Sons pp 253-271
Ellison, Sarah (2011) The man who spilled the secrets, Vanity Fair, February. Available online at http://www.vanityfair.com/politics/features/2011/02/the-guardian-201102 , accessed on 13 September 2013
Gray, Jonathan, Chambers, Lucy and Bounegru, Liliana (2012) The Data Journalism Handbook. O’Reilly. Free version available online at http://datajournalismhandbook.org/
Howard, Alex (2013) Sensoring the news, O’Reilly Radar, 22 March. Available online at http://radar.oreilly.com/2013/03/sensor-journalism-data-journalism.html, accessed on 12 August 2013
Kalin, Sari (2012) Connection central, MIT News magazine, 21 August. Available online at http://www.technologyreview.com/article/428739/connection-central/, accessed on 22 August 2013
Knight, Megan (2013) Data journalism: A preliminary analysis of form and content. A paper delivered to the International Association for Media and Communication Research, 25-29 June, Dublin
Livingstone, Sonia and Lunt, Peter (2013) Ofcom’s plans to promote “participation”, but whose and in what? LSE Media Policy Project, 27 February. Available online at http://blogs.lse.ac.uk/mediapolicyproject/2013/02/27/ofcoms-plans-to-promote-participation-but-whose-and-in-what/, accessed on 23 September 2013
Moradi, Javaun (2011) What do open sensor networks mean for journalism?, Javaun’s Ramblings, 16 December. Available online at http://javaunmoradi.com/blog/2011/12/16/what-do-open-sensor-networks-mean-for-journalism/, accessed on 9 August 2013
Oliver, Laura (2010) UK government’s open data plans will benefit local and national journalists, Journalism.co.uk, 1 June. Available online at http://www.journalism.co.uk/news/uk-government-039-s-open-data-plans-will-benefit-local-and-national-journalists/s2/a538929/, accessed on 12 August 2013
Rogers, Simon (2011) Facts are Sacred: The Power of Data (Guardian shorts), Cambridge, UK: Guardian Books
Safecast History (no date) Safecast.com. Available online at http://blog.safecast.org/history/, accessed on 25 September 2013
Sopher, Christopher (2013) How can we harness data and information for the health of communities?, Knight Foundation, 16 August. Available online at https://www.newschallenge.org/challenge/healthdata/brief.html, accessed on 10 September 2013
Taylor, Nick, Marshall, Justin, Blum-Ross, Alicia, Mills, John, Rogers, Jon, Egglestone, Paul, Frohlich, David M., Wright, Peter and Olivier, Patrick (2012) Viewpoint: Empowering Communities with Situated Voting Devices, Proc. CHI 2012 pp 1361-1370, New York: ACM
Taylor, Nick, Wright, Peter, Olivier, Patrick and Cheverst, Keith (2013) Leaving the wild: lessons from community technology handovers, Proc. CHI 2013
Waite, Matt (2013) How sensor journalism can help us create data, improve our storytelling, Poynter.org, 17 April. Available online at http://www.poynter.org/how-tos/digital-strategies/210558/how-sensor-journalism-can-help-us-create-data-improve-our-storytelling/, accessed on 28 August 2013

International students die in groups of five. Or do they?

Update: They knocked back my second request on the grounds of anonymity. The sample was so small that giving me details might risk identifying someone. That seems fair, but if nothing else, the very low numbers mean that, in the context of my original thinking, they are not large enough to suggest a broader story (taking as read that the individual circumstances are sad and may have warranted reporting at the time).

I always like to test out the stuff that I ask my students to do; don’t make people do something you wouldn’t try yourself (apart from maybe fitting a gas cooker or disarming a bomb). So I’ve been collecting data from various places to use in data journalism exercises, including FOI requests via Whatdotheyknow.com.

I asked for details of people who had died whilst on student and Tier 4 visas. I was playing out a hunch (just curiosity) I had about a few things, in particular how many of those deaths would be suicides. I thought it would make interesting data and would be something that might interest students without getting into the dangerous territory of ‘student stories’.

Where possible I would like to know the date, location of their death, gender, age, cause of death and sponsor institution. If you could provide this information in digital form, preferably in a spreadsheet format, that would be very helpful.

Here’s the data I got.

Not really what I wanted. The main reason cited was that, apart from the information above, they were “only able to report on data that is captured in certain mandatory fields on the Home Office’s Case Information Database (CID).” Most of the information I wanted would be in the ‘notes’ section of any records, which would need to be located manually.

The Home Office is not obliged under section 12 of the Freedom of Information Act 2000 to comply with any information request where the estimated costs involved in supplying the information exceed the £600 cost limit. I regret that we cannot supply you with the information that you have asked for, as to comply with your request would exceed this cost limit.

Fair enough, although I was a bit suspicious that some information that would seem pretty useful, like the sponsoring institution, would not have a field. But I realised that I didn’t really know what fields were in there. In fact, I didn’t even know that the Case Information Database was where that stuff would be.

Thanks to an FOI request by Helen Murphy, I found out that:

All data held on the Caseworker Information Database will fall within a minimum data set. The Caseworker Information Database contains:
• Name
• Date of birth
• Nationality
• Arrival details
• Temporary admission address
• Detention details
• Refusal reasons
• Diary actions
• Notes
• Removal details
• Photograph

More surprisingly, it also reveals that “Currently there are over 75 screens on the Caseworker Information Database (CID)”. 75 screens! No wonder they can’t find anything!

7-hour days

Helen’s FOI also helped illuminate working conditions at the Home Office. The response to Helen’s FOI says:

The £600 limit is based on work being carried out at a rate of £25 per hour, which equates to 24 hours of work per request.

The response to mine says:

This [£600] limit applies to all central Government Departments and is based on work being carried out by one member of staff at a rate of £25 per hour, which equates to 3½ days work per request.

Taking one as a different way of expressing the other (a dangerous assumption), 24 hours spread over 3½ days works out at just under 6.9 hours a day, which would suggest days of less than 7 hours at the Home Office. Still, that seems fair given the number of screens you’d need to wade through. I’d give up after 2 hours!
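The back-of-the-envelope sum behind that comparison can be spelled out in a few lines. This is only a sketch: like the paragraph itself, it assumes both responses describe the same amount of work, and the variable names are mine rather than anything the Home Office uses.

```python
# Compare the two Home Office descriptions of the same £600 FOI cost limit,
# ASSUMING (as the post does) that both describe the same amount of work.

COST_LIMIT_GBP = 600    # section 12 cost limit quoted in both responses
HOURLY_RATE_GBP = 25    # hourly rate quoted in both responses
DAYS_QUOTED = 3.5       # "3½ days work per request" from the second response

# £600 at £25 an hour gives the hours of work allowed per request.
hours_per_request = COST_LIMIT_GBP / HOURLY_RATE_GBP

# Spreading those hours over 3½ days gives the implied length of a working day.
hours_per_day = hours_per_request / DAYS_QUOTED

print(f"Hours per request: {hours_per_request:.0f}")
print(f"Implied working day: {hours_per_day:.2f} hours")
```

Swapping in a different day length (say 7.5 hours) shows how much the two wordings would have to disagree for the assumption to break.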

Groups of 5

The other thing that struck me about the data was the alarmingly uniform numbers in which people die: 5 at a time. It turns out that the figures are not entirely complete*. A note on the data says:

Figures rounded to nearest 5 (- = 0, * = 1 or 2) and may not sum to totals shown because of independent rounding.

Why round them to 5? It’s not as if half a person died! Update: In the comments, Martin Stabe suggests: “This could be an anonymisation requirement so that individual cases cannot be identified from aggregate data.”
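To see why every figure comes out as a multiple of 5, and why rows “may not sum to totals shown”, here is a minimal sketch of the rule exactly as the note states it: 0 shown as “-”, 1 or 2 shown as “*”, anything else rounded to the nearest 5. It is my reading of the note, not the Home Office’s actual process, and the counts below are invented purely for illustration.

```python
def publish_count(n: int) -> str:
    """Apply the disclosure rule described in the note on the data:
    0 -> "-", 1 or 2 -> "*", anything else rounded to the nearest 5.
    This is an assumed reading of the note, not the Home Office's code."""
    if n < 0:
        raise ValueError("counts cannot be negative")
    if n == 0:
        return "-"
    if n <= 2:
        return "*"
    # For whole numbers, remainders 0-2 round down and 3-4 round up,
    # which is rounding to the nearest multiple of 5.
    return str(5 * ((n + 2) // 5))


# HYPOTHETICAL counts, made up for this example only.
raw_counts = {"Year A": 3, "Year B": 3, "Year C": 3, "Year D": 0, "Year E": 2}

# Each row is rounded independently, and so is the total.
published = {year: publish_count(n) for year, n in raw_counts.items()}
published_total = publish_count(sum(raw_counts.values()))

print(published)
print("Total:", published_total)
```

With these made-up numbers, three rows each show 5 while the rounded total shows 10, which is exactly the “independent rounding” mismatch the note warns about.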

Limits of being human

I’ve put another request in on the basis of the data I got, assuming that 10 cases would be manageable by someone in 3.5 days, although 75 screens’ worth of content might yet fox my demand, so I may never get what I want this way.

The truth is that, as data, what I got is next to useless (no real context, and the numbers aren’t even accurate), but it reinforced a few things for me:

  • Good FOIs rely on good planning and some prior knowledge. I’d done a bit of work understanding the whole Tier 4/student thing, but clearly I needed to do more to understand who held the data, how and why. Data, in fact journalism, is all about context.
  • Good FOIs rarely stand alone. Often an FOI is an enabler: it opens doors and avenues for further questions. That makes it valuable even when the data might be useless.
  • Visibility helps. Helen’s FOI answered questions I had. Maybe mine won’t, but it’s in the mix.
  • Open government doesn’t just rely on data. It relies on the capacity to retrieve and search that data. Government is really good at collecting it and shockingly bad at keeping it in a form that is usable even to itself. (But we all knew that, didn’t we?)

None of these are new or startling revelations, but it never hurts to be reminded of them from time to time.

 

* for ‘not entirely complete’ read ‘bugger all use’