I’m right behind the sentiment expressed in the headline.
I’m fascinated by the tensions caused by the use of open data – or perhaps more specifically the rhetoric of its use. I often find myself questioning the claims of the ‘usefulness’ of open data, especially when they are linked to social and community outcomes. I share Eyal’s view that whilst there may be some big claims, “there is not yet a larger body of work describing how open data has brought about systemic, long-term change to societies around the world.”
Some might argue (me included) that it’s just too early to make judgements. As idealistic and iconoclastic as the promises may be at times, I do think it is just a matter of time before we begin to see tangible and consistently replicable social benefit from the use of open data.
But the key challenge is not the destination or how long it takes to get there. It’s how we do it.
In the IJNet piece Eyal makes a distinction between simply freeing the data and its effective use, especially by average citizens. He makes a strong case for the role of “infomediaries”:
These groups (data wranglers, academics, data-proficient civil society organizations, etc.) turn data into actionable information, which can then be used to lobby for tangible change.
I’m very drawn to that idea and it reflects the way the open data ecosystem is developing and needs to develop. But I do think there’s an underlying conflation in the article that hides a fundamental problem in the assumption that infomediaries are effective bridges: it assumes that open data and open government data are the same thing.
It’s an important distinction for me. The kind of activities and infomediaries the article highlights are driven for the most part by a fundamental connection to open government (and its data). There is a strong underpinning focus on civic innovation in this reading of the use and value of open government data. I’d argue that Open Data is driven more by a strong underpinning of economic innovation – from which social and civic innovation might be seen as value created from the use of the services it provides.
There is a gap between those who hold the data and use it to make decisions and those who are affected by those decisions. I don’t think that open data infomediaries always make that gap smaller; they simply take up some of the space. Some do reach across the gap more effectively than others – good data journalism for example. But others, through an economically driven service model, simply create another access point for data.
From an open data ecosystem point of view this is great, especially if you take a market view. It makes for a vibrant open data economy and a sustainable sector. From the point of view of the citizen, the end user, the gap is still there. They are either left waiting for other infomediaries to bring that data and its value closer or required to skill up enough to set out across the gap themselves.
There is a lot of the article that I agree with but I’d support the points made with a parallel view and suggest that as well as data literate citizens as Eyal describes them, open data infomediaries need to be “citizen literate”:
A citizen literate data infomediary isn’t one that just knows how to use data – it’s one that understands how citizens can effectively use data to be part of a decision making process.
During the 2008 summer Olympics, the Beijing Air Track project took a team of photographers from Associated Press and used them to smuggle hand-held pollution sensors into Beijing. Using their press access to the Olympic venues, they gathered pollution readings to test the Chinese government’s data showing that a series of extreme emergency measures put in place in the run-up to the games had improved the city’s notoriously poor air quality. They were not the only organisation to use sensors in this way. The BBC’s Beijing office also used a hand-held sensor to test air pollution, gathering data that appeared in a number of reports during the games.
“prime example of how sensors, data journalism, and old-fashioned, on-the-ground reporting can be combined to shine a new level of accountability on official reports”.
In contrast to the Chinese data, the level of transparency displayed in the way the data was collected vividly illustrates how sensors can play a part in reinforcing data journalism’s role in the process of accountability.
Testing the context, provenance and ownership – where our data comes from and why – is a fundamental part of the data journalism process. If we are not critical of the data we use (and those that provide it), perhaps becoming over-reliant on data press releases, we can risk undermining our credibility with data-churnalism or, worse still, data-porn! As data journalism practice evolves, whilst the basic critical skills will remain fundamental, it would seem logical to explore ways that we reduce our dependency on other sources altogether. The Beijing project, with its use of sensors, offers a compelling solution. As Javaun Moradi, product manager for NPR digital, succinctly put it:
“If stage 1 of data journalism was ‘find and scrape data.’, then stage 2 was ‘ask government agencies to release data’ in easy to use formats. Stage 3 is going to be ‘make your own data’”
The three stages that Moradi identifies are not mutually exclusive. Many data journalism projects already include an element of gathering new data, often done using traditional forms of crowdsourcing: questionnaires or polls. As much as involving the audience has its benefits, it is notoriously unpredictable and time-consuming. But as individuals we already make a huge amount of data. That isn’t just data about us collected by others through a swipe of a loyalty card or by submitting a tax return online. It’s also data we collect about ourselves and the world around us.
An increasing number of us strap sensors to ourselves that track our health and exercise, and the “internet of things” is creating a growing source of data from the buildings and objects around us. The sensors used by the AP team were specialist air pollution sensors that cost in excess of $400 – an expensive way for cash-strapped newsrooms to counter dodgy data. Since 2008, however, the price has dropped, and the growing availability of cheap computing devices such as Raspberry Pi and Arduino, together with the collaborative and open source ethic of the hacker and maker communities, has lowered the barriers to entry. Now sensors, and the crowd they attract, are a serious option for developing data driven reporting.
Hunting for (real) bugs with data
In 2013, New York braced itself for an invasion. Every 17 years a giant swarm of cicadas descends on the East Coast. The problem is that exactly when in the year the insects will appear is less predictable. The best indicator of the emergence of the mega-swarm (as many as a billion cicadas in a square mile) seems to be when the temperature eight inches below the ground reaches 64 degrees Fahrenheit (18C). So when John Keefe, WNYC’s senior editor for data news and journalism technology, met with news teams to look at ways to cover the story, he thought of the tinkering he had done with Arduinos and Raspberry Pis. He thought of sensors.
Keefe could not find a source for the data that offered any level of local detail across the whole of New York. He took the problem of how to collect the data to a local hackathon, organised by the station’s popular science show Radiolab, which helped create a “recipe” for an affordable, easy to make temperature sensor that listeners could build. They would then send their results back to a website where the information would be mapped.
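To make the idea concrete, here is a minimal sketch of the kind of check a site collecting those readings might run. It is not WNYC’s code: the function names, the site names and the readings are all made up for illustration, and the only figure taken from the story is the 64 degree threshold.

```python
# Threshold from the story: 64°F measured eight inches below the surface.
EMERGENCE_THRESHOLD_F = 64.0


def fahrenheit_to_celsius(temp_f: float) -> float:
    """Convert a Fahrenheit reading to Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


def sites_at_threshold(readings: dict) -> list:
    """Return the (hypothetical) site names whose reading has hit the threshold.

    `readings` maps a site name to its latest soil temperature in °F.
    """
    return sorted(
        site for site, temp_f in readings.items()
        if temp_f >= EMERGENCE_THRESHOLD_F
    )


if __name__ == "__main__":
    # Made-up readings, purely for illustration.
    sample = {"site_a": 61.5, "site_b": 64.0, "site_c": 66.2}
    for site in sites_at_threshold(sample):
        print(site, round(fahrenheit_to_celsius(sample[site]), 1))
```

A real project would also need to deal with sensors that are badly calibrated or buried at the wrong depth, which is exactly where the crowd, and the journalism, comes in.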
Whilst sensors play an enabling role in both examples, underpinning both the Beijing AirTrack and Cicada projects is the idea of collaboration. The Beijing project was originally developed by a team from the Spatial Information Lab at Columbia University. Combining the access of the media with the academic process and expertise of the lab gave the project a much bigger reach and authority. It’s a form of institutional collaboration that is echoed in a small way in more recent projects such as The Guardian’s 2012 Reading the Riots. The Cicada project, on the other hand, offers an insight into a kind of community-driven collaboration that reflects the broader trend of online networks and the dynamic way groups form.
Safecast and the Fukushima nuclear crisis
On 11 March 2011, Joichi Ito was in Cambridge, Massachusetts. He had travelled from Japan for an interview to become head of MIT’s prestigious Media Lab. The same day a massive underwater earthquake off the coast of Japan caused a devastating tsunami and triggered a meltdown at the Fukushima Dai-ichi nuclear plant, starting the worst nuclear crisis since Chernobyl in 1986. Ito, like many others, turned to the web and social media to find out if family and friends were safe and to gather as much information as he could about the risk from radiation.
At the same time as Ito was searching for news about his family, US web developer Marcelino Alvarez was in Portland scouring the web for information about the possible impact of the radiation on the US’s west coast. He decided to channel his “paranoia” and within 72 hours his company had created RDTN.org, a website aggregating and mapping information about the level of radiation.
For Alvarez and Ito the hunt for information soon developed into an effort to source geiger counters to send to Japan. Within a week of the disaster, the two had been introduced and RDTN.org became part of a project that would become Safecast.org. As demand outstripped supply, their efforts to buy geiger counters quickly transformed into a community driven effort to design and build cheap, accurate sensors that could be deployed quickly to gather up to date information.
SIDENOTE: It will be interesting to see how the experiences of Beijing and Safecast could come together in the coverage of the 2020 Olympics in Japan
Solving problems: Useful data and Purposed conversations
Examples such as WNYC’s cicada project show how a strong base of community engagement can help enable data-driven projects. But the Safecast network was not planned, it grew
“from purposed conversations among friends to full time organization gradually over a period of time”
There was no news conference to decide when and how it would respond or attempt to target contributors. It was a complex, self-selecting mix of different motivations and passions that coalesced into a coherent response to solve a problem. It’s a level of responsiveness and scale of coverage that news organisations would struggle to match on their own. In that context, Moradi believes that journalism has a different role to play:
Whether they know it or not, they do need an objective third party to validate their work and give it authenticity. News organisations are uniquely positioned to serve as ethical overseers, moderators between antagonistic parties, or facilitators of open public dialogue
Taking a position as a “bridge” between those with data and resources and “the public who desperately want to understand the data and access it but need help” is a new reading of what many would recognise as a traditional part of journalism’s process and identity. The alignment of data journalism with the core principles of accountability and the purpose of investigative journalism, in particular, makes for a near perfect meeting point for the dynamic mix of like-minded hacks, academics and hackers, motivated not just by transparency and accountability. It also taps into a desire not just to highlight issues but to begin to put in place solutions to problems. This mix of ideologies, as the WikiLeaks story shows, can be explosive but the output has proved invaluable in helping (re)establish the role of journalism in the digital space. Whether it is a catalyst to bring groups together, engage and amplify the work of others or a way, as Moradi puts it, to “advance the cause of journalism by means other than reporting”, sensor journalism seems to be an effective gateway to exploring these new opportunities.
The digital divide
The rapid growth of data journalism has played a part in directing attention, and large sums of money, to projects that take abstract concepts like open government and “make them tangible, relevant and useful to real live humans in our communities”. It’s no surprise, then, that many of them take advantage of sensors and their associated communities to help build their resources: innovative uses of smartphones, co-opting the internet of things or using crowd-funded sensor projects like the Air Quality Egg. But a majority of the successful data projects funded by organisations such as the Knight Foundation have outputs that are almost exclusively digital: apps or data dashboards. As much as they rely on the physical to gather data, the results remain resolutely trapped in the digital space.
“We are at a tipping point in relation to the on-line world. It is moving from conferring advantage on those who are in it to conferring active disadvantage on those who are without”
The solution to this digital divide is to focus on getting those who are not online connected. As positive as this is, it’s a predictably technologically deterministic solution to the problem that critics say conflates digital inclusion with social inclusion. For journalism, and data journalism in particular, it raises an interesting challenge to claims of “combating information asymmetry” and increasing the data literacy of their readers on a mass scale.
Insight journalism: Journalism as data
In the same year as the Digital Britain report appeared, the Bespoke project dived into the digital divide by exploring ways to create real objects that could act as interfaces to the online world. The project took residents from the Callon and Fishwick areas in Preston, Lancashire, recognised as some of the most deprived areas in the UK, and trained them as community journalists who contributed to a “hyperlocal” newspaper that was distributed round the estate. The paper also served as a way of collecting “data” for designers who developed digitally connected objects aimed at solving problems identified by the journalists, a process the team dubbed insight journalism.
One example, the Wayfinder, was a digital display and a moving arrow which users could text to point to events happening in the local area.
Another, Viewpoint, was a kiosk placed in local shops that allowed users to vote on questions from other residents, the council and other interested parties. The questioner had to agree that they would act on the responses they got, a promise that was scrutinised by the journalists.
The idea was developed during the 2012 Unbox festival in India, when a group of designers and journalists applied the model of insight journalism to the issue of sexual harassment on the streets of New Delhi. The solution, built on reports and information gathered by journalists, was to build a device that would sit on top of one of the many telegraph poles that clutter the streets attracting thousands of birds. The designers created a bird table fitted with a bell. When a woman felt threatened or was subjected to unwanted attention she could use Twitter to “tweet” the nearest bird table and a bell would ring. The ringing bell would scatter any roosting birds giving a visible sign of a problem in the area. The solution was as poetic as it was practical, highlighting not just the impact of the physical but the power of journalism as data to help solve a problem.
Stage four: Make data real
Despite its successes sensor journalism is still a developing area and it is not yet clear if it will see any growth beyond the environmental issues that drive many of the examples presented here. Like data journalism, much of the discussion around the field focuses on the new opportunities it presents. These often intersect with equally nascent but seductive ideas such as drone journalism. More often than not, though, they bring the discussion back to the more familiar ground of the challenges of social media, managing communities and engagement.
As journalism follows the mechanisms of the institutions it is meant to hold to account into the digital space, it is perhaps a chance to think about how data journalism can move beyond simply building capacity within the industry, providing useful case studies. Perhaps it is a way to help journalism re-connect to the minority of those in society who, by choice or by circumstance, are left disconnected.
Thinking about ways to make the data we find and the data journalism we create physical closes a loop on a process that starts with real people in the real world. It begins to raise important questions about what journalism’s role should be in not just capturing the problems and raising awareness but also creating solutions. In an industry struggling to re-connect, it may also start to address the problem of placing journalism back in the community and making it sustainable. Researchers reflecting on the Bespoke project noted that:
“elements of the journalism process put in place to inform the design process have continued to operate in the community and have proven to be more sustainable as an intervention than the designs themselves”
If stage three is to make our own data, perhaps it is time to start thinking about stage four of data journalism and make data real.
Alba, Davey (2013) Sensors: John Keefe and Matt Waite on the current possibilities, Tow Centre for Digital Journalism, 5 June. Available online at http://towcenter.org/blog/sensors-john-keefe-and-matt-waite-on-the-current-possibilities/, accessed on 12 August 2013
Alvarez, Marcelino (2011) 72 Hours from concept to launch: RDTN.org, Uncorked Words, 21 March. Available online at http://uncorkedstudios.com/2011/03/21/72-hours-from-concept-to-launch-rdtn-org/, accessed on 12 August 2013
Ashton, Kevin (2009) That “Internet of Things” thing, RFiD Journal 22 pp 97-114. Available online at http://www.rfidjournal.com/articles/view?4986, accessed on 25 September 2013
Department of Business Innovation and Skills (2009) Digital Britain: Final Report, Stationery Office
BBC (2008) In pictures: Beijing pollution-watch, BBC News website, 24 August. Available online at http://news.bbc.co.uk/sport1/hi/front_page/6934955.stm, accessed on 12 August 2013
Blum-Ross, Alicia, Mills, John, Egglestone, Paul and Frohlich, David (2013) Community media and design: Insight journalism as a method for innovation, Journal of Media Practice, Vol. 14, No 3, 1 September pp 171-192
Bradshaw, Paul and Brightwell, Andy (2012) Crowdsourcing investigative journalism: Help me Investigate: A case study, Siapera, Eugenia and Veglis, Andreas (eds) The Handbook of Global Online Journalism, London: John Wiley & Sons pp 253-271
Ellison, Sarah (2011) The man who spilled the secrets, Vanity Fair, February. Available online at http://www.vanityfair.com/politics/features/2011/02/the-guardian-201102, accessed on 13 September 2013
Gray, Jonathan, Chambers, Lucy and Bounegru, Liliana (2012) The Data Journalism Handbook. O’Reilly. Free version available online at http://datajournalismhandbook.org/
Howard, Alex (2013) Sensoring the news, O’Reilly Radar, 22 March. Available online at http://radar.oreilly.com/2013/03/sensor-journalism-data-journalism.html, accessed on 12 August 2013
Kalin, Sari (2012) Connection central, MIT news magazine, 21 August. Available online at http://www.technologyreview.com/article/428739/connection-central/, accessed on 22 August 2013
Knight, Megan (2013) Data journalism: A preliminary analysis of form and content. A paper delivered to the International Association for Media and Communication Research, 25-29 June, Dublin
Livingstone, Sonia and Lunt, Peter (2013) Ofcom’s plans to promote “participation”, but whose and in what? LSE Media Policy Project, 27 February. Available online at http://blogs.lse.ac.uk/mediapolicyproject/2013/02/27/ofcoms-plans-to-promote-participation-but-whose-and-in-what/, accessed on 23 September 2013
Moradi, Javaun (2011) What do open sensor networks mean for journalism?, Javaun’s Ramblings, 16 December. Available online at http://javaunmoradi.com/blog/2011/12/16/what-do-open-sensor-networks-mean-for-journalism/#sthash.yXXlHoa2.dpuf, accessed on 9 August 2013
Oliver, Laura (2010) UK government’s open data plans will benefit local and national journalists, Journalism.co.uk, 1 June. Available online at http://www.journalism.co.uk/news/uk-government-039-s-open-data-plans-will-benefit-local-and-national-journalists/s2/a538929/, accessed on 12 August 2013
Rogers, Simon (2011) Facts are Sacred: The Power of Data (Guardian shorts), Cambridge, UK: Guardian Books
Safecast History (no date) Safecast.com. Available online at http://blog.safecast.org/history/, accessed on 25 September 2013
Sopher, Christopher (2013) How can we harness data and information for the health of communities?, Knight Foundation, 16 August. Available online at https://www.newschallenge.org/challenge/healthdata/brief.html, accessed on 10 September 2013
Taylor, Nick, Marshall, Justin, Blum-Ross, Alicia, Mills, John, Rogers, Jon, Egglestone, Paul, Frohlich, David M., Wright, Peter and Olivier, Patrick (2012) Viewpoint: Empowering Communities with Situated Voting Devices, Proc. CHI 2012 pp 1361-1370, New York: ACM
Taylor, Nick, Wright, Peter, Olivier, Patrick and Cheverst, Keith (2013) Leaving the wild: lessons from community technology handovers, in CHI ’13
Waite, Matt (2013) How sensor journalism can help us create data, improve our storytelling, Poynter.org, 17 April. Available online at http://www.poynter.org/how-tos/digital-strategies/210558/how-sensor-journalism-can-help-us-create-data-improve-our-storytelling/, accessed on 28 August 2013
I noted that @patrickolszo mentioned that post on Twitter today and I realised it was a bit long in the tooth and, of course, things have moved on considerably. So here is how I’d do that now using Google Docs and Google Fusion Tables.
I’ll use the data from the original map, which came from publicservice.co.uk. It is not current, but at least it keeps the link across the posts!
Importing the data
In the previous example I noted that I did a fair bit of manipulation on the information, splitting the text up to make columns I could use.
If you look at the sheet you can see it is actually a bit of a mess in places but it’s raw data. Truth be told I went through Excel to do some column splitting and then combining to get the postcode out of the address.
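If you would rather script that splitting than do it by hand, something like the sketch below would be a starting point. It is not what I did (that was Excel), and it leans on a big assumption: that the address is comma-separated with the postcode as the last part. The function name and the example address are made up for illustration.

```python
def split_address(address: str) -> dict:
    """Split a comma-separated address into labelled columns.

    Assumes the last comma-separated part is the postcode, which will not
    hold for every row in messy data.
    """
    parts = [part.strip() for part in address.split(",") if part.strip()]
    if not parts:
        return {"address": "", "post_code": ""}
    return {
        "address": ", ".join(parts[:-1]),
        "post_code": parts[-1].upper(),
    }


if __name__ == "__main__":
    # Made-up address, purely for illustration.
    print(split_address("1 High Street, Sevenoaks, tn13 1hz"))
```

Rows that don’t follow the pattern will still need checking by eye, which is pretty much where I ended up with the spreadsheet anyway.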
Click Google Spreadsheets from the Import new table options
Find your saved version of the post office spreadsheet and click Select
A version of the spreadsheet will be loaded. At this point you can make changes to the column headers – I changed Address three to complete address for example. The last screen prompts you to add contextual information. It’s really worth doing this.
When it’s loaded in, the data is not that different from the spreadsheet. Fusion Tables splits it across a number of pages, 100 rows at a time, but that’s all that’s really obvious. I’m not going to go into the functionality etc. of Fusion Tables here though. I’ll just do the basics to get us to the map.
Mapping the data
The first thing we need to do is tell Fusion Tables which data we are going to use to map each row.
Select Edit > Modify columns
Select the Post Code column and change the Type to Location
Select File > Geocode and select Post Code from the drop down
Fusion tables will then generate location data based on the post codes in the table. Much easier than the Yahoo pipes solution! You can also try geocoding more generic information. You could, for example, set the Address two column type to location as well and geocode that. But if you have more than one post office in the same town that might cause problems.
Generating the map
Now you have the location information:
Select Visualize > Map
Fusion Tables will take the data and put it on the map. You’ll need to zoom in and move around to see the points in detail. Clicking on a point will open a standard speech bubble with more data. But overall it’s looking pretty good. Well, almost. One of our post offices seems to have moved from Sevenoaks to Northern France!
Checking the data, it seems that the post code is TN13 IHZ when it should be TN13 1HZ. It isn’t the only one but I’ll just correct that postcode for now:
Click Visualize > Table to switch back to table view
Click the Address two header
Select Sort Desc
Find Sevenoaks and change the appropriate Post Code Entry
Click Visualize > Map to go to the map view
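Since that wasn’t the only dodgy postcode, it might be worth checking the whole column before geocoding. Here’s a rough sketch of how you could flag the suspicious ones. The pattern is a deliberately simplified version of the UK postcode shape (an assumption on my part, not the full official rules), and the function name is made up.

```python
import re

# Simplified UK postcode shape: outward code, a space, then an inward code
# that is always one digit followed by two letters.
POSTCODE_SHAPE = re.compile(r"^[A-Z]{1,2}[0-9][A-Z0-9]? [0-9][A-Z]{2}$")


def suspicious_postcodes(postcodes):
    """Return the postcodes that do not match the simplified shape."""
    return [
        code for code in postcodes
        if not POSTCODE_SHAPE.match(code.strip().upper())
    ]


if __name__ == "__main__":
    # The two versions of the Sevenoaks postcode from the table.
    print(suspicious_postcodes(["TN13 1HZ", "TN13 IHZ"]))
```

A check like this only catches postcodes that are the wrong shape. One that is well-formed but simply wrong will still put a post office in the wrong place, so the map is still the best sanity check.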
Editing the map
Changing the data on the map is simply a case of updating the table. But you can also edit the content of the bubble using the Configure info window link and even the marker by using the Configure styles link.
One nice trick here is to add an extra column to your spreadsheet and add a custom icon to use on your map. In the example below I added a column to my spreadsheet called logo. I added ‘post_office’ to each entry. When I import that in to Fusion tables I can use the Configure styles link to choose the logo column as the icon.
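I added the column by hand in the spreadsheet, but if you are working from a CSV export you could script it. This is just a sketch: the function name is made up, and the only things taken from the example above are the logo column name and the ‘post_office’ value.

```python
import csv
import io


def add_icon_column(csv_text: str, icon_name: str = "post_office") -> str:
    """Return the CSV text with an extra 'logo' column set on every row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames or []) + ["logo"]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row["logo"] = icon_name  # same icon name for every post office
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    print(add_icon_column("Address two,Post Code\nSevenoaks,TN13 1HZ\n"))
```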
Sharing the map
When you’re happy with the map you can share it with others in a similar way to other google documents.
Click on the share button in the top-right of the page
Change the settings to suit
Once that is done you’re good to go. Here’s an embedded version of the map.
Note: The embed works using an iframe so if you’re using WordPress.com, sorry, you’re out of luck when it comes to adding the map to a post.
The process of mapping is obviously a lot simpler with something like Fusion Tables and I think that underlines just how far we’ve moved. That’s not just in terms of the underlying technology which, when you think about it, is pretty amazing. It also shows just how far and how quickly data, geocoded or otherwise, has become mainstream.
Anyway, it’s a simple little example and there is a tonne more that you can do with Fusion Tables but I hope it’s a useful insight.
If you’re not familiar with scraperwiki it’s “all the tools you need for Screen Scraping, Data Mining & visualisation”.
These guys are working really hard at convincing journos that data is their friend by staging a steady stream of events bringing journos and programmers together to see what happens.
So I landed at NWVM’s offices to what seemed like a mountain of laptops, fried food, coke and biscuits to be one of the judges of their latest hacks and hackers day in Manchester (#hhhmcr). I was expecting some interesting stuff. I wasn’t disappointed.
We had to pick three prizes from the six or so projects started that day and here’s what we (Tom Dobson, Julian Tait and me) ended up with.
The three winners, in reverse order:
Third was Quarternote: a website that would ‘scrape’ Myspace for band information. The idea was that you could put a location and style of music into the system and it would compile a line-up of bands.
A great idea (although more hacker than hack) and if I was a dragon I would consider investing. These guys also won the Scraperwiki ‘cup’ award for actually being brave enough to have a go at scraping data from Myspace. Apparently Myspace content has less structure than custard! The collective gasps from the geeks in the room when they said that was what they wanted to do underlined that.
Second was Preston’s summer of spend. Local councils are supposed to make details of any invoice over 500 pounds available, and many have. But many don’t make the data very usable. Preston City Council is no exception. PDFs!
With a little help from Scraperwiki the data was scraped, tidied and put in a spreadsheet and then organised. It threw up some fun stuff – 1000 pounds to The Bikini Beach Band! And some really interesting areas for exploration – like a single payment of over 80,000 to one person (why?) – and I’m sure we’ll see more from this as the data gets a good running through. A really good example of how a journo and a hacker can work together.
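Once the data is out of the PDFs and into rows, finding the payments worth a phone call is the easy bit. Here’s a minimal sketch of that step. It isn’t the team’s code: the Payment structure and function name are made up, and the sample rows are illustrative rather than real Preston data (only the Bikini Beach Band figure comes from the day).

```python
from dataclasses import dataclass


@dataclass
class Payment:
    payee: str
    amount: float  # pounds


def largest_payments(payments, minimum=500.0, top=5):
    """Return up to `top` payments above `minimum`, largest first."""
    eligible = [p for p in payments if p.amount > minimum]
    return sorted(eligible, key=lambda p: p.amount, reverse=True)[:top]


if __name__ == "__main__":
    # Illustrative rows only; the first payee and amount are placeholders.
    rows = [
        Payment("Example supplier", 750.0),
        Payment("The Bikini Beach Band", 1000.0),
        Payment("Example small invoice", 120.0),
    ]
    for payment in largest_payments(rows):
        print(payment.payee, payment.amount)
```

The sorting is trivial. Working out why a payment was made is the journalism.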
The winner was one of a number of projects that took the tweets from the GMP 24hr tweet experiment; what one group titled ‘Genetically modified police’ tweeting :). Enrico Zini and Yuwei Lin built a searchable GMP24 database of the tweets (with a great write-up of the process) which allowed searching by location, keyword, all kinds of things. It was a great use of the data and the working prototype was impressive given the time they had.
Other projects included mapping deprivation in Manchester and a legal website that, if it comes off, will really be one to watch. All brilliant stuff.
Hacks and hackers we need you
Given the increasing amount of raw data that organisations are pumping out, journalists will find themselves vital in making sure that those organisations stay accountable. But I said in an earlier post that good journalists don’t need to know how to do everything, they just need to know who to ask.
The day proved to me and, I think, to lots of people there, that asking a hacker to help sort data out is really worth it.
I’m sure there will be more blogs etc about the day appearing over the next few days.