Beep. Boop. That’s the sound of my commute

In spare moments, I find myself strangely drawn to the discount site Wish. On a whim, I forked out a few quid on a GoPro rip-off. It lasted less than a minute before it developed a fault. But it kind of works (and I got a full refund!). So I stuck my now-free camera on my helmet and filmed my cycle commute.

I was interested in how I might ‘see’ the commute in a way that didn’t mean sitting through 20 minutes or so of me huffing and blowing down various streets. I remembered a project by ace designer Brendan Dawes called Cinema Redux.

He created ‘visual fingerprints’ of well-known films by taking one frame from every second of the film and laying them out in rows of 60; one row for every minute of running time. They are fascinating and give an interesting perspective on a film, especially its use of colour. I thought this would be a nice way to see my commute.

Brendan Dawes’ Cinema Redux of Taxi Driver.

A while ago a developer called Ben Sandofsky created an app called Thumber, which creates them, but it didn’t work on my Mac (it was built for Leopard). So, having recently dipped my toes into Python programming (I’ve been scraping Twitter for some research), I thought why not see if I could do it in Python.

A lot of GiantCap development later and I got it to work. The result…

A ‘cinema redux’ of my ride from work

You can see it’s no Taxi Driver. But there’s the occasional splash of green in the grey of the road and the Manchester sky. As a ‘fingerprint’ of my journey, I think it works well. The final Python code that makes them is available on GitHub.

It’s clunky and inefficient. But it works, and I was inordinately pleased just by the fact that it doesn’t crash (much). So what could I do next with my new-found programming powers?

What does my commute sound like?

In my last job, one of the PhD students, Jack Davenport (he does some really cool stuff, btw), was working on a project called the sound of colour, which explored playful ways to make music that broke away from standard interfaces like keyboards. One experiment involved constructing a large table that users could roll loads of coloured balls across. A camera tracked the balls and converted their position and colour into data to play and loop sounds. I loved the idea. Maybe it was there in the back of my mind when I thought it might be cool to work out what the cinemaredux of my commute sounded like.

Sonification of data

Making data audible is not a new concept. As well as a healthy and diverse electronic music scene, there’s a growing and scarily creative community of programmers and musicians experimenting with real-time coding of music. There’s also loads of interesting stuff around using it to explore research data. It’s even got a bit of a foothold in data journalism. Check out the Center for Investigative Reporting piece on creating an audio representation of earthquake data by Michael Corey @mikejcorey. There’s code and everything. On that note you should also check out Robert Kosara’s piece Sonification: The Power, The Problems. But I digress.

After some reading around I settled on the following basic idea:

  • analyse each image generated by my cinemaredux script and work out the dominant colour in each. I didn’t want one note per picture; the information was too rich for that. But at the same time I didn’t want to create loads of notes from every pixel in the image. I needed to filter the data somehow.
  • convert the RGB values of each colour into a MIDI note. I chose MIDI because it gave me the most flexibility and I had a vague idea of how it worked left over from my distant past in music tech. It’s essentially a data file describing which note to play, when and for how long; no sounds. I thought this would be easier: once I had data from the image it would just be a case of converting numbers. It would also give me more room to experiment with what the data ‘sounded’ like later on.
Analysing an image to work out the dominant colours filtered the data to something I could use
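For the curious, the analysis step looks roughly like this in Python. This is a minimal sketch rather than my actual script: it assumes the Pillow and scikit-learn libraries, and the filename is made up.

# A minimal sketch of k-means dominant-colour analysis.
# Assumes Pillow and scikit-learn; 'frame.jpg' is a placeholder filename.
from PIL import Image
import numpy as np
from sklearn.cluster import KMeans

def dominant_colours(path, clusters=4):
    # Shrink the image first; we only need a rough colour profile
    img = Image.open(path).convert("RGB")
    img.thumbnail((100, 100))
    pixels = np.array(img).reshape(-1, 3)

    km = KMeans(n_clusters=clusters, n_init=10).fit(pixels)
    # Work out what proportion of pixels fell into each cluster
    counts = np.bincount(km.labels_, minlength=clusters)
    weights = counts / counts.sum()
    # Return (R, G, B) centres with their weights, biggest first
    order = weights.argsort()[::-1]
    return [(km.cluster_centers_[i].astype(int), weights[i]) for i in order]

for colour, weight in dominant_colours("frame.jpg"):
    print(colour, round(weight, 2))

The weights are what make the filtering work: four clusters give four notes per image, however many pixels it has.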

Midifying dominant colours

Skipping over a good deal of frustrating cut-and-paste, I finally got a script together that would take each frame of the video and give me a range of the dominant colours, or ‘clusters’. Converting those into notes and durations didn’t take too much messing around and, thankfully, there are some very easy-to-use MIDI libraries for Python out there!

I ended up with each image generating a kind of arpeggio from a cluster — each colour represents a note that plays for a duration equal to the ‘amount’ of that colour in the image analysis. I could have made them play at the same time for a chord, but I knew that would sound odd, and the rise and fall of the notes seemed to suit the idea of motion more.
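To give a flavour of that step, here's a rough sketch using the midiutil library (pip install midiutil). The cluster values and the RGB-to-pitch mapping are made up for illustration; they aren't lifted from my script.

# A rough sketch of the arpeggio idea with midiutil. The clusters list
# and the RGB-to-pitch mapping are illustrative, not my exact formula.
from midiutil import MIDIFile

# (R, G, B) cluster centres with their 'amount' of the image
clusters = [((34, 85, 40), 0.5), ((120, 120, 118), 0.3), ((200, 30, 25), 0.2)]

midi = MIDIFile(1)
midi.addTempo(track=0, time=0, tempo=120)

time = 0
for (r, g, b), weight in clusters:
    # Squash the average channel value into a playable MIDI range (36-96)
    pitch = 36 + int((r + g + b) / 3 / 255 * 60)
    duration = weight * 4  # the 'amount' of colour sets the length, in beats
    midi.addNote(track=0, channel=0, pitch=pitch, time=time,
                 duration=duration, volume=100)
    time += duration  # notes follow one another: an arpeggio, not a chord

with open("commute.mid", "wb") as f:
    midi.writeFile(f)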

Here’s the first test output from the script — a random image of my daughter messing with the camera, analysed for four clusters. The resulting MIDI file was run through GarageBand with a random instrument (chosen by my daughter) and looped over a few bars. It grows on you! (Note: the SoundCloud embed is a bit flaky on Chrome.)

Applying the same analysis to my cinemaredux images was just an exercise in time — more images take more time to analyse. But eventually I got a MIDI file, and this is the result. (Note: the SoundCloud embed is a bit flaky on Chrome.)

Like my thumbnail experiment, I’m happy with the result because, well, it works. At some point I may do a more technical post* explaining what I did. For now though, if you want to see the code and see if you can get it to work, then head over to GitHub.

Some further work

It would be nice if the code was neater and faster, but it works. Where it falls down is in timing. The duration of the midi file is much longer than the actual journey. That means some experimenting with the ratio of notes, tempo and number of images. But I’m happy with the result so far. I’ve also a few more ideas to try:

  • It would be nice to have a version that was more ‘tuneful’ in the traditional sense. In tutorials I’ve read, like Michael Corey’s earthquake piece mentioned earlier, it’s common to tune the data by mapping the values to a key, e.g. moving all the notes so they are in the C major scale (there’s a rough sketch of this after the list). That way, I guess, I could risk generating a chord for each image without it sounding like I’m constantly crashing my bike.
  • It would also be nice to break colours up across musical tracks. Low-value RGB colours like black and grey could be used to play bass notes, with higher-value colours on another track to play melody.
  • By using MIDI I’m not limited to playing ‘instruments’. I could, for example, use samples of the environment I cycle through and then ‘trigger’ them using the notes, e.g. red plays middle C, which triggers the sound of a car. It’s also possible to use data to filter sounds. So I could use the sound from the head camera itself and use the data to apply filters and other effects over its duration.
  • Finally, it would be nice to create a cinemaredux style image just of the colours selected, like a colour based piano roll or musical score.
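On that first idea, the 'tuning' step is simple enough to sketch. Here's a minimal Python version that snaps a MIDI note to the nearest note in the C major scale (my own illustration, not taken from Corey's piece):

# Snap an arbitrary MIDI note to the nearest note in the C major scale.
C_MAJOR = [0, 2, 4, 5, 7, 9, 11]  # semitone offsets within an octave

def snap_to_c_major(pitch):
    octave, semitone = divmod(pitch, 12)
    nearest = min(C_MAJOR, key=lambda s: abs(s - semitone))
    return octave * 12 + nearest

print(snap_to_c_major(61))  # C#4 (61) snaps down to C4 (60)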


You might be reading this and thinking why? You may listen to the ‘music’ and really think ‘MY GOD MAN! WHY?’ But the process of thinking about how data points can be ‘transformed’ was fun, and I’m now a lot more confident using Python to structure and manage data.

There are a lot of assumptions and work-arounds in this script. The process of making the content more musical alone demands a level of engagement with music theory (and MIDI) that I’m not really up for right now. The more I dive into some of the areas I’ve skated over in the script, the more I become aware that there’s similar work out there. But my approach was to see how quickly I could get a half-baked idea into a half-made product.

For now I’d be interested in what you think.

*Essentially, when looking for scripts to average out the colour of an image, I came across a method called k-means clustering for colour segmentation. That’s what is used to generate the stacked chart of colours. That gave me the idea for the arpeggio approach.

Local votes for hyperlocal #DDJ

There’s a good deal of interest in my feeds in a BBC report, Local voting figures shed new light on EU referendum. The work has been a bit of a labour of Hercules by all accounts.

Since the referendum the BBC has been trying to get the most detailed, localised voting data we could from each of the counting areas. This was a major data collection exercise carried out by my colleague George Greenwood.

This was made more difficult by a number of issues including the fact that: “Electoral returning officers are not covered by the Freedom of Information Act, so releasing the information was up to the discretion of councils.”

But the data is in, and the analysis is both thorough and interesting. I particularly like the fact that the data they collected is available as a spreadsheet at the end of the article. There are gaps, and there have been some issues with this (but it’s already being put to good use). More and more I’m seeing data stories appear with no link to the data used or created as a result of the reporting.

Getting local.

In a nice bit of serendipity, Twitter threw up a link to a story on the Reading (Katesgrove Hill) based hyperlocal The Whitley Pump. The story, ‘Is east Reading’s MP voting for his constituency?‘, starts with the MP for Reading East, Rob Wilson, questioning an accusation that he voted against his constituents in the recent Article 50 vote. His response was, in effect, prove it: “Could you provide the evidence on how my constituency voted? My understanding is that no such breakdown is available.” That’s just what Adam Harrington of The Whitley Pump set out to do.

The result is a nice bit of data journalism that draws on a number of sources, including council data, and reaches the conclusion: “There is nothing to support a view that Reading East voted to leave the EU, and available data makes this position implausible.”

If nothing else, it’s a great example of how hyperlocal data journalism can work. Unlike the BBC, the Pump didn’t need to deliver across the whole country, but it did follow a lot of the same methods and fall foul of many of the same issues, not least the lack of data in the first place.

Encouraging data practice at hyperlocal level. 

The BBC’s recent announcement on the next steps for its local democracy reporters scheme includes mention of a local Data Journalism Hub. In a blog post officially announcing the scheme, Matthew Barraclough noted:

We hope to get the Shared Data Hub in action very soon. Based in Birmingham, BBC staff will work alongside seconded journalists from industry to produce data-driven content specifically for the local news sector.

It would be great to see that opportunity to work and learn alongside the BBC include hyperlocals like the Whitley Pump.

Image courtesy of The European Parliament on Flickr.

The BBC, Local democracy, hyperlocal and journalism.

I spent the afternoon in Birmingham at the BBC finding out more about their Local Democracy Reporters scheme.  It’s a project I’ve been keeping an eye on for a number of reasons.

The promise of 150 new jobs in journalism, especially ones exclusively aimed at covering local government, is clearly of interest to me as a journalism lecturer. It’s more opportunities for students and journalists, for one thing. But the focus on civic reporting also begins to address an area that I think is under-resourced and under-valued (by producers and consumers alike). The scheme also includes plans for a content hub called the News Bank, holding material created by the reporters, which anyone can apply to use. This would also include content from the BBC’s fast-developing Regional Data Journalism unit.

The combination of data, hyperlocal and civic content is too good for me to ignore.

What’s in it for hyperlocals?

One of the underpinning reasons for this scheme is to “share the load” of accountability journalism. The role of journalism in holding the powerful to account is one that many feel is being lost, especially at a local government level. People talk about a democratic deficit and news deserts: towns with no journalistic representation at all. Many see hyperlocals as an essential part of filling the gap, but it’s notoriously hard to create a sustainable hyperlocal business model. So it is no surprise that hyperlocal and community media representatives have been following the development of the project with interest. When the BBC promise a pot of money to improve local democratic reporting, who better to benefit from the cash?

So how would the scheme work?

The fine detail of the plan is still being pulled together, but in principle the scheme would be something like this:

The BBC will create contracts for Local Democracy Reporters, but they won’t manage the reporters. Rather than 150 separate contracts, they have packaged them up into ‘bundles’ containing a number of reporters per geographic patch. Local news organisations can then bid to take on these contracts on behalf of the BBC. The organisation will be responsible for the reporter both editorially and from a straight HR point of view (sick leave, appraisals etc.). The BBC have a number of criteria and requirements for anyone wanting to bid. These include a proven track record in producing good quality content and the capacity to properly employ and manage a member of staff.

The content created by the reporters, as well as any prospects, will be made available on a shared News Bank. So as well as the ‘host’ organisation, other media organisations can use the content created. There would be no exclusives for host organisations; when the content drops, it drops for everyone with access to the content hub. So you don’t need to employ a local democracy reporter to get access to the content on the News Bank. But you would need to apply to the BBC for access. As long as you fulfil their criteria – adherence to basic editorial standards and a track record in producing good quality content – you’re in!

There is a good deal of simplification here on my part. There is a tonne more detail in the plans that were presented today but we were asked not to share too much. Which is fine by me.

But at the event today, I made a few broad notes on some issues and observations.


  • Defining ‘bundles’ – A number of the hyperlocal operators in the room noted that the bundles suggested by the BBC sometimes didn’t make sense when you knew the local geography and political landscape. Others noted that they seemed to mirror the regional media orgs’ patches. The BBC noted that the geography of the scheme was, in some part, driven by the location of BBC local offices, who would have a role in overseeing the project. That said, the BBC were very open to feedback on the best way to divide up the patches. A positive role for hyperlocals, and it shows the value that a focus on a patch can bring.
  • Scale and partnerships – Many of the hyperlocals in the room felt that the decision to package up reporters by patch, and the criteria set for qualifying organisations, effectively shut them out of the process. They might be able to manage one reporter but not three or four across a large patch. One solution offered was working in partnership with larger, regional media organisations to deliver contracts in an area, e.g. an established media player such as Trinity Mirror or Johnston Press could take on the contract and then work in partnership with a hyperlocal to deliver the content whilst the larger org takes on the HR and management issues. I think the devil is in the detail, but it strikes me as a good compromise. It’s fair to say, though, that the idea wasn’t warmly received by many of the hyperlocals in the room. I think the best way to describe the reason is ‘because trust issues’. Interestingly, the idea of hyperlocals collaborating in networks to bid got very little comment or, it seems, interest.
  • Value to the tax payer – The BBC are clearly caught between a rock and a hard place with initiatives like this. They have money that they want to use to ‘share the load’ but, at the same time, would be under huge amounts of scrutiny for what is produced and who they work with. Accountability is something they take very seriously, and the BBC are masters at getting themselves in knots trying to be fair and balanced to everyone. Often they just can’t win. The scheme as presented today highlighted some of those tensions. By ‘outsourcing’ the management of the journalists they deal with the issue of the BBC barging into a sector and skewing the market. But at the same time, the need for accountability means the scheme is run through with ‘checks and balances’ the Beeb would apply to ensure the licence fee payers were getting value for money. It’s not quite as hands-off as it could be. It also seems that the ‘value for money’ test stretches to ensuring that the material collected by the reporters is also useful to the BBC and their reporting. Not quite having your cake and eating it, but maybe confusing who you are baking the cake for.

But in the midst of the accountability knots and the predictable cynicism and animosity that underpins the relationship between some hyperlocals and the regional media, I think something really important slipped by that’s worth keeping an eye on.

The BBC seal of approval

To get access to the News Bank, organisations will need to submit an application to the BBC. General noises around the criteria suggest these will include caveats on quality content and a track record in producing news content. Orgs will also need to show a commitment to the same editorial guidelines for balance and impartiality as the BBC. But details of the assessment process were sketchy.

But let’s look at that another way. In short, the BBC will become a local media accreditation body.

I don’t know how I feel about that. To be clear, I certainly don’t perceive any suspicious motives. But it still makes me uneasy.

I guess you could read it in the same way as hyperlocals being recognised as publishers by Google so they could feature in Google News. Perhaps, as long as the process was transparent, it’s not a bad thing that some standards are defined. But then I think the sector doesn’t really have a problem in that area.

I don’t know.  But of all the issues this scheme raises, it feels like the one most likely to generate unintended consequences.

All of that said, it’s worth watching and supporting. Looking beyond the implementation, which is never going to tick all the boxes, I do think the scheme, when it rolls out, will mark one of, if not the, biggest investments in civic journalism in the UK that isn’t technology driven. I might go as far as to say it’s the only journalism-first investment in civic innovation that I’ve seen in the UK.

It may not work across the board but you’ve got to admire the idea.


Making Instagram video with Powerpoint

Audio slideshows are something I’ve included in my practical teaching for a little while. The combination of images and well recorded audio is, for me, a compelling form of content and it can be an easy video win for non-broadcast shops.

When I work with students and journalists exploring the concept, I try to look for free or cheap solutions to the production process. In the past I’ve used everything from Windows Movie Maker to YouTube’s simple editor app to put packages together. But this year, when I was putting the workshops together, I wanted to focus on social platforms and go native video on Instagram.

Video on Instagram

It’s not the first time I’ve looked at Instagram video. A few years ago, having seen a presentation about the BBC’s Instafax project (in 2014!), I had a look at cheap and free tools to use to create video for Instagram. But things have moved on — like the BBC’s use of Instagram.

So I started to look at how I might use a combination of accessible tools, with a view to doing an update on that post. I found myself thinking about Powerpoint.

Why Powerpoint!

When I talk to students about video graphics, I often point them to presentation apps like Google Slides and Powerpoint as simple ways to create graphics files for their video packages. They have loads of fonts, shapes and editing tools in a format students are familiar with (more of them have made a Powerpoint presentation than worked with a video titling tool!). The standard widescreen templates work with most video editing packages, and you can export single slides as images. So I took a quick look at Powerpoint to remind myself of the editing tools. Whilst I was playing around with the export tools, I discovered that it had an export to video. So I opened up Powerpoint to see how far I could go, and about an hour of playing around later I had the video below.

I worked through the process on a Windows version of Powerpoint, but the basic steps are pretty much the same on a Mac. If you’re on a Mac then Keynote is also a good alternative, which will do all of the stuff you can do with Powerpoint with the added bonus that it will also handle video.

Here’s what I did. (You can download the Powerpoint file and have a look; I’m making that available as CCZero.)

You can see a video walk-through of parts of the process or scroll down for more details.

The process

  • Open Powerpoint and start with a basic template
  • Click the Design Tab and then select Slide Size > Custom Slide Size(Page Setup on Mac)
  • Set the width and height to an equal size to give the square aspect ratio of Instagram. Click OK. Don’t worry about the scaling warning

You can set a custom slide size for Powerpoint which means we can create custom slides that fit with Instagram and other platforms.

You can now play around with the editing tools to place text, images and other elements on each slide.

Animating elements

The tools to add shapes and text are pretty straightforward, but one effect that seems popular is ‘typewriter’ style text, where the words animate onscreen. Luckily that’s built in to Powerpoint.

  1. Add a Text box and enter the text. Make sure you have the text box selected not the text
  2. Go to the Animations tab, select the text box and click on Appear.
  3. Open the Animations Pane in the tool bar
  4. In the Animations pane, right-click on the text box (it will be named with any text you’ve added) and select Effect Options
  5. In the Animate text option, select By word. You can speed the text up using the delay setting. (Note: you can’t do this in the Mac version.)

The typewriter effect is a common one on many social videos. One which powerpoint makes short work of.

For the rest, it’s worth experimenting with basic transitions and animations before you try anything too complex. Once you start to get separate elements moving around you’ll need to think about text as separate elements — you’ll end up with ‘layers’ of text; but that’s no different from a video editor.

Adding Audio

You can add audio to individual slides or to play as an audio ‘bed’ across all the slides.

A common feature of Audio Slideshows on Instagram (and other social platforms) is that the text drives the story; the audio is often music or location sound that adds a feel for the story. In this example I used sound that I recorded on the scene but you could use any audio e.g. a music track.

You can also adjust the timing of slides to match the audio or just to give you control over the way slides transition and display.

Transitions and timings give you control over how long content appears for, and how it appears

Exporting your video

Once you’re happy with your presentation you can create a video version:

  • Click the File tab
  • Select Export > Create a Video

You have a few choices here. The quality setting allows you to scale the video. Presentation Quality exports at 1080×1080, Internet Quality at 720×720 and Low Quality at 480×480. I went for Internet Quality as it kept the file size down without compromising the quality too much.

You can also set the video to use the timings you set up in each slide or to automatically assign a set time to each slide. Which one you pick will depend on the type of video you want to make.

Exporting to video is one of the default options in powerpoint. PC and Mac will save to MP4

Getting video on Instagram

Instagram has no browser interface for uploading. So once the video is exported, you’ll need to transfer the final file to your mobile device. I didn’t struggle emailing files around, but you might want to look at alternatives like WeTransfer or Google Drive as a way of moving the files from desktop to mobile device.

Beyond Instagram

It’s worth noting, even belatedly, that your video doesn’t have to be square. Instagram is happy with standard video resolutions. You could use a standard 16×9 template and Instagram will be fine; I just wanted to be a bit more ‘native video’. But there is nothing stopping you setting up templates for Twitter video (W10cm × H5.6cm — landscape video) or Snapchat (W8.4cm × H15cm — portrait video).


There are limitations to using Powerpoint:

  • You need Powerpoint — It’s an obvious one, but I recognise that not everyone has access to Office. That said, it can also be the only thing people do have! It’s a trade-off.
  • It’s not happy with video — If I embed a video into the presentation, Powerpoint won’t export it as part of the video. According to the help file there are codec issues. I haven’t experimented with Windows native video formats, which may help, but it seems like a bit of a mess. It’s a shame. It will take an MP4 from an iPhone and play it well. It will spit out an MP4. But it won’t mix the two! Those of you on a Mac, this is the point to move to Keynote. Keynote is quite happy to include video.
  • Effects can get complicated — Once you get beyond a few layers of text, the process of animation can be tricky. In reality it’s no more or less tricky than layering titles in Premiere Pro. The Animation Pane also makes this a little easier by giving you a timeline of sorts.
  • Audio can be a faff — The trick with anything other than background sound is timing. Knowing how long each slide needs to be to track with the audio can add another layer of planning that the timeline interface of an editing package makes more intuitive.
  • It’s all about timing — without a timeline, making sure your video runs to length is a pain. With the limitations of some platforms that could mean some trial and error to get the correct runtime.

But problems aside, once you’ve set up a presentation that works, I could see it easily being used as a template on which to build others. The slideshows are also pretty transferable, as media is packaged up in the ppt file.

It’s not an ‘ideal’ solution but it was fun seeing just where you could take the package as an alternative platform for social video.

Don’t forget, you can download the PPT file I used and have a dig around (CCZero). Let me know if you find it useful.

Mapping Drone near misses in Google Earth*

My colleague Andrew Heaton from the Civic Drone Centre set me off on a little adventure with mapping tools when he showed me a spreadsheet of airprox reports involving drones.

In my head an airprox report describes what is often called a ‘near miss’ but more accurately, the UK Airprox board describe it as this…

An Airprox is a situation in which, in the opinion of a pilot or air traffic services personnel, the distance between aircraft as well as their relative positions and speed have been such that the safety of the aircraft involved may have been compromised.

The board produce very detailed reports (all in PDF!) on all events reported to them, not just drones, and they pack it all up in a very detailed spreadsheet each year. You can also get a sheet that has all reports from 2000–2016! (h/t Owen Boswarva). If you look at those sheets and you just want drone reports, look for ‘UAV’. There is also a very detailed interactive map of UK Airprox locations you can look through.

But given I’m on a bit of a spreadsheet/maps thing at the moment, I thought it would be fun to see if I could get the data from the spreadsheet into Google Earth. Why? Well, why not. But I did think it would be cool to be able to fly through the flight data!

Getting started.

The Airprox spreadsheet

At first glance the data from the Airprox board looks good. The first thing to do is tidy it up a bit. The bottom twenty or so rows are reports that have yet to go to the ‘board’, so the details on location are missing; I’ve just deleted them. Each log also has latitude and longitude data, which means mapping should be easy with things like Google Maps. But a look over it shows the default lat and long units are not in the format I’d expected.

This sheet uses a kind of shorthand notation for degrees and minutes of latitude and longitude: co-ordinates based on distance north of the equator — the N you can see in the latitude — and distance to the west and east of the Greenwich Meridian line — the W and the E you can see in the longitude. To get it to work with stuff like Google Maps and other off-the-shelf tools it would be more useful to have it in decimal co-ordinates, e.g. 51.323 and -2.134.

Converting the lat and long

This turned out to be not that straightforward. Although there are plenty of resources around for converting coordinate systems, the particular notation used here tripped me up a little. A bit of digging around, including a very helpful spreadsheet and guide from the Ordnance Survey, and some trial and error sorted me out with a formula I could use in a spreadsheet.

Decimal coordinates = (((secs/60)+mins)/60)+degrees

If the longitude is W then multiply the result by -1, e.g. ((((secs/60)+mins)/60)+degrees)*-1. So to convert 5113N 00200W to decimal:

Latitude =((((00/60)+13)/60)+51) = 51.21666667
Longitude =((((00/60)+00)/60)+2)*-1 = -2

Running that formula through the spreadsheet gave me a set of co-ordinates in decimal form. To test it I ran them through Google Maps.
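If you'd rather script the conversion, the same logic in Python might look like this. A minimal sketch: it assumes the Airprox shorthand is always degrees and minutes with no seconds, which is how the sheet I had was laid out.

# Convert Airprox-style shorthand ('5113N', '00200W') to decimal degrees.
# Assumes degrees and minutes only, with no seconds.
def to_decimal(value):
    hemisphere = value[-1]                      # N, S, E or W
    digits = value[:-1]
    degrees, minutes = int(digits[:-2]), int(digits[-2:])
    decimal = degrees + minutes / 60
    return -decimal if hemisphere in "SW" else decimal

print(to_decimal("5113N"), to_decimal("00200W"))  # 51.2166... -2.0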

Getting off the ground.

Google Maps is great but it’s a bit flat. Literally. The Airprox data also contains altitude information, and that seems like an important part of the data to reflect in any visualization of things that fly! That’s why Google Earth sprang to mind.

To get data to display in Google Earth you need to create KML files. At their most basic these are pretty simple. You can add a point to a map with a simple text editor and a few basic lines like the ones below. Just save it with a KML extension, e.g. map.kml

<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
  <Placemark>
    <name>Here is the treasure</name>
    <Point>
      <coordinates>-0.1246,51.5007</coordinates>
    </Point>
  </Placemark>
</kml>

KML files usually open in Google Earth by default, and when the file opens it should settle on something a bit like the shot below.

Google Earth jumps to the point defined in the KML file.

Adding some altitude to the point is pretty straightforward. The height, measured in meters, is added as a third co-ordinate. You also need to set the altitudeMode of the point, “which specifies a distance above the ground level, sea level, or sea floor”.

<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
  <Placemark>
    <name>Here is the treasure</name>
    <Point>
      <coordinates>-0.1246,51.5007,96</coordinates>
      <altitudeMode>relativeToGround</altitudeMode>
    </Point>
  </Placemark>
</kml>

The result looks something like this.

Setting the altitudeMode and setting an altitude co-ordinate gives your point a lift.

But hold your horses! There’s a problem.

The Altitude column in the Airprox sheet is not in meters. It’s in feet.

When it comes to distances, aviation guidance mixes its units. Take this advice from the Civil Aviation Authority’s DroneCode as an example:

Make sure you can see your drone at all times and don’t fly higher than 400 feet

Always keep your drone away from aircraft, helicopters, airports and airfields

Use your common sense and fly safely; you could be prosecuted if you don’t.

Drones fitted with cameras must not be flown:

within 50 metres of people, vehicles, buildings or structures, over congested areas or large gatherings such as concerts and sports events

On the ground it’s metres but height is in feet! So the altitude data in our sheet will need converting. Luckily Google Sheets comes to the rescue with a simple formula:


=A1*0.3048

A1 = altitude in feet

Once we’ve sorted that out, we can look at creating a more complete KML file from a spreadsheet with more rows.

Creating a KML file from the spreadsheet

The process of creating a KML file from the Airprox data was threatening to become a mammoth session of cut-and-paste, typing co-ordinates into a text editor. So anything that could automate the process would be great.

As a quick fix I got the spreadsheet to write the important bits of code using the =concatenate formula.

=CONCATENATE("<Placemark> <name>",A1,"</name><Point> <coordinates>", B1,",",C1,",",D1,"</coordinates> <altitudeMode>absolute</altitudeMode> </Point> </Placemark>")

A1 = the text you want to appear as the marker
B1 = the longitude
C1 = the latitude
D1 = the altitude

The spreadsheet can do most of the coding for you using the =concatenate formula to build up the string (click the image to see the spreadsheet)

To finish the KML file, you select all the cells with the KML code in them and paste them into a text file, between the standard text that makes up a KML header and footer.

<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">

paste the code from the cells here.

</kml>


Your file will look something like the code below. There’ll be a lot more of it and don’t worry about the formatting.

<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
<Placemark> <name>Drone</name><Point> <coordinates>-2,51.2166667,91.44</coordinates> <altitudeMode>relativeToGround</altitudeMode> </Point> </Placemark><Placemark> <name>Drone</name><Point> <coordinates>-2.0166667,51.2333333,91.44</coordinates> <altitudeMode>relativeToGround</altitudeMode> </Point> </Placemark><Placemark> <name>Unknown</name><Point> <coordinates>-2.6833333,51.55,2133.6</coordinates> <altitudeMode>relativeToGround</altitudeMode> </Point> </Placemark><Placemark> <name>Model Aircraft</name><Point> <coordinates>0.25,52.2,259.08</coordinates> <altitudeMode>relativeToGround</altitudeMode> </Point> </Placemark>
</kml>

The result of the file above looks something like this.

With a simple file you can add lots of points with quite a bit of detail.

Is it floating?

When we zoom in to a point it can be hard to tell if the marker is off the ground or not, especially if we have no reference point like Big Ben! Luckily you can set the KML file to draw a line between the ground and the point to make it clearer. You need to set the <extrude> option by adding it to the point data:

<Placemark> <name>Unknown</name><Point> <coordinates>-2.6833333,51.55,2133.6</coordinates> <altitudeMode>relativeToGround</altitudeMode> <extrude>1</extrude></Point> </Placemark>

The result looks a little like this:

Wrapping up, some conclusions (and an admission)

There is more we could do here to get our KML file really working for us: getting more data onto the map, maybe a different icon. But for now we have pretty solid mapping of the points and a good framework from which to explore how we can tweak the file (and maybe the spreadsheet formula) to get more complex mapping.

Working it out raised some immediate points to ponder:

  • It was an interesting exercise but it started to push the limits of a spreadsheet. Ideally the conversion to KML (and some of the data work) would be better done with a script; there’s a rough sketch of one after this list. But I’m trying to be a bit strict and keep any examples I try as simple as possible for people to have a go.
  • The data from the Airprox board is, erm, problematic. The data is good but it needs a clean, and some standard units wouldn’t go amiss. It could also do with some clear licensing or terms of use on the site. I could be breaking all kinds of rules just writing this up.
  • The data doesn’t tell a story yet. There needs to be more data added and it needs to be seen in context, i.e. the relationship to flight paths and other information.
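For what it's worth, here's the kind of script I mean. A bare-bones sketch that writes the same Placemark lines from a CSV; the filenames and column names ('name', 'lat', 'lon', 'alt_m') are invented, so adjust them to match your sheet.

# A bare-bones CSV-to-KML converter. The filenames and column names
# ('name', 'lat', 'lon', 'alt_m') are placeholders.
import csv

HEADER = ('<?xml version="1.0" encoding="UTF-8"?>\n'
          '<kml xmlns="http://www.opengis.net/kml/2.2">\n')
FOOTER = '</kml>\n'
POINT = ('<Placemark><name>{name}</name><Point>'
         '<coordinates>{lon},{lat},{alt}</coordinates>'
         '<altitudeMode>relativeToGround</altitudeMode>'
         '<extrude>1</extrude></Point></Placemark>\n')

with open("airprox.csv", newline="") as src, open("airprox.kml", "w") as out:
    out.write(HEADER)
    for row in csv.DictReader(src):
        out.write(POINT.format(name=row["name"], lon=row["lon"],
                               lat=row["lat"], alt=row["alt_m"]))
    out.write(FOOTER)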

And now the admission. I found a pretty immediate solution to this exercise in the shape of a website called Earth Point. It has a load of tools that make this whole process easier, including an option to batch convert the odd lat/long notation. It also has a tool that will convert a spreadsheet into a KML file (with loads of options). The snag is that it costs a subscription to do batches of stuff. However, Bill Clark at Earth Point does offer free accounts for education and humanitarian use, which is very nice of him.

So I used the Earth Point tools to do a little more tweaking, with some pleasing (to me) results.

You can download the KML file and have a look yourself. Let me know what you think and if you have a go.

Thanks to Andrew Heaton for advice and helpful navigation round the quirks of all things drones and aviation. If you have any interest in that area I can really recommend him and the work the CDC do.

*Yes, I’m pretty sure ‘near misses’ isn’t the right word but forgive me a little link bait.

Don’t postmortem journalism. It’s not dead. Fix it.

In the aftermath of the Trump win many in the media are looking inward to understand what went wrong. But is it too soon to write off journalism as a failed project?

In the very short time we’ve had to get used to the idea that Donald Trump will be the 45th President of The United States, the hand wringing for journalism has already started.

‘We did this’.

‘We didn’t see this coming’.

‘We trusted the data and not the people’.

‘We’ve lost touch with proper journalism’

There is no doubt that what we know as the modern media is breaking apart. The strands of a profession that hold it together, that define it, are impossibly stretched by digital fragmentation and an economy that now sells choice over balance. More than any recent events, even post-Brexit here in the UK, the industry seems shaken to its core by its lack of foresight. The simmering existential crisis that dogs journalism now risks becoming full blown, crippling self-doubt for those that find their powerful journalistic tools and practice are ineffectual.

The knives are not just out for a postmortem. Many in journalism are taking the opportunity to cut down some tall poppies. Data journalism is already the main target for traditional journalists championing a return to ‘proper journalism’ — with all the self-righteous confidence of a Trump supporter mandated by the win to call foul on liberal thinking.

But now is not the time to ‘fix journalism’.

Journalism — this election was not about you. In the next few weeks, we’ll need you to explain what’s happening. God knows what the repercussions of today will be. No one has a clue. That’s your job.

Don’t fill the airwaves with conversations about the role of the media. Don’t cram the pages of your papers with handwringing. This wasn’t a surprise. This was the outcome we couldn’t sell to ourselves.

We know what the lessons are.

Time to learn them by doing.

Why social media isn’t blogging.

I’m teaching first year journalism students at the moment and talking to them about a professional online presence. A phrase that I’ve been using a lot is blogging. The idea of a ‘blog’ and its value to an aspiring journalist is one I’m really comfortable with, but I checked myself and wondered just what it might mean to the students.

As part of that, I had a look at Google Trends to see how the term blog was faring. As I noted on Twitter:

If you read all of this post, the irony that I put this on Twitter before I wrote this post — before I blogged it — will not be lost. As many pointed out in the conversation around the tweet, by putting it on Twitter I was blogging. Maybe it’s the terminology that’s changed.

But for me there is something more about the idea of blogging; something more about what that term means.

There is a very mechanical element to the idea of a blog. At their heart is a mechanism by which anyone (with little more than the time to google their way through the set-up process) can set up a dedicated publishing platform for their content and share it with people — the press tools Jay Rosen talked about. In this context, it’s easy to see how the idea of blogs can be subsumed into contemporary platforms and practice. Twitter and other social media platforms do the same thing. Don’t they?

Blogger has also become a proper noun (beyond the Google platform*). It’s a job title. It must be a proper job because we now differentiate between types of blogger — celebrity bloggers, fashion bloggers (it’s a kind of differential journalism). And to be frank, the amount of money many of them earn certainly qualifies it as ‘a living’.

But, and I realise this is where I make this quite parochial and personal, in the journalism sphere, blogging has always meant more to me than simply the process.

Blogging as critical practice.

As digital disrupts, those in the industry who innovate, explore or just honestly talk about the challenges of the day-to-day are pushed apart. Connections are lost. So the value of social media in holding together and sustaining communities of practice is immeasurable. But social media is prone to echo chambers and it’s hard for new voices to break in and disrupt the same old conversations. More fundamentally, social media has no collective memory. The mistakes, learning and context are lost in the stream of news. The echo chamber reverberates to a constant churn of the same questions popping up again and again.

Blogging, for me, was a way of setting that down — the collective wisdom of a community; a way for the community to archive its learning and insights. But more than that, it was a way for us to share the working out, not just the result. It was and continues to be a way for me to test my thoughts.

It also has been one of the key activities that has driven me to get enough profile that you’re reading this at all. It’s allowed me to build a presence alongside the chatter of social media. Something that underpins my transitory interactions with something more substantial (but maybe no less sensible!). An opportunity that is still there for aspiring journalists to grasp and exploit.

There isn’t the time, space or traction for that level of depth or reflection on social media.

So, as much as blogging may be becoming a bit of a legacy term, I still hold to my thought that “a blog is about the space to say why you think something in a world of people saying what they think in 140 chars or less.”

For me blogging was and still is a critical and thoughtful process.

*Just having to clarify that says something about the collective memory of social media.

Mapping street level crime in an area

A little while ago I was playing around with the API at data.police.uk, looking at a way to pull the data into a Google spreadsheet (and some of the issues around the way policing areas are constructed).

Yesterday I found myself playing with the API again and looking at quick and easy ways to pull data out based on a particular area.

Before I go any further, I’d recommend that if you’re going to do anything with crime data from data.police.uk, you read the About pages for more information on what the data means and where the limitations are.

Back to the project…

I know that the API can deliver street-level crime reports based on a number of criteria, including multiple latitude and longitude points that describe a shape:

https://data.police.uk/api/crimes-street/all-crime?poly=52.268,0.543:52.794,0.238:52.130,0.478&date=2013-01

I wondered how easy it would be to get the points of a custom polygon, like the one below, so I could get more specific data.

So I created a basic polygon using Google MyMaps and set about seeing if I could get the data out.

Making the shape

The easiest way to get at the data used to describe the polygons is by exporting the map as a KML file. In Google My Maps:

  1. In the left panel, click Menu (it looks like three dots on top of each other)
  2. Select Export as KML.
  3. You can choose the layer you want to export, or click Entire map. I just picked the layer with the Polygon on.
  4. Click Export.

Sorting out the lat and long points

The exported file is a text file, so we can open it in any text editor; it will look something like this (I’ve just included the first part). It’s those co-ordinates that I want to get at.

<?xml version='1.0' encoding='UTF-8'?>
<kml xmlns='http://www.opengis.net/kml/2.2'>
  <Document>
    <name>Crime Layer</name>
    <Placemark>
      <name>Crime area</name>
      <Polygon><outerBoundaryIs><LinearRing>
        <coordinates>-2.7231503,53.7637821,0.0 -2.7239227,53.763021,0.0 -2.720747,53.7586067,0.0 -2.7239227,53.7518067,0.0 -2.7229786,53.7493706,0.0 -2.7213478,53.7495229,0.0 -2.7176571,53.7501319,0.0 -2.715168,53.7485078,0.0 -2.7113915,53.7475942,0.0 -2.7094174,53.7476957,0.0 -2.7033234,53.7507917,0.0 -2.6967144,53.7516544,0.0 -2.6905346,53.7486093,0.0 -2.6857281,53.7488631,0.0 -2.6790333,53.7531769,0.0 -2.6811791,53.7566277,0.0 -2.6800633,53.7606363,0.0 -2.6809216,53.7612959,0.0 -2.6774883,53.7620063,0.0 -2.6780892,53.7630717,0.0 -2.6846123,53.7693626,0.0 -2.6918221,53.7693626,0.0 -2.7057266,53.7690583,0.0 -2.7167988,53.7671305,0.0 -2.7231503,53.7637821,0.0</coordinates>
      </LinearRing></outerBoundaryIs></Polygon>

Sadly the co-ordinates are in the wrong format for data.police.uk:

  1. The lat and long are reversed
  2. The API wants each pair (lat and long that describes a point) separated by a colon (:)

So we are going to need to clean the data up a bit. You could take the data points and use various filters, formulas and other things (regex etc.). There are plenty of ways we can do this but, to be honest, with such a small set of points I did it by hand.

The biggest issue is getting each pair on a new line. If you can do that then they should cut and paste into a spreadsheet and you can use the SPLIT command in Google Sheets to break the data down. Once you’ve got the lat and long in adjacent columns, the CONCATENATE formula will help rebuild things in the right format, and then the JOIN formula will shunt them back into one line.

The SPLIT formula can be used to separate lat and long using the comma as the delimiter (the thing you split on) Adding TRUE means it will split on consecutive commas
The CONCATENATE formula can be used to join the Lat and Long back together again in the right order, separated by a comma
Finally the JOIN formula helps shunt them all together on to one line, separated by the colon that data.police.uk wants for the API call.
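If you'd rather script the clean-up, a few lines of Python will do the same job. A minimal sketch, with a truncated coordinate string standing in for the full KML export:

# Turn space-separated 'lon,lat,alt' triples from the KML into the
# lat,lng:lat,lng poly string the API expects. The string is truncated.
kml_coords = ("-2.7231503,53.7637821,0.0 -2.7239227,53.763021,0.0 "
              "-2.720747,53.7586067,0.0")

pairs = []
for triple in kml_coords.split():
    lon, lat, _alt = triple.split(",")
    pairs.append(f"{lat},{lon}")  # the API wants latitude first

poly = ":".join(pairs)
print(poly)  # 53.7637821,-2.7231503:53.763021,-2.7239227:...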

Some final cutting and pasting and I ended up with this URL to call the API:

https://data.police.uk/api/crimes-street/all-crime?poly=53.7637821,-2.7226353:53.763021,-2.7234077:53.7586067,-2.720232:53.7518067,-2.7234077:53.7493706,-2.7224636:53.7495229,-2.7208328:53.7501319,-2.7171421:53.7485078,-2.714653:53.7475942,-2.7108765:53.7476957,-2.7089024:53.7507917,-2.7028084:53.7516544,-2.6961994:53.7486093,-2.6900196:53.7488631,-2.6852131:53.7531769,-2.6785183:53.7566277,-2.6806641:53.7606363,-2.6795483:53.7612959,-2.6804066:53.7620063,-2.6769733:53.7630717,-2.6775742:53.7693626,-2.6840973:53.7693626,-2.6913071:53.7690583,-2.7052116:53.7671305,-2.7162838:53.7637821,-2.7226353

Notice that there is no trailing : and I’ve left the date option off. That will give me any street-level crime reports in the defined area for the most recent month available. Plug that URL into a new browser tab and you get a page full of JSON data:

[{"category":"anti-social-behaviour","location_type":"Force","location":{"latitude":"53.764959","street":{"id":863936,"name":"On or near Carrol Street"},"longitude":"-2.690727"},"context":"","outcome_status":null,"persistent_id":"725ed090a9eda01c7b53e2e474005e78077bb6e9521a600d90b8a10383fbd05e","id":50943777,"location_subtype":"","month":"2016-08"},{"category":"anti-social-behaviour","location_type":"Force","location":{"latitude":"53.762666","street":{"id":862106,"name":"On or near Driscoll Street"},"longitude":"-2.690796"},"context":"","outcome_status":null,"persistent_id":"463cc6c50d3d8464a4f05d1e9f9d9e18d2138d0ba4b3d843daba7419660ddbaf","id":50939501,"location_subtype":"","month":"2016-08"},

Pulling the data into a spreadsheet

There are lots of applications and scripts that can read the JSON output from the Police API. But I wanted to go with something that required minimal coding and could output something pretty easily, so I pulled the data into a Google spreadsheet using the importJSON script. Making the script work is dead easy thanks to Paul Gambill’s guide, How to import JSON data into Google Spreadsheets in less than 5 minutes.

Using the importJSON script we can use the API call to populate a spreadsheet. (You should be able to click the image and go through to the spreadsheet.)
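If you'd rather stay out of spreadsheets altogether, the same API call is only a few lines of Python with the requests library. Again, a minimal sketch; the poly string here is truncated, so swap in the full one built above.

# Fetch street-level crimes for a polygon from data.police.uk.
# Assumes the requests library; the poly string is truncated for brevity.
import requests

poly = "53.7637821,-2.7226353:53.763021,-2.7234077:53.7586067,-2.720232"
url = "https://data.police.uk/api/crimes-street/all-crime"

for crime in requests.get(url, params={"poly": poly}).json():
    print(crime["month"], crime["category"],
          crime["location"]["street"]["name"])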

Visualizing the data

Now that we have the data as a spreadsheet we could start to do some analysis, filtering etc. But we can get a quick win by using the spreadsheet to drive a map.

I went back to the map I used to create the polygon shape, added a new layer and then imported my crime layer spreadsheet into the map. A bit of crunching later and each crime was mapped as a point.


The API isn’t perfect — the data isn’t as fresh as I would like and the geolocation isn’t always accurate (they do say this, to be fair). Google Maps also has its quirks, especially when you’re dealing with lots of data points. But being able to export to KML is a nice feature, not only for pulling out polygon data. If you have Google Earth on your computer you can open the KML file and fly around the crimes in your area!

Exporting your Google Map as KML data means you can pull the data into Google Earth and fly around the crime locations.

It’s clunky and no doubt there are more elegant solutions out there (please tell me if you know of them) but, a bit of messing with the format of the data aside, it worked how I thought it would: a ‘well, I can do this, so if I can do that it should work’ way of piecing together the tools. As a quick and dirty visualization tool (and an exploration of what APIs can do), I think it works well.

Let me know if you try it!

Note: The data from data.police.uk is made available under the Open Government Licence. That means you’re free to do pretty much anything with it, but you must link back to the source where you can.


How open is open data journalism?

Simon Rogers published a post last week that asked “What does data journalism look like in 2016?”. For Rogers, the winners of the data journalism awards “give us a great sense of where the industry is right now.”

He’s right: the range and depth of the use of data is reassuring, and the points Simon raises are well made and offer much food for thought.

But I did find myself getting snagged on one of his points: Open data is still vital.

The awards had a specific category for Open data:

Open data award. Using freedom of information and/or other levers to make crucial databases open and accessible for re-use and for creating data-based stories.

The language used here sits comfortably next to generally accepted definitions of open data. Here’s the definition of open data from the Open Definition, for example:

“Open data and content can be freely used, modified, and shared by anyone for any purpose”

The Open Data Handbook definition is helpful in highlighting the sharing element:

Open data is data that can be freely used, re-used and redistributed by anyone — subject only, at most, to the requirement to attribute and sharealike.

The winner of the open data category, LA NACION DATA — OPEN DATA Journalism for change is, as Rogers notes in his post:

“a model of open data journalism and this year won the prize for its approach to opening up public datasets in a country with no FOI laws and a long history of limiting media access to government information.”

It does everything required of it by both the definition and the category description. A well deserved win.

Rogers also cites Excesses Unpunished, by Convoca in Peru, which “opened up public data to help its users understand the country’s mining industry better.” The project is a media-rich and superbly executed investigation and presentation; it pulls together multiple data sources and offers a deeply informative view, making the issue and the information accessible. That’s different from open.

By the definition of open data (and the category criteria) the Convoca report didn’t fully open up public data. Where is the data that means I can check the work or make my own stories? The data they have created isn’t open and accessible for re-use.

And there is the snag.

If you look at the other entries shortlisted in the category, it’s a similar story.

THE EXPRESS TRIBUNE (Pakistan) — a nice piece of data-driven investigation into the health issues caused by urban pollution that builds on existing research with solid reporting. Sadly the study conducted by Khyber Teaching Hospital and Peshawar Traffic Police isn’t linked. Neither is the Nature report. VERDICT: CLOSED DATA

Trinity Mirror (UK) — a great piece of local journalism with a nice level of interaction. But the data is from a commercial supplier with paid-for access to the original data. VERDICT: CLOSED DATA

Modern Investor magazine (UK) — A deep and focussed investigation into local government pension schemes that, for a small team, packs a punch. The investigation, done in part with data derived from hundreds of FOI requests, has created a “unique database”… that isn’t open. VERDICT: CLOSED DATA

Le Monde (France) — A great piece of work, in particular their partnership with journalism students, but where is the data? VERDICT: CLOSED DATA

It’s not all bad news though. The IndiaSpend (India) project is a great piece of sensor-driven data journalism. I love it. But where is the data that drives the map? The umbrella IndiaSpend project does have a “data room” which shows a plan to make the data open. VERDICT: OPEN (SUSPENDED)

For me, the only other shortlisted project, besides La Nacion, that makes the grade in terms of open is MWAZNA (Egypt). Their attempt to “explain and visualize government budget for everyone” is admirable and works well. Best of all, the data is available to download with a clear licence and in an open format. VERDICT: OPEN

Mwazna Downloads

MWAZNA’s Budget ins and outs interactive links to the data, which is clearly open. Exemplary stuff.

All but two of the projects on this list (three if we accept the direction of travel IndiaSpend is taking) fail to actually make their data open. Remember, this is the shortlist, not all entries. So these are deemed open data by the judges.

So what’s the problem?

It’s fair to argue that resources and technology are an issue when it comes to making data open; they are. But Mwazna entered in the small newsroom category, and Le Monde are clearly not short of resources in comparison. So you can’t say it’s size.

Privacy and data protection are also appropriate concerns I’ve heard voiced around opening up newsroom data — especially in a world where protecting sources and responsible use of data are often linked. This is a fair concern as far as it goes, but as open data advocates are fond of telling government and other bodies, opening up data doesn’t have to mean all your data. If you have a dataset driving a visualization, then that dataset shouldn’t have data protection or privacy issues associated with it.

What is open data journalism?

I think the real problem is the use of the word open. As I have noted elsewhere, open is really about where you put the pipe:

  • open| data journalism — data journalism done in an open way.
  • open data | journalism — journalism done with open data.

Either way, the shortlist reflects, at best, a patchy approach to both views.

There is an all too common confusion among journalists between using FOI to get data and open data. Using FOI is not open data. It’s using a mechanism of open government to get data. Yes, the data you get may well be delivered in an open way; it may even be open data. But using FOI to “open up data” to do journalism and then not sharing the data you use is not open data or open journalism.

Open data journalism should mean using open data, FOIs or any other source to collect data to tell a story, and then sharing THAT data with your audience.

Does it matter?

Just to be very clear here: I’m not saying that any of the work here is bad journalism. So perhaps I’m being dogmatic or even a little pedantic about the use of the term open data. When there is clearly such good journalism going on, shouldn’t we just get on with it? Well, maybe.

But if the practice of data journalism is to deliver on transparency and openness, then open data needs to be part of the process. The data it has needs to be open and, especially when it judges itself, it needs to respect the full extent of what that means rather than simply adopting the phrase in such an uncritical way.

I think if journalism really started to embrace the broader meaning of open data, it would be better off for it.

The Panama Papers & trickle down journalism

I’ve been reading a lot about the Panama Papers.

As a ‘thing’, the Panama Papers is an amazing project. It’s pretty much written the textbook on how to run a 21st-century journalism investigation overnight. The networked nature, the secrecy, the recognition of a global perspective: all of those elements have been robustly tested over nearly two years of investigation. It’s massively valuable.

The involvement of the ICIJ has been a really interesting part for me. I’ve been watching the emergence of organisations like ProPublica (and, in some respects Wikileaks) for a while and the role of allied journalistic organisations has been fascinating to see. It goes beyond philanthropy and, to some extent, advocacy. The intermediary role of these organisations is a vital pivot point for pulling together investigations like this.

I’ve also been reading that this is the breakthrough for data journalism.

If we see data journalism as a process — the mechanics of using data — then the Panama Papers is inarguably proof that modern investigative journalism needs data journalism skills.

But if you believe that data journalism reflects something more — a broad approach to journalism that is ‘new’ or different from the old — then it’s a powerful hook on which to hang that view. I’ve certainly seen enough conversation to suggest that the Panama Papers represent a vindication of data journalism — the resignation of Sigmundur Davíð Gunnlaugsson has been used to invoke Watergate — the head on a spike that proves data journalism can do what ‘traditional journalism’ can do and bring down presidents.

The impact, especially for what it means for data journalism, has been measured and discussed in a quite rarefied way. It’s exciting for journalism insiders and the sheer scope of the story makes it ‘feel’ important — and yes. It is important.

But as the ‘story’ percolates into the national context it moves beyond the broad shock (or lack of it) at the extent to which dictators, war criminals and others break the law to hide their ill-gotten gains. In the UK at least, it’s fast become an ideological issue — people aren’t breaking the law, but is it right? — it has become political. In the academic sense it remains elite.

What impact it might have, or the extent to which it will move further down the ‘accountability’ chain to a regional or local level, is yet to be seen. Will we be seeing the impact of the Panama Papers at local council level? Maybe. But I do think there is a risk that the Panama Papers could end up a whole new form of trickle-down journalism: the impact and benefits remain in the elite journalism sphere and don’t find their way down the chain*. Perhaps that’s more about the state of the channels for accountability further down the chain — there are fewer places for this stuff to trickle.

I’d hope the sheer weight and scale of the story would apply enough pressure to shift some of the blockages. Once the raw information starts to flow ( and I hope it will) and we can begin to look for more ‘local’ angles, then we will really see if the lessons learned as well as the story really will have the impact it deserves.

That’s where I also think data journalism as a broad concept, rather than just a description of a mechanical process, has the best opportunity to show its value. As much as the Panama Papers add to an enviable canon of big wins for data journalism, there is a chance here to show the lessons can scale down as well as up.

*Just to be clear. I know there has been some criticism of the lack of transparency from organizations like Wikileaks that have been couched in these terms. I think the approach so far to not opening up all the ‘data’ has been sensible and appropriate. That said, I do think it is a bullet they are going to have to bite sooner rather than later.