Collaged from works by Andy Mabbett, BBC and Tim O’Riordan ©2014/CC BY-SA 3.0
As a nascent Web Scientist, the irony of a Dalek ‘guarding’ the entrance to this weekend’s Speakerthon event at BBC Broadcasting House in London was not lost on me. Daleks represent a dystopian view of the ‘cyborg’, the twisted collaboration between organic and inorganic, a man-machine mashup that has willingly or unwillingly sacrificed human empathy for improved performance. The contrast between this popular icon and the people working in the hall beyond was considerable.
Speakerthon was organised by BBC R&D and Wikimedia UK as a collaborative web-enhancement event. The aim of the day was to interrogate BBC Radio 4’s permanently available archive (e.g. The Woman’s Hour Collection), select clips of notable people speaking and add them to Wikipedia. Wikimedia UK’s Andy Mabbett thought up the idea and has spent the past 2 to 3 years convincing BBC decision makers of the benefits of opening up their archive. Those benefits are several: open licences are applied to BBC content, Wikipedia entries gain a rich layer of information, good quality linked data is added to the Web, and the visibility of the archive is greatly enhanced. Tagged clips will also be used to teach applications to automatically identify voices in the archive (e.g. The World Service Radio Archive Project), thereby making BBC researchers’ jobs a great deal easier.
The day started with a briefing session. We were shown how to use the BBC ‘Snippets’ software (sadly only made available to us on the day), and what type of clips to listen out for. We were looking for 20 to 40 second clips of individuals talking, preferably about themselves or their field of work, without interruption or any background music. On some programmes the search was frustrated by over-enthusiastic interviewers who would insist on butting in, whereas others (like Desert Island Discs) proved to be a goldmine of useful clips.
Once a clip was identified and selected, ‘Snippets’ created a URL, which we manually added to a Google Docs spreadsheet along with the person’s name and gender, Wikipedia URL, and programme archive URL. This was then picked up by the BBC editorial team, who checked ‘compliance’ (i.e. the suitability of the clip and any outstanding copyright issues), trimmed and edited the clip (using Audacity, a free audio editor), encoded it to the open source .flac format, and uploaded it to Wikimedia. At the time of writing about 100 clips have been uploaded out of the 300 created on the day.
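For anyone thinking of running a similar event, the spreadsheet row described above can be sketched as a small data structure. This is only an illustration under my own assumptions, not the BBC's or Wikimedia's actual tooling: the `ClipRecord` class, its field names, the `is_suitable_length` helper and the `example.org` URLs are all hypothetical, and the only details taken from the day are the five columns we filled in and the 20 to 40 second target length.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class ClipRecord:
    """One spreadsheet row per selected clip (field names are hypothetical)."""
    snippet_url: str      # URL generated by the 'Snippets' tool
    name: str             # person speaking in the clip
    gender: str
    wikipedia_url: str    # the person's Wikipedia article
    programme_url: str    # the programme's page in the archive


def is_suitable_length(duration_seconds: float) -> bool:
    """Return True if the clip falls in the 20 to 40 second range we aimed for."""
    return 20 <= duration_seconds <= 40


def to_csv(records: list[ClipRecord]) -> str:
    """Serialise the records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(ClipRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()


if __name__ == "__main__":
    # Placeholder values only; none of these URLs are real.
    row = ClipRecord(
        snippet_url="https://example.org/snippets/123",
        name="Example Person",
        gender="female",
        wikipedia_url="https://example.org/wiki/Example_Person",
        programme_url="https://example.org/programmes/abc",
    )
    if is_suitable_length(32.5):
        print(to_csv([row]))
```

Keeping the length check separate from the record means the same row format could be reused if the target length changed, while compliance checking, trimming and encoding would still be done by hand as they were on the day.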
I added eleven clips to the Google Docs spreadsheet, three of which have been uploaded to Wikimedia. I was beaten to finding a clip of one of my Web Science course leaders, and Head of Faculty, Dame Wendy Hall, although I think my selection, where she talks about the Semantic Web, is more appropriate than the clip currently on Wikipedia. So far I’ve embedded voice clips and metadata for Owen Hatherley and Claire Skinner. Three of the clips, those of Guglielmo Marconi, his second wife Maria Cristina Bezzi-Scali and John Scott-Taggart (the first person to receive a radio message from a ship in distress), are awaiting confirmation of their copyright status.
With the news often dominated by stories portraying the Web as an ‘evil cyborg’ out to dominate our lives, infringe our privacy and ‘exterminate’ our liberties, it was a real joy to take part in this life-affirming collaborative cyberspace project. We came together to share our love of archives and an appreciation of technology as a force for good, to start something that has the potential to be considerably bigger than the sum of its parts.
Speakerthon: Sharing Voice Samples – Marieke Guy, Open Education Working Group
While capturing audio from the BBC’s web archive and uploading it to Wikipedia (or anywhere else) is relatively straightforward, doing so without the express permission of the BBC infringes their copyright.