First impressions of a crowd-searching archive presentation project of the BBC – by Virág Bottlik
With the launch of the BBC World Service Prototype in 2012, a collection of more than 50,000 pre-recorded BBC radio programmes from the 1960s has been made accessible online. The collection is available thanks to an innovative way of processing and managing content that combines machine and human resources.
The prototype was developed as part of the Automatic Broadcast Content Interlinking Project (ABC-IP), a cooperation between BBC Research & Development and MetaBroadcast. The idea behind the project is to tag the content roughly by automatic means and then let the audience do the fine-tuning.
The content is processed by audio-recognition software that makes a first attempt to identify the relevant topics (in some cases using links that have already been associated with the programmes). The project uses linked data to tag the content, so that all content tags correspond to Wikipedia articles, speakers, series names, etc.
In the archive the radio shows are presented with the tags automatically derived from the audio files, the file description and some automatically associated images from Ookaboo. The algorithms obviously do not work perfectly, so the tags do not always fit the content. The show on the death of Pope John Paul II is, for instance, linked with the horror movie “Shock Waves” because the description reads: “Death of the Pope and the shock waves around the world.” These mismatches are meant to be corrected by the users.
After signing up with your name and e-mail address, you are invited to work on the metadata of the content by voting automatically generated topic tags up or down, naming speakers, editing the description, recommending another picture or adding more tags.
When editing the description you can always click the “see history” button, where you can follow the changes and restore the original version. If you name an unknown speaker or edit an existing one, however, there is no history, and names can be changed easily. When you change the name of a speaker in one show, you change it in all shows that are connected to the same person (ID). Here I made a mistake: wanting to try out the function, I renamed an identity, and only after I had finished did I start to look for a way to reverse the change. There was no such thing. (Sure, I could retype the original name, if I remembered it… Sorry.) Recommending a “better image” to illustrate a show cannot be reversed either.
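To make the renaming behaviour concrete, here is a minimal sketch of how a speaker identity shared by ID could work, and how a simple name history would have let me undo my mistake. It is not the prototype's actual code: the `Speaker` class, the `history` list, the `undo` method and all sample data are my own assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Speaker:
    """A speaker identity shared by every show that references its ID."""
    speaker_id: int
    name: str
    # Hypothetical addition: earlier names, oldest first, so a rename can be undone.
    history: list[str] = field(default_factory=list)

    def rename(self, new_name: str) -> None:
        """Store the current name in the history, then replace it."""
        self.history.append(self.name)
        self.name = new_name

    def undo(self) -> None:
        """Restore the most recent earlier name, if there is one."""
        if self.history:
            self.name = self.history.pop()


# Invented sample data: shows store only the speaker ID, never a copy of the name.
speakers = {7: Speaker(7, "Speaker A")}
shows = {"show-1": [7], "show-2": [7]}


def speaker_names(show_id: str) -> list[str]:
    """Look up the current names of all speakers attached to a show."""
    return [speakers[sid].name for sid in shows[show_id]]
```

Because both shows point to the same ID, calling `speakers[7].rename("Speaker B")` changes what `speaker_names` returns for each of them, and `speakers[7].undo()` brings the old name back everywhere.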
Adding a tag is not so easy. When you start to type a word, you get suggestions of phrases that correspond to Wikipedia pages. You cannot just type in anything, so if you do not find the phrase you were looking for and you want to stick to your tag idea, you have to create a Wikipedia page for your tag to correspond to.
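The restriction can be pictured as a lookup against a fixed list of page titles. The following is only a sketch under my own assumptions: the `VOCABULARY` list is a tiny invented sample rather than real Wikipedia data, and `suggest` and `add_tag` are hypothetical names, not functions of the prototype.

```python
# Invented sample of allowed tag names; the real list would be Wikipedia page titles.
VOCABULARY = ["Budapest", "Hotel", "Pope John Paul II", "Radio"]


def suggest(prefix: str, limit: int = 5) -> list[str]:
    """Return up to `limit` allowed titles that start with the typed prefix."""
    typed = prefix.casefold()
    if not typed:
        return []
    return [title for title in VOCABULARY if title.casefold().startswith(typed)][:limit]


def add_tag(tags: set[str], candidate: str) -> bool:
    """Add the tag only if it is an allowed title; report whether it was accepted."""
    if candidate in VOCABULARY:
        tags.add(candidate)
        return True
    return False
```

In this sketch, typing "bu" suggests "Budapest", while a free-text tag that is not in the list is simply rejected, which matches what I experienced as a user.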
Summing it up so far, I think we have here an option that is worth considering for community media archiving purposes. For readers interested in the technical side, here is a short description. Having seen how users can help improve the archive, let's have a look at how they can use it and what for.
If you only want to explore the collection, you just enter a search term, open an item and start to navigate between shows through thematic links, series, speakers, etc.
The search options are quite poor. You cannot search by author, running time, location or date; you can only search by search terms. The results can be filtered by decade and by availability (some shows are unavailable for technical reasons or because of copyright considerations). It is not easy to tell how the results are ordered. The order is neither chronological nor alphabetical, and it follows neither popularity nor even the content identification number. Not to mention that you have no way of sorting the matches by a criterion of your choice (author, running time, publication date, etc.).
It is also not easy to understand how searching with multiple search terms works. If you search for “Budapest” you receive 52 matches. If you search for “Budapest” and “hotel” you receive 56 matches, some tagged with “hotel”, some with “Budapest”. I did not find any item matching both (I admit I did not go through all of them, though). So maybe only 4 items in the whole collection are tagged with “hotel”? Not at all: for “hotel” alone you get 357 matches!
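Why these numbers are puzzling can be shown with a little set arithmetic. The sketch below uses invented item IDs; only the three counts (52, 357 and 56) come from my searches, and the overlap between the two sets is an arbitrary assumption. Whatever the real overlap is, an AND search can never return more items than the smaller set, and an OR search can never return fewer than the larger one.

```python
def and_search(*postings: set[int]) -> set[int]:
    """Items that carry every search term."""
    return set.intersection(*postings)


def or_search(*postings: set[int]) -> set[int]:
    """Items that carry at least one search term."""
    return set.union(*postings)


# Invented item IDs: 52 items for "Budapest", 357 for "hotel".
# The overlap of 12 items is an assumption made only for this example.
budapest = set(range(0, 52))
hotel = set(range(40, 397))

both = and_search(budapest, hotel)
either = or_search(budapest, hotel)

observed = 56  # matches I received for "Budapest" and "hotel" together
# AND can return at most as many items as the smaller set.
fits_and = observed <= min(len(budapest), len(hotel))
# OR must return at least as many items as the larger set.
fits_or = observed >= max(len(budapest), len(hotel))
```

Both `fits_and` and `fits_or` come out false, so the 56 matches are consistent with neither a plain AND nor a plain OR over the tags. The search must be doing something else, but what exactly I cannot tell from the outside.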
Search operators known from Google, such as quotation marks for an exact phrase or “OR” for a search for either word, are not accepted here either.
If you are looking for something specific, the chances of finding it without reading the summaries of hundreds (or in some cases thousands) of items are slim. This is not very motivating, even if you know that your contributions are supposed to improve the archive: not only the “correctness” of the metadata but, through a feedback loop, the search and navigation options as well. Besides these shortcomings in user-friendliness, I missed commenting or other options for rating content, which are basic features on community-managed sites and can be used for sorting items (by popularity, for instance).
Summing it up: the BBC World Service Prototype is a nice example of the possibilities of crowd-based content management, which is without doubt highly relevant in a community media context. However, when thinking about a similar system, we have to be critical of its lack of effective usability and develop a system that matches our purposes. We need to provide real, effective access to our content, and always be aware of what the users are interested in.