A video is worth a thousand keywords.

Let’s face it: video is really starting to take over the Web.  Why read something when you can watch it?  I mean, it’s much cooler to watch an opposing running back consistently break through the Kansas City Chiefs’ defense and rack up 200 yards than it is to read about it later.  Yes, I am bitter.

Anyway, chances are pretty good that the video is on the Web, but the problem is finding it.  Over 1,000,000 World Cup 2010 videos were uploaded to YouTube alone, so you can imagine that finding the right replay may be quite a challenge.  Existing video search tools struggle to deal with such a volume of content.  Most search engines sort video content using metadata, meaning the keyword tags manually attached to videos.  The problem is that tags encapsulate one person’s judgement of a video’s content, so a tag-only search system will produce a lot of irrelevant results.  Gareth Morgan recently reported on the online video and audio search engine Blinkx.

According to Suranga Chandratillake, Blinkx’s chief executive, “For video search to be really effective, you need better ways to understand what is going on in the actual footage.”  As well as metadata, Blinkx uses speech recognition algorithms to interrogate a video directly.  The transcripts it generates provide more data for the firm’s text-based search engine.  Blinkx’s algorithms attempt to parse a chunk of speech into phonemes, the small sound segments that make up individual words.  The speech recognition tools then attempt to reconstruct a sentence out of the phonemes.  It is by no means a foolproof approach, however, and Blinkx has been working on improving its speech recognition capabilities by building in feedback mechanisms.  For instance, the user-added tags provide context to help decide which of two candidate transcripts is more likely to be correct.

The drawback with this type of phonetic transcription analysis is that it is only suited to video with good quality sound.  Still, it might be possible to use the images themselves as part of the search.  The Defense Advanced Research Projects Agency (DARPA) is working to complete its Video and Image Retrieval and Analysis Tool (VIRAT) project, which uses computer vision algorithms to analyze surveillance footage for significant events.
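To make the tag-feedback idea a little more concrete, here is a minimal Python sketch.  It is not Blinkx’s actual method, which isn’t public in that detail.  It simply assumes a recognizer has already produced two candidate transcripts, and it picks the one that shares the most words with the user-added tags.  The function name `pick_transcript` and the sample data are made up for illustration.

```python
def pick_transcript(candidates, tags):
    """Return the candidate transcript that shares the most words with the tags.

    Illustrative assumption: plain word overlap stands in for whatever
    context scoring a real speech recognition system would use.
    """
    tag_words = {tag.lower() for tag in tags}

    def overlap(transcript):
        # Count distinct transcript words that also appear among the tags.
        return len(set(transcript.lower().split()) & tag_words)

    # max() keeps the earliest candidate on a tie, so list the
    # recognizer's preferred transcript first.
    return max(candidates, key=overlap)


# Hypothetical recognizer output: two readings of the same audio.
candidates = [
    "the striker scored a gull in the vinyl minute",
    "the striker scored a goal in the final minute",
]
tags = ["football", "goal", "final"]

best = pick_transcript(candidates, tags)
print(best)
```

Here the second reading wins because “goal” and “final” both show up in the tags.  With no tags at all, the function just falls back to the first candidate.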

A different approach, semantic querying, could be the answer.  It involves teaching a search engine to recognize semantic concepts, such as “grass,” “football,” and “stadium,” using supervised learning techniques.  During a teaching phase, the system is fed examples of each concept.  Software algorithms characterize each concept by its color, texture or shape and build a model of it.  According to Marcel Worring, a multimedia analysis researcher at the University of Amsterdam in the Netherlands, “So with a new video, the model is applied and automatically a measure is given of how likely it is that the concept is present in that video.”  The strength of the semantic querying approach is that it can work at multiple levels, so it can narrow the search more effectively.  To search much larger video libraries, a good strategy might be to use keywords first to whittle down the number of results, then apply semantic querying to improve the quality and relevance of the videos finally presented to the searcher.
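Here is a rough Python sketch of that two-stage strategy: keywords first, then a concept score to rerank what is left.  Everything in it is an assumption made for illustration.  The “model” is just the average of a few example color vectors, a nearest-prototype stand-in for the far richer supervised models researchers actually train, and the video records, feature values and function names are invented.

```python
import math


def train_concept(examples):
    """Teaching phase: average the example feature vectors into one prototype."""
    count = len(examples)
    return [sum(column) / count for column in zip(*examples)]


def concept_score(model, features):
    """Map distance from the prototype to a score in (0, 1]; closer is higher."""
    return 1.0 / (1.0 + math.dist(model, features))


def search(videos, keyword, model, top_k=2):
    """Filter by keyword tag first, then rerank the shortlist by concept score."""
    # Stage 1: cheap keyword filter over the tags.
    shortlist = [video for video in videos if keyword in video["tags"]]
    # Stage 2: order the survivors by how strongly the concept seems present.
    ranked = sorted(
        shortlist,
        key=lambda video: concept_score(model, video["features"]),
        reverse=True,
    )
    return [video["id"] for video in ranked[:top_k]]


# Hypothetical mean RGB values (0 to 1) taken from example "grass" footage.
grass_examples = [[0.2, 0.7, 0.2], [0.3, 0.8, 0.1], [0.1, 0.6, 0.3]]
grass = train_concept(grass_examples)

# Hypothetical video library with tags and one color feature vector each.
videos = [
    {"id": "press_conference", "tags": {"football", "interview"}, "features": [0.5, 0.5, 0.6]},
    {"id": "match_highlights", "tags": {"football", "goal"}, "features": [0.2, 0.7, 0.2]},
    {"id": "lawn_care", "tags": {"garden"}, "features": [0.2, 0.7, 0.2]},
]

results = search(videos, "football", grass)
print(results)
```

The lawn care clip looks just as grassy as the match highlights, but it never reaches the second stage because it lacks the “football” tag.  Among the clips that do pass the keyword filter, the one with more grass on screen ranks first.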
