COST292 at TRECVID 2006
Introduction
Having long-term experience in TRECVID activities, the COST292 partners decided to submit results for evaluation in all four system tasks of TRECVID 2006, namely shot boundary determination, high-level feature extraction, interactive search and rushes exploitation. The action is coordinated by Dr Janko Calic from the University of Bristol, United Kingdom.
The shot boundary detection task is to identify shot boundaries, with their location and type (cut or gradual), in the given video clips. The submission is led by Jenny Benois-Pineau from LABRI, Bordeaux, France, with a contribution from Delft University of Technology, Netherlands.
The high-level feature extraction task evaluates the effectiveness of detection methods for high-level semantic concepts such as "Indoor/Outdoor", "People" and "Speech". The group leader is Dr Selim Aksoy from Bilkent University, Ankara, Turkey. Additional contributors are Dublin City University, Ireland, and the National Technical University of Athens, Greece.
The central system task of interactive search is led by Qianni Zhang from Queen Mary, University of London, United Kingdom. The retrieval platform is maintained by ITI, Thessaloniki, Greece, while additional contributions include audio classification from the partner in Slovakia and relevance feedback from the partner in Belgrade.
Finally, the rushes task is coordinated by Janko Calic from University of Bristol, United Kingdom, with contributions from Dublin City University, Ireland, from LABRI, Bordeaux, France and from Delft University of Technology, Netherlands.
At the initial meeting of the COST292 technical committee in San Sebastian, Spain on 1-2 March 2006, it was decided which partners would coordinate and contribute to each system task, and that the first submission workshop of the TRECVID activity would be hosted by LABRI in Bordeaux, France on 8-9 June 2006. A follow-up activity is planned for early September to conduct the search experiments and compile the search task results.
The core milestones of the TRECVID 2006 submission are as follows:
- 15 August 2006 – Shot Boundary Task due
- 21 August 2006 – Feature task due
- 15 September 2006 – Search task due
- 23 October 2006 – Conference papers due
- 7 November 2006 – Workshop registration closes
- 13-14 November 2006 – NIST conference
Results
Shot boundary detection task
- 10 runs were submitted.
- Two shot boundary detection techniques were used:
  - The first uses spatiotemporal block-based analysis for the extraction of low-level events. Unlike the many methods that use frames or 2D blocks in the frames as the main processing units, this technique exploits overlapping 3D pixel blocks in the video data.
  - The second is based on the “Rough Indexing Paradigm” and works on compressed video only. It operates separately on I-frames and P-frames; detection on P-frames is based on the temporal difference of intra-coded macroblocks and the variation of global motion parameters (see the sketch after this list).
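To make the P-frame criterion concrete, the sketch below flags a cut where the fraction of intra-coded macroblocks jumps sharply between consecutive P-frames. It is a minimal illustration, not the COST292 implementation: the input format, names and threshold are assumptions, and the complementary global-motion test is omitted.

```python
# Illustrative sketch: cut detection on P-frames via the temporal
# difference of the intra-coded macroblock ratio. Input format,
# function names and the threshold are assumptions for this example.

def intra_ratio(mb_types):
    """Fraction of a P-frame's macroblocks that are intra-coded.
    After a cut, motion prediction from the previous frame fails,
    so this fraction jumps."""
    return sum(1 for t in mb_types if t == "intra") / len(mb_types)

def detect_cuts(pframes, jump=0.6):
    """pframes: list of (frame_index, [mb_type, ...]) in display order.
    Flags a cut where the intra ratio rises sharply relative to the
    previous P-frame (its temporal difference)."""
    cuts, prev = [], None
    for index, mb_types in pframes:
        ratio = intra_ratio(mb_types)
        if prev is not None and ratio - prev > jump:
            cuts.append(index)
        prev = ratio
    return cuts
```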
High-level feature extraction task
- Three feature extraction approaches were used:
- The first approach uses the keyframes of each video sequence. The following seven high-level features were extracted and used: desert, vegetation, mountain, road, sky, fire-explosion and snow.
- The second system uses low-level feature extraction to assign labels to sub-regions of each keyframe. A Bayesian classifier is then trained (using two separate models) on this “bag of sub-regions”. These models were trained for six classes: snow, vegetation, waterscape, sky, mountain and outdoor (an illustrative sketch follows this list).
- The third method focuses on the extraction of textual information from the digital media. This allows information such as speaker identity, location, date/time, scores and results to be queried more thoroughly.
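To illustrate the “bag of sub-regions” idea from the second system, the sketch below trains a naive Bayes classifier over discrete sub-region labels and picks the concept with the highest posterior for a keyframe. All names are illustrative, the pre-quantised labels are assumed to come from the low-level feature extraction step, and the two-model detail of the actual system is not reproduced.

```python
# Hedged sketch of Bayesian classification over a "bag of sub-regions".
# Each keyframe is a multiset of discrete region labels; the names and
# Laplace smoothing are illustrative assumptions.
import math
from collections import Counter, defaultdict

def train(examples, classes, vocab, alpha=1.0):
    """examples: list of (class_name, [region_label, ...]) pairs.
    Assumes every class occurs at least once in the examples.
    Returns log-priors and Laplace-smoothed log-likelihoods."""
    prior = Counter(c for c, _ in examples)
    counts = defaultdict(Counter)
    for c, bag in examples:
        counts[c].update(bag)
    log_prior = {c: math.log(prior[c] / len(examples)) for c in classes}
    log_like = {}
    for c in classes:
        total = sum(counts[c].values()) + alpha * len(vocab)
        log_like[c] = {w: math.log((counts[c][w] + alpha) / total)
                       for w in vocab}
    return log_prior, log_like

def classify(bag, log_prior, log_like):
    """Return the concept with the highest posterior for one keyframe."""
    return max(log_prior, key=lambda c: log_prior[c]
               + sum(log_like[c][w] for w in bag if w in log_like[c]))
```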
Interactive search task
- 6 runs were submitted:
- 3 using ITI framework + QMUL RF;
- 3 using the QMUL stand-alone module + QMUL RF.
- In this task, led by QMUL, an integrated search system was developed for an interactive retrieval application:
- Combines basic retrieval functionalities in various modalities (i.e. visual, audio, textual)
- Uses a user interface supporting the submission of queries using any combination of the available retrieval tools.
- The basic retrieval modules integrated into the search application are listed below:
- Visual similarity search – using MPEG-7 XM and its extensions.
- Audio filtering – exploiting audio features extracted for each shot, indicating the presence of noise, speech and music in the shot.
- Textual information processing – based on the textual information associated with each video shot.
- Relevance feedback – trains the system to adapt its behaviour to users’ preferences by involving the human in the retrieval loop (an illustrative sketch follows this list).
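As an illustration of how relevance feedback can refine a visual similarity search, the sketch below applies Rocchio-style query refinement to descriptor vectors such as those produced by the MPEG-7 XM. The source does not specify QMUL's actual algorithm; the weights and names here are assumptions.

```python
# Illustrative Rocchio-style relevance feedback over descriptor
# vectors; the weights (alpha, beta, gamma) and names are assumptions,
# not QMUL's actual relevance-feedback module.
import numpy as np

def refine_query(query, relevant, nonrelevant,
                 alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query vector towards shots the user marked relevant
    and away from shots marked non-relevant."""
    q = alpha * np.asarray(query, dtype=float)
    if len(relevant):
        q += beta * np.mean(relevant, axis=0)
    if len(nonrelevant):
        q -= gamma * np.mean(nonrelevant, axis=0)
    return q

def rank(query, database):
    """Rank shots by Euclidean distance between their descriptor
    vectors and the (possibly refined) query."""
    dists = np.linalg.norm(np.asarray(database) - query, axis=1)
    return np.argsort(dists)
```

After each feedback round the refined query re-ranks the whole collection, so the displayed results gradually adapt to the user's preferences.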
Rushes task
- A video summarisation system targeting intuitive browsing of large video archives was used.
- A camera work classification module detects and annotates video regions with the appropriate camera motion.
- An ‘arousal value’ is determined by affective modelling.
- The arousal value is assigned to the extracted keyframes.
- This value is used to optimally lay out the final video summary on a single display or page (see the sketch below).
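The following sketch shows one way the arousal values could drive the layout: keyframes with higher arousal receive proportionally more area on the summary page. The simple row-packing policy and all names are assumptions for illustration; the affective model producing the arousal values is not shown.

```python
# Hedged sketch of arousal-weighted summary layout: higher-arousal
# keyframes are rendered larger. The row-packing policy and names
# are assumptions, not the COST292 layout algorithm.

def layout_summary(keyframes, page_width=1024, base_size=120):
    """keyframes: list of (frame_id, arousal) with arousal in [0, 1].
    Returns (frame_id, x, y, size) tuples packed left-to-right into
    rows on a single page."""
    x = y = row_height = 0
    placed = []
    for frame_id, arousal in keyframes:
        size = int(base_size * (1.0 + arousal))  # more arousing -> larger
        if x + size > page_width:                # start a new row
            x, y = 0, y + row_height
            row_height = 0
        placed.append((frame_id, x, y, size))
        x += size
        row_height = max(row_height, size)
    return placed
```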