COST292 at TRECVID 2007
Having long-term experience with TRECVID activities, the COST292 partners have decided to submit results for evaluation in all four system tasks of TRECVID 2007, namely shot boundary determination, high-level feature extraction, interactive search and rushes exploitation. The action is coordinated by Qianni Zhang from Queen Mary University of London, United Kingdom.
The shot boundary detection task is to identify shot boundaries, with their location and type (cut or gradual), in a given video clip. The submission is led by Marzia Corvaglia from the University of Brescia, Italy, with contributions from METU, Delft University of Technology, Queen Mary University of London and LaBRI.
The high-level feature extraction task evaluates the effectiveness of detection methods for high-level semantic concepts such as "Indoor/Outdoor", "People" and "Speech". The group leader is Dr. Selim Aksoy from Bilkent University, with additional contributions from UBI, QMUL, Delft, the University of Zilina, NTUA, TID and the University of Novi Sad.
The central system task of interactive search is led by Qianni Zhang from QMUL, with additional contributions from ITI, the University of Belgrade and the University of Zilina.
Finally, the rushes task is coordinated by Delft University of Technology, with contributions from LaBRI, VicomTech and the University of Bristol.
At the initial meeting of the COST292 technical committee in Delft, Netherlands on 1-2 March 2007, it was decided which partners would coordinate and contribute to each system task, and that the first submission workshop of the TRECVID activity would be hosted by LaBRI in Bordeaux, France on 19 March 2007. A follow-up activity is planned for 7-8 June in Santorini, Greece to conduct the search experiments and compile the search task results.
The core milestones of the TRECVID 2007 submission are as follows:
- 1 Mar development data available for download
- 9 Mar sample ground truth for ~20 of 50 development videos available
- 15 Mar summarization guidelines complete
- 1 Apr test data available for download
- 11 May system output submitted to NIST for judging
- 1 Jun evaluation results distributed to participants
- 22 Jun ACM papers due
- 29 Jun acceptance notification
- 11 Jul camera-ready papers due via ACM process
- 28 Sep video summarization workshop at ACM Multimedia '07, Augsburg, Germany.
Shot boundary detection task
- Four Shot Boundary (SB) detection techniques were used:
- Twin comparison method based on statistical modelling where transitions are detected using the error signals and an adaptive threshold (University of Brescia);
- Spatiotemporal block based analysis for the extraction of low level events, using 3D pixel block overlapping (TU Delft);
- Different operations performed on the DC image of the luminance channel: the DC image is preferred for its robustness to small changes that do not correspond to shot boundaries, and for its reduced computation time (METU);
- A spectral clustering algorithm (Normalized Cuts) applied to the frames inside a sliding window; the cluster bounds are used for detecting shot boundaries (QMUL).
- Results of the individual SB detectors are combined pairwise.
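To illustrate the kind of thresholding involved, the sketch below implements a generic twin-comparison shot boundary detector. It is an assumption-laden simplification, not the University of Brescia system (which uses statistical modelling of error signals with an adaptive threshold): frames are represented by plain histograms, the thresholds are fixed, and all function names are hypothetical.

```python
# Twin comparison uses two thresholds: a high one for abrupt cuts and a low
# one that opens a candidate gradual transition whose differences are then
# accumulated. This is a simplified sketch; a full detector would also track
# where the gradual transition ends.

def frame_diff(hist_a, hist_b):
    """Sum of absolute bin differences between two frame histograms."""
    return sum(abs(a - b) for a, b in zip(hist_a, hist_b))

def twin_comparison(histograms, t_high=0.5, t_low=0.1):
    """Return (frame index, kind) pairs, kind being 'cut' or 'gradual'."""
    boundaries = []
    acc = 0.0      # accumulated difference inside a candidate gradual transition
    start = None   # first frame of the candidate transition
    for i in range(1, len(histograms)):
        d = frame_diff(histograms[i - 1], histograms[i])
        if d >= t_high:
            boundaries.append((i, "cut"))
            acc, start = 0.0, None
        elif d >= t_low:
            if start is None:
                start = i
            acc += d
            if acc >= t_high:          # accumulated change as large as a cut
                boundaries.append((start, "gradual"))
                acc, start = 0.0, None
        else:
            acc, start = 0.0, None     # change too small: reset the candidate
    return boundaries
```

A dissolve then shows up as a run of moderate frame differences whose sum eventually rivals a hard cut, which is exactly what the accumulator tests for.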
High-level feature extraction task
- Four feature extraction approaches were used:
- The first system transforms a set of low-level descriptors into the semantic space using Latent Semantic Analysis and utilizes neural networks for feature detection (NTUA);
- features extracted: desert, vegetation, mountain, road, sky, fire-explosion, snow, office, outdoor, face, person.
- The second system uses a Bayesian classifier trained with a "bag of sub-regions" (Bilkent University);
- features extracted: sports, weather, court, office, meeting, studio, outdoor, building, desert, mountain, sky, snow, urban, waterscape, crowd, face, person, police/security, military, prisoner, animal, screen, US flag, airplane, car, bus, truck, boat/ship, walking/running, people marching, explosion/fire, natural disaster, maps, charts.
- The third system uses a multi-modal classifier based on SVMs and several descriptors (UBI);
- features extracted: road, sky, face, person, screen, airplane, boat/ship, explosion/fire.
- The fourth system uses two image classifiers based on ant colony optimization and particle swarm optimization respectively (QMUL);
- features extracted: weather, outdoor, building, vegetation, sky, US flag, airplane, boat/ship, explosion/fire, maps.
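To make the "bag of sub-regions" idea behind the second system concrete, here is a minimal naive Bayes sketch: each keyframe is reduced to a bag of quantized sub-region codewords, and each concept is scored by how likely that bag is under its codeword distribution. The model, smoothing, and all names are illustrative assumptions, not the Bilkent implementation.

```python
import math
from collections import Counter

def train(labelled_bags, vocab_size):
    """labelled_bags: list of (concept, [codeword, ...]) pairs.
    Returns concept -> (log prior, per-codeword log-likelihood)."""
    counts = {}        # concept -> Counter of codeword frequencies
    priors = Counter()
    for concept, bag in labelled_bags:
        counts.setdefault(concept, Counter()).update(bag)
        priors[concept] += 1
    total = sum(priors.values())
    model = {}
    for concept, c in counts.items():
        n = sum(c.values())
        # Laplace smoothing so unseen codewords keep a small probability
        loglik = {w: math.log((c[w] + 1) / (n + vocab_size))
                  for w in range(vocab_size)}
        model[concept] = (math.log(priors[concept] / total), loglik)
    return model

def classify(model, bag):
    """Concept maximizing log prior plus summed codeword log-likelihoods."""
    def score(concept):
        logprior, loglik = model[concept]
        return logprior + sum(loglik[w] for w in bag)
    return max(model, key=score)
```

In practice the codewords would come from clustering low-level colour/texture descriptors of image sub-regions; here they are just integers.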
Interactive search task
The system submitted to the search task is an interactive retrieval application combining retrieval functionalities in various modalities (visual, audio and textual) with a user interface supporting automatic and interactive search over all submitted queries.
The basic retrieval modules integrated in the developed search application are:
- Visual similarity using MPEG-7 colour and texture descriptors (ITI);
- Audio filtering applied to the shots retrieved; six classes are defined: applause, laugh, screaming, music, loud noise and speech (University of Zilina);
- Textual information processing exploiting audio features extracted off-line with Automatic Speech Recognition and Machine Translation (ITI);
- Relevance feedback, which trains the system to adapt its behaviour to users' preferences by involving a human in the retrieval process; two approaches were considered:
- A discriminative SVM method that combines several MPEG-7 and non-MPEG-7 descriptors as cues for learning and classification (QMUL);
- Query shifting in combination with the PFRL method and the non-linear modelling capability of the RBF (University of Belgrade).
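The query-shifting idea can be sketched in a few lines of Rocchio-style vector arithmetic: the query vector moves toward shots the user marks relevant and away from non-relevant ones. This only illustrates the principle; the actual University of Belgrade module combines it with PFRL and RBF modelling, and the weights below are hypothetical defaults.

```python
def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def shift_query(query, relevant, non_relevant,
                alpha=1.0, beta=0.75, gamma=0.15):
    """Return alpha*query + beta*mean(relevant) - gamma*mean(non_relevant)."""
    r = centroid(relevant) if relevant else [0.0] * len(query)
    n = centroid(non_relevant) if non_relevant else [0.0] * len(query)
    return [alpha * q + beta * ri - gamma * ni
            for q, ri, ni in zip(query, r, n)]
```

Each feedback round re-ranks the shots by similarity to the shifted query, so the human judgements steer subsequent retrieval.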
- A video summarization and browsing system comprising two different interest-curve algorithms and three features was used:
- Interesting moment detectors:
1- Arousal value determined by affective modelling and based on audio-visual and editing features (TU Delft);
2- Spectral video clustering algorithm Normalized Cuts on frames (QMUL).
- Features (LaBRI and VICOMTech):
1- Colorbar and Backscreen to detect unwanted frames;
2- Face detection to detect human presence;
3- Camera motion to highlight significant events.
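As a rough sketch of how such features could be fused, the code below builds an interest curve that zeroes out unwanted frames (colorbar/backscreen), rewards face presence and camera motion, and then picks local maxima as summary keyframes. The weights and the simple peak picking are illustrative assumptions, not the submitted algorithms.

```python
def interest_curve(frames):
    """frames: list of dicts with boolean 'unwanted' and 'face' flags and a
    float 'motion' in [0, 1]. Returns one interest value per frame."""
    curve = []
    for f in frames:
        if f["unwanted"]:            # colorbar / backscreen frames are excluded
            curve.append(0.0)
        else:
            # hypothetical weighting: motion and face presence count equally
            curve.append(0.5 * f["motion"] + (0.5 if f["face"] else 0.0))
    return curve

def pick_peaks(curve):
    """Indices of strict local maxima of the interest curve."""
    return [i for i in range(1, len(curve) - 1)
            if curve[i] > curve[i - 1] and curve[i] > curve[i + 1]]
```

The affective arousal curve and the spectral clustering detector above would play the role of `interest_curve` here, producing a richer score per frame before the same kind of peak selection.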