Work program
|
Unsupervised identification aims at assigning character names to clusters in a completely automatic manner (i.e. using only information already present in the speech and video). In TV series and movies, character names are usually introduced and reiterated throughout the video. We will detect and use addresser-addressee relationships in both speech (using named entity detection techniques) and video (using mouth movements, viewing direction and focus of attention of faces). This allows us to assign names to some clusters, learn discriminative models from these named clusters, and then assign names to the remaining clusters.
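The three-step naming scheme described above (name some clusters, learn models, name the rest) can be illustrated with a minimal sketch. It is not the project's method: a nearest-centroid rule stands in for the discriminative models, and the function `name_clusters`, the `min_votes` threshold, the feature vectors and the character names are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def name_clusters(mentions, features, min_votes=2):
    """Assign a name to every cluster (illustrative sketch only).

    mentions: list of (cluster_id, name) pairs, e.g. derived from
              addresser-addressee cues (hypothetical input format).
    features: dict mapping cluster_id to a numeric feature vector.
    """
    # Step 1: name the clusters that collected enough consistent evidence.
    votes = {}
    for cluster_id, name in mentions:
        votes.setdefault(cluster_id, Counter())[name] += 1
    named = {}
    for cluster_id, counter in votes.items():
        name, count = counter.most_common(1)[0]
        if count >= min_votes:
            named[cluster_id] = name

    # Step 2: build one model per name. A centroid is used here as a
    # simple stand-in for a learned discriminative model.
    sums, counts = {}, Counter()
    for cluster_id, name in named.items():
        vec = features[cluster_id]
        acc = sums.setdefault(name, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[name] += 1
    centroids = {n: [v / counts[n] for v in acc] for n, acc in sums.items()}

    # Step 3: give each remaining cluster the name of the closest model.
    result = dict(named)
    for cluster_id, vec in features.items():
        if cluster_id not in result and centroids:
            result[cluster_id] = min(
                centroids, key=lambda n: math.dist(vec, centroids[n])
            )
    return result


# Toy data: c1 and c2 have enough mentions, c3 has too few, c4 has none.
mentions = [("c1", "Alice"), ("c1", "Alice"),
            ("c2", "Bob"), ("c2", "Bob"), ("c3", "Bob")]
features = {"c1": [0.0, 0.0], "c2": [1.0, 1.0],
            "c3": [0.9, 1.1], "c4": [0.1, -0.1]}
labels = name_clusters(mentions, features)
```

In this toy run, clusters `c1` and `c2` are named directly from the mentions, while `c3` and `c4` receive the name of the nearest centroid. A real system would replace the centroid rule with trained face and speaker models.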
For evaluation, we will extend and further annotate a corpus of three TV series (49 episodes) and one movie series (8 movies), a total of about 50 hours of video. This diverse data covers different filming styles, types of stories and challenges in both video and audio. We will evaluate the different steps of this project on this corpus, and also make our annotations public.
|