Contextual Negatives

One problem encountered when trying to find a single sign in a block of signing is that words which co-occur with the target tend to crop up in the results. For example, if you are looking for the sign for rain, how do you avoid finding the sign for cloud, which is a commonly co-occurring term? Inspired by Mensink and Verbeek [7], we dope our negative data with examples likely to contain the noise from our positive data. This is done by finding words which occur in the vicinity of the target word and then searching for examples where the contextually similar word occurs but the target word does not. Once these subtitle sections have been identified, they are used to index the corresponding video sections, which are then added to the negative data set.
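The selection of contextual negatives can be illustrated with a minimal sketch. It is not the implementation used in this work: the subtitle record layout (`start`, `end`, `text`), the function names `context_words` and `contextual_negatives`, the token window, the stopword list and the `top_k` cut-off are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = frozenset({"and", "the", "will", "are", "in", "under", "more"})


def context_words(subtitles, target, window=3, top_k=1, stopwords=STOPWORDS):
    """Return the top_k words seen most often within `window` tokens of `target`."""
    counts = Counter()
    for sub in subtitles:
        tokens = sub["text"].lower().split()
        for i, tok in enumerate(tokens):
            if tok != target:
                continue
            # Tokens on either side of the target, excluding the target itself.
            neighbours = tokens[max(0, i - window):i] + tokens[i + 1:i + window + 1]
            for word in neighbours:
                if word != target and word not in stopwords:
                    counts[word] += 1
    return [word for word, _ in counts.most_common(top_k)]


def contextual_negatives(subtitles, target, **kwargs):
    """Return (start, end) times of subtitles with a context word but no target."""
    context = set(context_words(subtitles, target, **kwargs))
    negatives = []
    for sub in subtitles:
        tokens = set(sub["text"].lower().split())
        if target not in tokens and tokens & context:
            negatives.append((sub["start"], sub["end"]))
    return negatives


# Toy subtitle stream (made-up data, times in seconds).
subtitles = [
    {"start": 0.0, "end": 4.0, "text": "heavy rain and cloud tomorrow"},
    {"start": 4.0, "end": 8.0, "text": "cloud cover will increase"},
    {"start": 8.0, "end": 12.0, "text": "the football results are in"},
    {"start": 12.0, "end": 16.0, "text": "more rain under thick cloud"},
]

rain_context = context_words(subtitles, "rain")
rain_negatives = contextual_negatives(subtitles, "rain")
```

In this toy data, cloud is the word most often found near rain, so the only contextual negative is the subtitle that mentions cloud without rain. The returned time spans stand in for the video sections that would be indexed and added to the negative set.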