Results - Weakly Supervised

Below are the response graphs produced by applying the iterative mining solution to subtitle-extracted sections of video. The bars across the top of each graph show the positions of the subtitles and the ground-truth occurrences of the sign. Figure 9 shows the most basic case, without the use of contextual negatives. Figure 10 shows the improvement gained by adding contextual negatives, and Figure 11 shows the further improvement from adding prior knowledge about sign language: in this case, that "Army" and "Soldier" in English are both represented by the same manual sign and can therefore be combined for this purpose. Finally, Figure 12 shows that the result is not a classifier that only works on seen data, as the responses across an entire 30-minute news broadcast are shown.
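The subtitle-based setup above can be sketched in code. This is a minimal, hypothetical illustration (the names `Subtitle` and `collect_bags` are mine, not from the paper): windows whose subtitle text mentions the target word form the weakly labelled positive bag, while other windows from the same broadcast serve as contextual negatives, and merging words that share a manual sign (Army/Soldier) enlarges the positive bag.

```python
# Hypothetical sketch of weakly supervised bag selection from subtitles.
# All names here are illustrative; the actual mining pipeline differs.

from dataclasses import dataclass

@dataclass
class Subtitle:
    start: float  # seconds
    end: float
    text: str

def collect_bags(subtitles, target_words, pad=2.0):
    """Split subtitle windows into a positive bag (windows mentioning any
    target word, padded to allow for subtitle/sign misalignment) and a
    pool of contextual negatives (windows from the same broadcast that
    do not mention the target)."""
    positives, negatives = [], []
    targets = {w.lower() for w in target_words}
    for sub in subtitles:
        words = {w.strip(".,!?").lower() for w in sub.text.split()}
        window = (max(0.0, sub.start - pad), sub.end + pad)
        if words & targets:
            positives.append(window)
        else:
            negatives.append(window)
    return positives, negatives

subs = [
    Subtitle(0.0, 2.0, "The soldier returned home"),
    Subtitle(2.0, 4.0, "Weather later today"),
    Subtitle(4.0, 6.0, "The army deployed overseas"),
]
# Combining "soldier" and "army" (same manual sign) enlarges the positive bag.
pos, neg = collect_bags(subs, ["soldier", "army"])
```

With both words combined, two of the three windows land in the positive bag, mirroring how merging Army and Soldier raised the positive count from 11 to 17 in the results above.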

Figure 9 - The responses for the sign Soldier using random negatives - 11 positives.

Figure 10 - The responses for the sign Soldier using contextual negatives - 11 positives with a cleaner response.

Figure 11 - The responses for the sign Soldier when the prior knowledge that the sign for Soldier is the same as the sign for Army is used and the two words are combined. Again using contextual negatives - 17 positives.

Figure 12 - The response across an entire 30-minute TV broadcast - note that the peaks occur almost exclusively around the ground-truth regions.

More details are available in the paper on my publications page.