This paper studies referring video object segmentation (RVOS) by boosting video-level visual-linguistic alignment. Recent approaches model the RVOS task as a sequence prediction problem and perform ...