Abstract: Unsupervised few-shot action recognition is a practical yet challenging task that adapts knowledge learned from unlabeled videos to novel action classes with only limited labeled data. Large contrastive vision-language models (VLMs) have recently shown promise in skeleton-based action recognition. However, given the lack of skeleton frame-text training datasets for VLMs, ...