Multimodal Tagging of Human Motion Using Skeletal Tracking With Kinect
The Graduate School, Stony Brook University: Stony Brook, NY.
Recognizing the movements of human bodies is a challenging problem because of self-occlusion and the degrees of freedom associated with each of the body's many joints. This work presents a method to tag human actions and interactions by first extracting the human skeleton from depth images acquired by infrared range sensors and then exploiting the resulting skeletal tracking. Instead of estimating the pose of each body part contributing to a set of moves in a decoupled way, we represent a single-person move or a two-person interaction in terms of its skeletal joint positions. A single-person move is thus defined by the spatial and temporal arrangement of the person's skeletal framework over the duration of the move, and a two-person interaction is defined by the skeletal frameworks of both participating agents over time. We experiment with two modes of tagging human movement. First, in collaboration with the Music department, we explore tagging a single person's moves with music: as a participating agent performs a set of movements, musical notes are generated depending on the velocity, acceleration, and change in position of the agent's body parts. Second, we classify human interactions into a set of well-defined classes. We present the K-10 Interaction Dataset, which contains ten classes of two-person interactions performed among six different agents and captured using the Kinect for Xbox 360. We construct interaction representations in terms of local space-time features and integrate these representations with SVM classification schemes for recognition. We further align the clips in our dataset using the Canonical Time Warping algorithm, which improves the interaction classification results.
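As a rough illustration of the music-tagging idea, the following Python sketch estimates the velocity and acceleration of one tracked joint by finite differences and maps the joint's speed to a MIDI pitch. It is a minimal sketch under stated assumptions, not the mapping used in this work: the 30 Hz frame rate, the linear speed-to-pitch rule, the `max_speed` bound, the note range, and all function names are hypothetical choices made for the example.

```python
import math

# Assumed sampling rate of the skeletal stream; not stated in the abstract.
FRAME_RATE_HZ = 30.0


def finite_difference(samples, dt):
    """Forward difference of a sequence of 3D vectors sampled every dt seconds."""
    return [
        tuple((b[i] - a[i]) / dt for i in range(3))
        for a, b in zip(samples, samples[1:])
    ]


def magnitude(vector):
    """Euclidean length of a 3D vector."""
    return math.sqrt(sum(c * c for c in vector))


def speed_to_midi_note(speed, max_speed=2.0, low_note=48, high_note=84):
    """Hypothetical mapping: clamp speed to [0, max_speed], then scale linearly to a MIDI pitch."""
    ratio = min(max(speed / max_speed, 0.0), 1.0)
    return low_note + round(ratio * (high_note - low_note))


def notes_for_joint(positions, frame_rate=FRAME_RATE_HZ):
    """Return per-frame velocities, accelerations, and one MIDI pitch per velocity sample."""
    dt = 1.0 / frame_rate
    velocities = finite_difference(positions, dt)
    accelerations = finite_difference(velocities, dt)
    notes = [speed_to_midi_note(magnitude(v)) for v in velocities]
    return velocities, accelerations, notes


if __name__ == "__main__":
    # Made-up trajectory of a single joint, in metres, for illustration only.
    hand = [(0.00, 1.0, 2.0), (0.01, 1.0, 2.0), (0.03, 1.0, 2.0), (0.06, 1.0, 2.0)]
    velocities, accelerations, notes = notes_for_joint(hand)
    print(notes)
```

A trajectory of N positions yields N-1 velocity samples and N-2 acceleration samples. Only speed drives the pitch here; acceleration and displacement are computed but left unused, and could instead be mapped to note loudness or duration.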