Abstract:
Hands are the central means by which humans interact with their surroundings. Understanding human hands helps human behavior analysis and facilitates other visual analysis tasks such as action and gesture recognition. Recently, there has been a surge of interest in understanding first-person visual data, in which hands are the dominant interaction entities. There is also rapidly growing interest in developing computer vision methods for augmented and virtual reality. To deliver an authentic augmented or virtual reality experience, we need to enable humans to interact with the virtual world and allow virtual avatars to communicate and interact with each other. Since hands are the dominant interaction entities in these settings as well, a thorough understanding of human hands is essential for developing such computer vision methods.
The first step toward the visual understanding of human hands is to detect hands in images or videos. Localizing hands in the wild is challenging: numerous hands can be present, and there can be heavy occlusion between hands, objects, and people. To address these issues, we propose a contextual attention method for hand detection.
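The abstract does not describe the architecture itself; purely as an illustration, a generic contextual attention block over a detector's feature map might look like the following Python sketch. The module, its names, and its shapes are assumptions for illustration, not the method from the talk.

import torch
import torch.nn as nn

class ContextualAttention(nn.Module):
    """Illustrative non-local attention over a detector's feature map.

    A generic sketch, not the proposed method: each spatial location
    attends to all others, so hand regions can borrow evidence from
    contextual regions such as arms, bodies, and held objects.
    """

    def __init__(self, channels: int, reduced: int = 64):
        super().__init__()
        self.query = nn.Conv2d(channels, reduced, kernel_size=1)
        self.key = nn.Conv2d(channels, reduced, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)   # (b, hw, reduced)
        k = self.key(x).flatten(2)                     # (b, reduced, hw)
        v = self.value(x).flatten(2).transpose(1, 2)   # (b, hw, c)
        attn = torch.softmax(q @ k / (k.shape[1] ** 0.5), dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return x + out  # residual: contextual evidence augments local features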
While detecting hands is essential, it is not sufficient for a more fine-grained semantic understanding of hands. When humans interact with their surroundings, their hands come into contact with objects and with other people around them. Therefore, we need to study this contact information to obtain a more meaningful understanding of hands. To this end, we propose to study hand contact recognition.
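As an illustrative sketch only, contact recognition could be framed as a small classification head over pooled per-hand features; the contact states listed below and all names are assumptions for illustration, not the taxonomy from the proposal.

import torch
import torch.nn as nn

# Hypothetical contact states, assumed here for illustration.
CONTACT_STATES = ["no-contact", "self-contact", "person-contact", "object-contact"]

class ContactHead(nn.Module):
    """Illustrative per-hand contact classifier on pooled region features."""

    def __init__(self, feat_dim: int = 1024, num_states: int = len(CONTACT_STATES)):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, 256),
            nn.ReLU(),
            nn.Linear(256, num_states),
        )

    def forward(self, hand_feats: torch.Tensor) -> torch.Tensor:
        # hand_feats: (num_hands, feat_dim), e.g., pooled via RoIAlign.
        # Returns per-hand logits; states may co-occur, so in practice one
        # might train this with a multi-label (sigmoid) loss.
        return self.mlp(hand_feats)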
To understand how human hands interact with the world, we need to know how hands move over time; in other words, we need to track hands in videos. Existing hand-tracking methods are mostly developed for tracking only one or two hands, and tracking more than two hands under unconstrained conditions is challenging. To address this, we study the problem of tracking more than two hands.
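The talk abstract does not specify the tracker. As a generic illustration of why many visually similar hands are hard to follow, a common frame-to-frame baseline matches existing tracks to new detections by bounding-box IoU with the Hungarian algorithm; all function names below are hypothetical.

import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def associate(tracks, detections, iou_threshold=0.3):
    """Match track boxes to detection boxes with Hungarian matching."""
    if not tracks or not detections:
        return [], list(range(len(tracks))), list(range(len(detections)))
    cost = np.array([[1.0 - iou(t, d) for d in detections] for t in tracks])
    rows, cols = linear_sum_assignment(cost)
    matches = [(r, c) for r, c in zip(rows, cols)
               if cost[r, c] <= 1.0 - iou_threshold]
    matched_t = {r for r, _ in matches}
    matched_d = {c for _, c in matches}
    unmatched_tracks = [i for i in range(len(tracks)) if i not in matched_t]
    unmatched_dets = [j for j in range(len(detections)) if j not in matched_d]
    return matches, unmatched_tracks, unmatched_dets

Such purely geometric association breaks down when several near-identical hands overlap or cross, which is precisely the regime the proposed work targets.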
While it is important to detect hands, track them, and obtain their contact states, this is still not sufficient for detailed activity understanding. In a scene containing multiple people, we need to know which object is manipulated by whom and which person is performing which activity. In other words, we must detect hands and localize the corresponding person simultaneously. To tackle this, we study hand-body association in images.
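For illustration only, a naive geometric baseline for hand-body association assigns each hand box to the person box that covers it most; the names below are hypothetical and this is not the association model from the proposal.

import numpy as np

def containment(hand, body):
    """Fraction of the hand box covered by the body box (x1, y1, x2, y2)."""
    x1, y1 = max(hand[0], body[0]), max(hand[1], body[1])
    x2, y2 = min(hand[2], body[2]), min(hand[3], body[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    hand_area = (hand[2] - hand[0]) * (hand[3] - hand[1])
    return inter / (hand_area + 1e-9)

def associate_hands_to_bodies(hands, bodies):
    """Greedy baseline: each hand goes to the body box that covers it most."""
    assignments = []
    for h in hands:
        scores = [containment(h, b) for b in bodies]
        best = int(np.argmax(scores)) if bodies else -1
        assignments.append(best if scores and scores[best] > 0 else -1)
    return assignments  # matched body index per hand, -1 if none

A heuristic like this fails exactly in the crowded, overlapping scenes the abstract highlights, since a hand is often covered by several person boxes at once; this is what motivates studying a learned association.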
To truly understand human hands, we must study how they interact with their surrounding objects. We identify some future directions in this regard.
Dates
Friday, October 14, 2022, 10:00am to 12:00pm
Location
NCS 105, or Zoom - contact events@cs.stonybrook.edu for more information.
Event Title
Ph.D. Research Proficiency Presentation: Supreeth Narasimhaswamy, 'Understanding Human Hands in Visual Data'