Conference Tutorial
Dr. Ignazio Infantino, PhD
Detection of human activities and human intentions through cognitive architecture
Abstract
In the wider context of capturing and understanding human behavior, it is important to detect signals such as facial expressions, body posture, and movements, and to identify objects and interactions with other components of the environment. Computer vision techniques and machine learning methodologies make it possible to gather and process such data in an increasingly accurate and robust way. If a system captures the temporal extent of these signals, it can make predictions and form expectations about how they will evolve. In this sense we speak of detecting human intentions, which, in a simplified manner, are related to the elementary actions of a human agent. Over the last few years, the approach pursued in the field of Human-Computer Interaction (HCI) has changed, shifting the focus to human-centered design and to the creation of interaction systems made for humans and based on models of human behavior. Human-centered design, however, requires thorough analysis and correct processing of everything that flows into man-machine communication: the linguistic message, the non-linguistic signals of conversation, emotions, attitudes, and the modes by which information is transmitted, i.e. facial expressions, head movements, non-linguistic vocalizations, hand movements, and body posture. Finally, the system must recognize the context in which the information is transmitted.
The tutorial/talk will deal with approaches based on cognitive architectures and with the development of software agents that detect human movements and perceive actions and intents. It will also cover the design of a semantic structure linked to visual data. In particular, an implemented “intentional” vision system will be described: a system that “looks at people” and automatically perceives information relevant to interpreting human behavior, distinguishing between unintentional movements, movements for manipulating objects, and gestures used for communicating. The word “intentional” in this context refers to the purpose of generating a stream of pre-processed data useful for reasoning, recognizing, reacting, and interacting when a human and their activity are observed by the artificial system. The raw data coming from multiple image and video sources are filtered and processed in order to retain the information useful for understanding the human's will, state, and condition.