Natural and Intuitive Interaction and Information Access
The main challenge is to improve the ease of use of new technologies in computer-mediated environments, so that smart surroundings become habitable environments.
Our home, office, mobile and public environments will increasingly be equipped with smart hardware and software devices, including mobile robots, virtual humans, smart furniture and various sorts of digital assistants that provide real-time support for our activities (work, entertainment, leisure, family, social events, etc.). To maximize their benefit for the user, such environments need to allow easy and transparent interactions with the user and between users, and to be capable of interpreting, predicting and reacting to the user’s current and future activities and interests.
Natural and intuitive interaction requires research into verbal and nonverbal communication issues, modality allocation and integration, and user profiles. Information access addresses the challenges of problem-, context- and user-centric interpretation, filtering, delivery and exchange of information generated in or communicated through the environment. Here, research is required on multimodal data storage and processing (e.g. restoration, enhancement and compression), on content-based multimodal data analysis and indexing, as well as on quality of service (QoS) in relation to content exchange and delivery.
Syntactic, semantic and pragmatic analysis of multimodal input and multimedia data can take place in a static (off-line) or dynamic (online) context. In the former case the emphasis is on providing intelligent access (retrieval) to stored multimedia, where ‘intelligent’ stands mainly for context awareness and personalization. In the latter case we have multimodal interaction with reactive and pro-active smart environments and devices. Here, multimedia content analysis, mixed reality and virtual humans are among the research topics addressed.
Aim and Mission
Our mission is twofold:
▫ to support multimodal and unobtrusive interactions with and within smart (desktop, virtual human, robot, furniture, wearable, immersive) environments, and
▫ to enable natural and intuitive access to multimedia information presented or generated in or delivered to the user through anticipatory desktop, ambient, virtual and mixed reality environments.
Research on interaction focuses on the human portion of the human-computer interaction context. We look beyond the traditional keyboard and mouse to include natural, human-like interactive functions, including understanding and emulating behavioural, affective and social signalling. The design of these functions requires exploring what is communicated (linguistic message, non-linguistic conversational signal, emotion, person identification), how the information is communicated (the person’s facial expression, head movement, tone of voice, hand and body gesture), in which context the information is passed on (where the user is, what the user’s current task is, how the user feels), and which (re)action should be taken to satisfy user needs and requirements.
Regarding information access, the challenge we aim to meet is to make the best of the data and information overflow resulting from the numerous sensors present in the environment and from the (typically broadband) information channels reaching the user from outside the environment. This translates mainly to the challenge of finding the information that is relevant for a given user in a given situation and delivering it to the user as effectively and transparently as possible. Here, just as in the case of interaction, it is necessary to take into account the context, the user characteristics (cognitive functioning and emotional features) and the user preferences, so that the information filtering functions of the environment can be adapted to make intuitive and natural access to multimedia information possible.
Applications can be found in the areas of ambient intelligence and (collaborative) virtual environments. The ultimate goal is to increase the “experience” and “presence” within these (virtual) environments, to develop natural interaction techniques (gestures, facial expressions) and more powerful methods for browsing distributed multimedia databases (“picture search”), for data mining and for visual exploration of information spaces, and to create and animate 3D virtual worlds.
This mission translates into several research challenges in the areas of Multimedia, Natural Interaction, and Virtual Reality and Visualization.
Multimedia
This research track covers the steps that are to be performed on the data within an ambient intelligence environment to enable robust, context-aware sensing, data and information processing, interpretation and access between the environment components and users.
▫ Context-aware sensing, multimedia data processing and interpretation: One of the main challenges when designing an ambient intelligence environment is to capture the context and the intention of the users or agents (human or otherwise) that play a role in that environment. For instance, smart cameras equipped with algorithms for multimedia content analysis (MCA) can be employed for surveillance and people monitoring tasks. Such cameras can be programmed to detect specific events or “simply” to estimate how suspicious a situation is based on measured audiovisual signals, and to react to such events and situations by alerting the users and supporting both the decision-making process and the actions decided upon. Applications stretch from public safety, via people monitoring in homes for the elderly, to systems capable of observing and learning user behaviour in order to anticipate the user’s future actions and adapt the smart environment accordingly.
▫ Personalized content access: A smart environment should enable the user to easily access and manage the information reaching the environment from the outside world. Examples of devices where such functionality is needed are intelligent personal video recorders capable of filtering the incoming TV broadcast material according to the genres and topics that are of interest to the user, as well as handheld devices (e.g. mobile phones, PDAs) capable of summarizing and presenting to the user the news of the day or the highlights of a soccer match. The rich content of multimedia signals and documents, built through synergies of the information contained in different modalities, calls for new and innovative theories and algorithms of multimedia content indexing. Robust and reliable indexing tools will optimally employ audio, image and video processing, language and speech technology, and advanced pattern recognition techniques to model, process, mine, organize, classify and index multimedia data at the semantic level. In particular, to exploit synergies in multimedia data, reliable learning, data fusion, classification and classifier combining techniques are required that can simultaneously handle data from different modalities. Next to content indexing, methods and techniques for content personalization are needed to adapt the representation of multimedia content to the needs of a user. Finally, the distributed, intelligent, media-rich applications foreseen in our ambient intelligence scenarios also require ambient databases and data management for an ever-varying collection of (multimedia) sensor and data sources that are within reach.
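The classifier combining mentioned above can take many forms. As a minimal illustration only, the following Python sketch applies a weighted sum rule (often called late fusion) to class scores produced separately for each modality. The function names, modality names, weights and scores are all hypothetical and are not taken from any system described here.

```python
from typing import Dict

# Hypothetical types: each modality yields a score per class label.
Scores = Dict[str, float]


def late_fusion(scores_by_modality: Dict[str, Scores],
                weights: Dict[str, float]) -> Scores:
    """Combine per-modality class scores with a weighted sum rule."""
    total = sum(weights[m] for m in scores_by_modality)
    if total <= 0:
        raise ValueError("modality weights must sum to a positive value")
    fused: Scores = {}
    for modality, scores in scores_by_modality.items():
        w = weights[modality] / total  # normalise so the weights sum to 1
        for label, score in scores.items():
            fused[label] = fused.get(label, 0.0) + w * score
    return fused


def predict(fused: Scores) -> str:
    """Return the label with the highest fused score."""
    return max(fused, key=fused.get)


# Illustrative, made-up scores for one broadcast segment.
segment_scores = {
    "audio": {"soccer": 0.6, "news": 0.4},
    "video": {"soccer": 0.3, "news": 0.7},
    "text": {"soccer": 0.8, "news": 0.2},
}
modality_weights = {"audio": 1.0, "video": 1.0, "text": 2.0}

print(predict(late_fusion(segment_scores, modality_weights)))
```

In this made-up example the text modality is trusted twice as much as audio or video, so the fused decision follows it even though the video classifier disagrees. A real indexing tool would learn such weights, or a more expressive combiner, from data.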
Natural Interaction
This research track addresses the development of natural and intuitive interfaces for interaction processes within ambient intelligence environments and virtual worlds. New interfacing technology should enable a large diversity of devices (mobile robots, wearables, furniture, handheld devices) to interact in a natural way with each other, with human users, and with the sensor-equipped smart environments (smart homes, smart offices, smart public spaces) themselves. The paradigm shift towards natural human-machine interfaces requires exploiting the synergy of interaction through speech, body motion, facial expression and gesture. This can only be accomplished by a technology that enables seamless, integrative and context-sensitive interpretation of information streams from different modalities. At the back end of the interface we need intelligent agents to communicate, and to find and perform services for the user.
▫ Natural multimodal human-machine interfaces. The research question here is how to sense, fuse and usefully represent the users’ verbal and non-verbal interactive cues (speech, hand and body gestures, facial expressions and gaze, physiological signals such as clamminess) so that the obtained information can be used to operate (interact with) a computer program or a robot in a natural and intuitive way. An example is to combine the utterance “Put armor” with a finger pointing at a computer-game character, replacing a sequence of less efficient and less effective mouse clicks and menu selections. Proactive human-machine interfaces make use of context information such as users’ profiles (age, preferences, disabilities, etc.), users’ tasks (and their hierarchy), and contexts (date, time, medication), in order to yield responsive systems, intelligent applications and interfaces that are intuitive, assistive and conversational in style.
▫ Designing intelligent user interfaces. Within smart environments, products should not only perform their primary functions but should also be able to communicate with the user and other appliances in an intelligent way. Evaluating adaptive and intelligent interfaces is difficult because standard usability engineering techniques are not applicable. For such user interfaces, the evaluation should take place in a realistic, rich and dynamic context of use and focus on a prolonged usage period. Assessing the reciprocal adaptations of user and interface, and addressing the contextual effects, require a new methodology for user experience sampling.
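To make the “Put armor” example concrete, the following Python sketch fuses a recognised utterance with the pointing gesture closest to it in time. It is a deliberately simplified illustration: the event classes, the `fuse` function and the one-second window `max_gap` are assumptions made for this sketch, and a real interface would also have to handle recognition uncertainty and richer context.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SpeechEvent:
    text: str    # recognised utterance, e.g. "Put armor"
    time: float  # timestamp in seconds


@dataclass
class PointingEvent:
    target: str  # object the finger points at
    time: float  # timestamp in seconds


@dataclass
class Command:
    action: str
    target: str


def fuse(speech: SpeechEvent,
         pointings: List[PointingEvent],
         max_gap: float = 1.0) -> Optional[Command]:
    """Pair an utterance with the pointing gesture nearest to it in time.

    Returns None when no gesture falls within max_gap seconds, so the
    caller can fall back to asking the user for clarification.
    """
    candidates = [p for p in pointings
                  if abs(p.time - speech.time) <= max_gap]
    if not candidates:
        return None
    nearest = min(candidates, key=lambda p: abs(p.time - speech.time))
    return Command(action=speech.text.lower(), target=nearest.target)


utterance = SpeechEvent("Put armor", time=10.2)
gestures = [PointingEvent("knight", time=10.0),
            PointingEvent("archer", time=12.5)]
print(fuse(utterance, gestures))
```

Time proximity is only one possible fusion criterion. It is used here because it needs no training data and makes the role of each modality easy to see.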
Virtual Reality and Visualization
This research track addresses techniques for modeling and visualizing virtual environments, either for scientific data visualization, or for simulation and gaming. The task here is to support the user with constraint-based modeling techniques for quickly building dynamic, complex and ‘realistic’ environments that make game play an involving activity. Data visualization techniques are developed to filter and compress large data sets from scientific simulations and medical scanning devices to allow visual browsing and 3D interaction with the underlying physical phenomena.
▫ Interactive data visualization supports browsing of large data sets and information spaces (such as biomedical data). Such visualizations can be designed to allow visual data mining and to help the user to grasp the meaning of the data, and so to support interactive data clustering and classification. 3D display devices such as head-mounted displays and VR stereo projection devices allow us to interactively explore large scientific datasets. To make interaction meaningful, the latency in tracking and display should be low. This requires data reduction and data abstraction techniques, such as feature recognition, to display the data in a compact and semantically meaningful way. In medical applications, data visualization is used for pre-operative planning and diagnosis, and for guiding minimally invasive surgery.
▫ Modeling virtual worlds. Building virtual worlds for gaming and simulation is difficult and labor intensive. Furthermore, we want virtual characters to act and move as realistically as possible. The motor and cognitive behavior of these characters needs to be modeled, and the idea of believable characters from the field of intelligent agents is also employed. Constraint-based techniques are used to build dynamic and adaptive worlds. Constraints enable modeling in a declarative way: not specifying every geometric element separately but allowing a higher-order and semantically rich specification, which is then interpreted and solved to create a specific instance that satisfies the specified requirements. These techniques are being developed within CAD/CAM to build parameterized product models, and in games and simulations to create adaptive virtual worlds for level-of-detail or different levels of game play.
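The declarative style described above can be illustrated with a toy Python example: the layout of a small room is specified only through constraints, and a solver searches for an instance that satisfies them. Everything here is hypothetical (the grid, the object names, the constraint helpers and the brute-force `solve` function). Practical constraint-based modelers use far more capable solvers than exhaustive search.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Position = Tuple[int, int]
Layout = Dict[str, Position]
Constraint = Callable[[Layout], bool]


def manhattan(p: Position, q: Position) -> int:
    """Grid distance between two cells."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def distinct(layout: Layout) -> bool:
    """No two objects may occupy the same cell."""
    return len(set(layout.values())) == len(layout)


def adjacent(a: str, b: str) -> Constraint:
    """Objects a and b must be in neighbouring cells."""
    return lambda layout: manhattan(layout[a], layout[b]) == 1


def at_least(a: str, b: str, d: int) -> Constraint:
    """Objects a and b must be at least d cells apart."""
    return lambda layout: manhattan(layout[a], layout[b]) >= d


def solve(names: Sequence[str],
          cells: Sequence[Position],
          constraints: Sequence[Constraint]) -> Optional[Layout]:
    """Exhaustively search for the first layout meeting all constraints."""
    for positions in product(cells, repeat=len(names)):
        candidate = dict(zip(names, positions))
        if all(check(candidate) for check in constraints):
            return candidate
    return None


# Declarative specification of a 3x3 room: what must hold, not where things go.
grid: List[Position] = [(x, y) for x in range(3) for y in range(3)]
objects = ["table", "chair", "door"]
rules: List[Constraint] = [
    distinct,
    adjacent("table", "chair"),
    at_least("door", "table", 3),
]

print(solve(objects, grid, rules))
```

Changing the rules, for example the minimum distance between door and table, yields a different valid room without touching any coordinates by hand. This is the property that makes the declarative approach attractive for adaptive worlds.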