Interpreting and Understanding Activities of Expert Operators for Teaching and Education
The objective of ActIPret is to develop a cognitive vision methodology that interprets and records the activities of people handling tools. The focus is on active observation and interpretation of activities, on parsing the sequences into constituent behaviour elements, and on extracting the essential activities and their functional dependence. By providing this functionality, ActIPret will enable observation of experts executing intricate tasks such as repairing machines and maintaining plants. The expert activities are interpreted and stored using natural language expressions in an activity plan. The activity plan is an indexed manual in the form of 3D reconstructed scenes, which can be replayed at any time and location to many users using Augmented Reality equipment.
Long Term Objective
The long-term goal is to devise a system that can teach and train many users in the activities of expert operators. While experts can demonstrate their knowledge only to a small group of students and on limited occasions, the proposed system interprets and understands the expert's activities and enables repeated, user-driven reproduction of the task. Using demonstration alone, the system can store task knowledge in a 3D reconstructed teaching and maintenance manual.
Figure 1 exemplifies the envisioned uses of the ActIPret developments. During recording, the expert's activities are observed and an activity plan with the reconstructed scenes is obtained. During replay, the trainee/user searches for the activities using a conceptual language. The user can then choose between two options: (1) replay the sequence from arbitrary viewpoints and according to the training level, which requires only AR/VR equipment, or (2) use the ActIPret system in the form of a personal teach assistant. In the second option, the activities executed are compared with the activities recorded, and the personal teach assistant suggests improvements or corrections, which is intended to give a better training effect than repetition without feedback.
Figure 1: Using the ActIPret system to record and retrieve activities
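The comparison performed by the personal teach assistant can be illustrated with a minimal sketch. It assumes that both the recorded and the executed activities are already available as ordered lists of activity labels; the function name `suggest_corrections`, the labels and the message wording are hypothetical, and the standard-library sequence alignment used here is a stand-in rather than the method used in ActIPret.

```python
from difflib import SequenceMatcher


def suggest_corrections(recorded, executed):
    """Align the trainee's activity sequence with the expert's recording.

    Both arguments are lists of activity labels (hypothetical names).
    Returns a list of human-readable suggestions; empty if the sequences match.
    """
    suggestions = []
    # autojunk=False keeps every label significant, even in long sequences.
    matcher = SequenceMatcher(a=recorded, b=executed, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "delete":      # present in the recording, not executed
            suggestions.append("missing step(s): " + ", ".join(recorded[i1:i2]))
        elif tag == "insert":    # executed, but not in the recording
            suggestions.append("unexpected step(s): " + ", ".join(executed[j1:j2]))
        elif tag == "replace":   # a different activity was executed
            suggestions.append(
                "expected " + ", ".join(recorded[i1:i2])
                + " but observed " + ", ".join(executed[j1:j2])
            )
    return suggestions


# Illustrative data only: an expert recording and a trainee attempt.
recorded = ["open tray", "pick up CD", "place CD in tray", "close tray"]
executed = ["open tray", "pick up CD", "close tray"]
for line in suggest_corrections(recorded, executed):
    print(line)
```

A real system would compare interpreted activities together with their confidence measures and spatial and temporal relations, not plain labels; the sketch only shows the shape of the feedback step.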
ActIPret is an initial step towards improving the teaching of trainees in intricate tasks such as open surgery, repairing machines and maintaining plants. The system enables learning by observation and the indexing of specific activities, uncoupled from the time and place of the original demonstration. In the future, teaching can be done with inexpensive equipment (PC, head-mounted display) and can use complete ActIPret-like systems with trainee supervision and expert documentation capabilities.
Training material: the sequences represent real-world examples for teaching trainees at schools, colleges and universities (practical experience) and for employee training at companies.
Documentation: the teaching material can be indexed by activity and context to enable user-friendly long-term documentation; the system acts as a long-term memory for the maintenance of machines and plants over extended periods of time.
Quality Control: immediate feedback assists the person during training in achieving correct work (personal teach assistant).
Description of the Work
The project is organised into eight interlaced technical work packages to build the cognitive vision framework and its purposive and reactive processing components. In the first year, the framework and its constituent parts are designed and a first prototype is implemented.
The approach involves associating attentional pragmatic interpretation with specific phases of tasks and context to zoom in on the relevant objects and activities. The four components of visual processing are all task- and context-driven and report visual evidence with confidence measures. These components are: the extraction of cues and features; the detection of context-dependent relationships between cues/features; the recognition of activities and of the objects handled, taking potential occlusion into account; and the synthesis of behaviours and tasks that bias the context of the other components. These levels of visual interpretation are interlaced with the attentive and investigative behaviours that provide the feedback needed to focus processing purposively. Robust interpretation results will be achieved with methods that actively seek good viewpoints and obtain disambiguating information for detection, recognition and synthesis. Robustness is also enhanced by context-dependent information integration between the components.
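As a rough illustration of components that report evidence with confidence measures and are biased by task context, consider the following sketch. Everything in it is an assumption made for illustration: the `Evidence` type, the `bias_by_context` function, the multiplicative weighting and the default weight are hypothetical and are not taken from the ActIPret framework.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """Visual evidence reported by a component (hypothetical structure)."""
    label: str
    confidence: float  # confidence measure in [0, 1]


def bias_by_context(evidence, context, default=0.5):
    """Re-weight evidence by how relevant each label is to the current task.

    `context` maps labels to relevance weights in [0, 1], as the synthesis
    level might provide them; labels not mentioned get `default`.
    The multiplicative rule is an assumption, chosen only for simplicity.
    """
    biased = [
        Evidence(e.label, e.confidence * context.get(e.label, default))
        for e in evidence
    ]
    # Most plausible hypothesis first.
    return sorted(biased, key=lambda e: e.confidence, reverse=True)


# Illustrative values: raw recognition slightly prefers "cup", but the task
# context "placing a CD in a player" makes "CD" the more plausible object.
raw = [Evidence("cup", 0.6), Evidence("CD", 0.5)]
task_context = {"CD": 1.0, "CD player": 1.0}
ranked = bias_by_context(raw, task_context)
print([e.label for e in ranked])
```

The point of the sketch is the direction of information flow: lower-level components pass evidence upwards with a confidence value, and the task-level context feeds back to change which hypotheses are pursued.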
Year 1: Prototype; Recognition of single activities and objects with occlusion handling; Conceptual language defining activities.
Year 2: Interpretation of one-handed activities with objects: placing a CD in a player; Qualitative description of spatial relations between objects and activities; Conceptual activity description for activity plans.
Year 3: Interpretation of two-handed activity sequences with objects: changing a wheel of a car or another industrial task; Temporal relations between objects and activities; Activity plan synthesis and replay.
ACIN - Institute of Automation and Control (former INFA) at the Vienna University of Technology, A
CMP - Center for Machine Perception at the Czech Technical University, CZ
COGS - School of Cognitive and Computing Sciences at the University of Sussex, GB
FORTH - Foundation for Research and Technology - Hellas, Computer Vision and Robotics Laboratory at the Institute of Computer Science, GR
PROFACTOR - Produktionsforschungs GmbH, A
Contact: Markus Vincze, ACIN
Tel.: +43 1 5041446 / 11