Office Activity Awareness
Activity Detection
Sensors

The system consists of two sensors in the environment (a microphone and a web camera) and one sensor on the computer (a computer activity logger). The environment sensor data are integrated using a data collection tool that Bilge Mutlu and Ian Li developed, and the processed data from the sensors are stored in log files.

The video input for the system is the Creative Ultra NX web camera. Its wide-angle lens captures a large area of the office, and its image quality is very good compared to other web cameras, making it well suited as a sensor.
Figure 1. The Creative Ultra NX web camera

The input from the web camera is processed by the system to compute a motion level between 0 and 1, inclusive. The motion value is computed as the normalized difference between the pixels of the current image and those of a previous image.
Figure 2. Processing of the images to produce a value for the motion level.
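The frame-differencing step described above can be sketched as follows; the function name, frame size, and 8-bit grayscale assumption are illustrative, not taken from the original system:

```python
import numpy as np

def motion_level(prev_frame: np.ndarray, curr_frame: np.ndarray) -> float:
    """Normalized mean absolute pixel difference between two grayscale frames."""
    diff = np.abs(curr_frame.astype(float) - prev_frame.astype(float))
    return float(diff.mean() / 255.0)  # 0 = identical frames, 1 = maximal change

# Identical frames give 0.0; inverting every pixel gives 1.0.
prev = np.zeros((240, 320), dtype=np.uint8)
curr = np.full((240, 320), 255, dtype=np.uint8)
still = motion_level(prev, prev)
moving = motion_level(prev, curr)
```

Normalizing by the maximum pixel value keeps the motion level in the [0, 1] range regardless of frame size.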

A mini microphone is used to record the sound level in the environment. Sound levels within intervals of a quarter of a second are averaged to give a final value for sound level. Since sound level is the only feature recorded, the participants' identities and topics of conversation are kept private.
Figure 3. The mini microphone
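The quarter-second averaging can be sketched like this; the sampling rate and the use of mean absolute amplitude as the "sound level" are assumptions, not details from the original system:

```python
import numpy as np

SAMPLE_RATE = 8000           # assumed sampling rate, not stated in the original
WINDOW = SAMPLE_RATE // 4    # one quarter-second of samples

def sound_levels(samples: np.ndarray) -> np.ndarray:
    """Mean absolute amplitude per quarter-second interval."""
    n = len(samples) // WINDOW * WINDOW          # drop a trailing partial window
    return np.abs(samples[:n]).reshape(-1, WINDOW).mean(axis=1)

# One second of a constant half-amplitude signal yields four identical levels.
levels = sound_levels(np.full(SAMPLE_RATE, 0.5))
```

Because only these averaged levels are stored, the raw audio (and hence any speech content) never reaches the log file.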

The computer activity logger uses the AmIBusy toolkit that James Fogarty developed. The toolkit provides classes to record computer input events such as keystrokes, mouse clicks, and mouse movements. To protect users' privacy, the program does not record what keys are pressed, only whether the keyboard is being used. The logger records every one-tenth of a second whether there was computer input.
Figure 4. The computer activity logger
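The privacy-preserving idea behind the logger can be sketched as follows. This is not the AmIBusy API; the class and method names are hypothetical, and timestamps are passed in explicitly to keep the sketch deterministic:

```python
from dataclasses import dataclass

@dataclass
class ActivityLogger:
    """Records only WHETHER input occurred in each interval, never which keys."""
    interval: float = 0.1      # polling period in seconds
    _last_event: float = -1.0  # time of the most recent input event

    def on_input(self, t: float) -> None:
        # Called for any keystroke or mouse event; the event content is discarded.
        self._last_event = t

    def poll(self, t: float) -> bool:
        # True if any input arrived within the last interval.
        return self._last_event >= 0 and t - self._last_event < self.interval

logger = ActivityLogger()
logger.on_input(1.00)
busy = logger.poll(1.05)    # input 0.05 s ago, within the interval
idle = logger.poll(1.25)    # no input in the last 0.1 s
```

Discarding the event content at the earliest point means the log can never reveal what was typed, only that typing occurred.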
 
Data Collection

The data from the environment sensors are integrated using a data collection tool that Bilge Mutlu and Ian Li developed. The tool takes data from the sensors, processes them, and stores them in a log file.

The collection tool has a user interface that allows a data collector to log the actual state of the environment in real time. However, since data were collected from different spaces at the same time for several hours, it was not practical to have an observer record the actual activities happening in each space. To resolve this, the system takes snapshots of the environment every 30 seconds, and the snapshots are later labeled for activity.
Figure 5. The user interface for the data collection tool that Bilge Mutlu and Ian Li developed. The set of buttons is not used.
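Once the snapshots are labeled, each sensor reading can be assigned the label of the 30-second interval containing it. A minimal sketch, assuming labels are keyed by the snapshot's start time (the function and label values are hypothetical):

```python
PERIOD = 30.0  # seconds between snapshots

def label_for(timestamp: float, snapshot_labels: dict) -> str:
    """Return the label of the 30-second snapshot interval containing timestamp."""
    key = timestamp // PERIOD * PERIOD   # start of the enclosing interval
    return snapshot_labels.get(key, "unlabeled")

# Hypothetical labels assigned to each snapshot after the fact.
labels = {0.0: "sitting", 30.0: "walking"}
a = label_for(12.3, labels)   # falls in the [0, 30) interval
b = label_for(45.0, labels)   # falls in the [30, 60) interval
c = label_for(70.0, labels)   # no snapshot labeled for [60, 90)
```

This interval-level labeling is coarse, a point that matters for the walking results discussed below.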
 
Deployment

The system was deployed in four office environments: the offices of two professors and two Ph.D. students. The participants agreed to be recorded throughout the day, from when they arrived in their offices to when they left.

The sensors are installed in the participant's office space. The environment sensors (web camera and microphone) are mounted on the ceiling in the middle of the room to capture as much of the space as possible. The computer activity logger is installed on the participant's computer.
Figure 6. The sensors deployed in two participant offices.
 
Results

I ran various machine learning algorithms (Bayes Net, Naive Bayes, Logistic Regression, and Bagging with reduced-error pruning trees) on the features collected from the environment sensors, analyzing the data sets for each participant and day.
                                        Recall
Participant    Accuracy  Classifier     Outside  Sitting  Sit&Talk  Walking
Prof 1, Day 1  88.04%    Bagging        0.944    0.877    0.948     0
Prof 1, Day 2  92.86%    Bayes Net      0.657    0.970    0.986     0
Prof 2, Day 1  90.03%    Bagging        0.829    0.931    0.923     0.381
Prof 2, Day 2  87.62%    Bagging        0.886    0.898    no data   0.227
Student 1      90.85%    Bayes Net      0.723    0.972    0.867     0
Student 2      93.08%    Bagging        0.880    0.976    0         0
Table 1. The best detection accuracies for each set of data (participant and day)

The table shows that the activities {outside, sitting, sitting and talking, walking} can be accurately detected using features from simple sensors (a web camera and a microphone). Detection accuracies range from 88% to 93%. The best-performing learning algorithms were Bayes Net and Bagging with reduced-error pruning trees.

Notice that recall for the activities outside, sitting, and sitting and talking is very good; the amount of data collected for these activities was sufficient to train the classifiers. On the other hand, detection of walking is very poor: only a few algorithms detected walking at all, and those did so poorly. The poor detection can be attributed to the small amount of training data labeled as walking. Another reason is that the data were labeled in half-minute intervals; a walking activity takes only a few seconds, so this labeling granularity obscured the features of walking.
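The recall values in Table 1 are per-class: the fraction of intervals actually belonging to an activity that the classifier detected. A small sketch with made-up labels shows why a rare, never-detected class like walking gets a recall of 0:

```python
from collections import Counter

def per_class_recall(y_true, y_pred):
    """Recall per class: detected instances / actual instances of that class."""
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    totals = Counter(y_true)
    return {c: hits[c] / totals[c] for c in totals}

# Made-up labels: the lone walking interval is misclassified as sitting,
# so walking's recall is 0 even though overall accuracy is 80%.
y_true = ["sitting", "sitting", "walking", "outside", "sitting"]
y_pred = ["sitting", "sitting", "sitting", "outside", "sitting"]
recall = per_class_recall(y_true, y_pred)
```

This is why overall accuracy can stay high while a class with little training data is effectively never detected.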

Notice also that detection accuracy is fairly consistent across the different participants. It may therefore be sufficient to train the classifier on a single day of data and still detect activity accurately over the following days or a week. A classifier trained on one person might also generalize to several people with no or only minor modifications.