Real-Time Hand Gesture Telerobotic System Using the Fuzzy C-Means Clustering Algorithm
JUAN WACHS*, URI KARTOUN, HELMAN STERN, YAEL EDAN
Department of Industrial Engineering and Management
Ben-Gurion University of the Negev
P.O.Box 653, Be’er-Sheeva 84105, ISRAEL
Fax: +972-8-6472958; Tel: +972-8-6461434
E-mail: (juan, kartoun, helman, yael)@bgu.ac.il
* Corresponding Author: Juan Wachs (juan@bgumail.bgu.ac.il)
This paper describes a teleoperation system in which an articulated robot performs a block-pushing task based on hand gesture commands sent through the Internet. A Fuzzy C-Means clustering method is used to classify hand postures as “gestures”. The fuzzy recognition system was tested using 20 trials of each gesture in a 12-gesture vocabulary. Results revealed a recognition rate (the ratio of unclassified gestures to classified gestures) of 0.96, and a recognition accuracy (the percent of the classified gestures recognized correctly) of 100%. No gestures were recognized incorrectly. Performance times to carry out the pushing task showed rapid learning, reaching standard times within 4 to 6 trials by an inexperienced operator.
KEYWORDS: gesture recognition, telerobotics, fuzzy c-means, hand gesture.
Hand gestures are but one of
a few methods used in telerobotic control [1]. This type of communication
provides an expressive, natural and intuitive way for humans to control robotic
systems to perform specific tasks. One benefit [2] of such a system is that
it is a natural way to send geometrical information to the robot, such as left,
right, up and down hand gestures. Gestures may represent a single command, a
sequence of commands, a single word, or a phrase. Such systems should be accurate
enough to provide the right classification of hand gestures in a reasonable
time [3].
To control the robot's movement,
the user evokes a gesture from a gesture vocabulary. The user lays his/her hand
over a video imager, and a raw image is acquired. An interface screen allows
the user to view the captured gesture. The gesture is classified using a recognition
module based on the Fuzzy C-Means (FCM) algorithm [5] and is sent to the robot for execution.
The system consists of a five-degree-of-freedom articulated
A255 robot manufactured by CRS Robotics, a dual 200 MHz Intel® Pentium™ processor PC system with a Matrox Meteor frame grabber, two 3Com “HomeConnect”
PC digital USB (Universal Serial Bus) cameras, and a Panasonic video imager.
A set of recognized gestures is sent through the TCP/IP communication protocol
to a distant robot PC server (Fig. 1). The server is connected to the robot
controller, and two USB cameras (web cameras) continually capture the robot scene.
Both side and front views of the robot scene are sent to the client using the
FTP protocol, and then presented in the user interface. To enable the user to
control the robot over the Internet, TCP/IP and FTP client-server applications
were developed and installed on each computer.
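As a rough illustration of the gesture channel, the sketch below sends one recognized gesture index over a connected socket and reads it back on the server side. The paper does not specify the wire format, so the one-byte message and the function names `send_gesture` and `recv_gesture` are assumptions made only for this example.

```python
import socket
import struct

# Hypothetical wire format: one unsigned byte holding the gesture index (1..12).
GESTURE_MSG = struct.Struct("!B")


def send_gesture(sock: socket.socket, gesture_id: int) -> None:
    """Send one gesture index to the robot server."""
    if not 1 <= gesture_id <= 12:
        raise ValueError("gesture_id must be in 1..12")
    sock.sendall(GESTURE_MSG.pack(gesture_id))


def recv_gesture(sock: socket.socket) -> int:
    """Receive one gesture index on the server side."""
    data = sock.recv(GESTURE_MSG.size)
    if len(data) != GESTURE_MSG.size:
        raise ConnectionError("connection closed before a full message arrived")
    (gesture_id,) = GESTURE_MSG.unpack(data)
    return gesture_id


if __name__ == "__main__":
    # A connected socket pair stands in for the client/server TCP link.
    client, server = socket.socketpair()
    with client, server:
        send_gesture(client, 11)
        print(recv_gesture(server))
```

In a real deployment the pair would be replaced by a TCP connection between the client PC and the robot PC server. A fixed-size message keeps the framing trivial.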
Fig 1. User-Robot Communication Architecture
The hand gesture recognition system flow diagram is shown in Fig. 2. Upon presentation
of the robotic scene in the user’s interface, a gesture G is evoked and selected
from the gesture vocabulary {G1, G2, …, G12}. A vision
system converts the captured image of the gesture into a feature string which
is subsequently recognized and sent to the robot PC server. After the robot
executes the command, camera views of the robot environment are transmitted
back to the interface.
Fig 2. System Flow Diagram
A vocabulary of 12 static gesture poses was designed for robot
control tasks (Fig. 3). The first six gestures of the vocabulary control the
robot arm using world coordinates. The forward and back hand gestures
control the X-axis, the right and left hand gestures control the
Y-axis, and the up and down hand gestures control the Z-axis of
the articulated robot arm. The Roll Right and Roll Left hand gestures
rotate the wrist joint, and the Open Grip and Close Grip gestures
control the robot gripper. The Stop hand gesture stops any action the
robot performs. The Home hand gesture resets and calibrates all robot
joints in the home position.
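The vocabulary can be pictured as a lookup table from gesture names to robot commands. The sketch below is illustrative only: the `RobotCommand` type, the `command_for` helper and the sign conventions (for example, Forward as +X) are assumptions, not part of the described system.

```python
from typing import NamedTuple, Optional


class RobotCommand(NamedTuple):
    kind: str            # "move", "roll", "grip", "stop" or "home"
    axis: Optional[str]  # world axis for "move" commands, otherwise None
    direction: int       # +1 or -1 for signed commands, 0 otherwise


# The sign conventions (for example Forward = +X) are assumptions.
GESTURE_COMMANDS = {
    "Forward": RobotCommand("move", "X", +1),
    "Back": RobotCommand("move", "X", -1),
    "Right": RobotCommand("move", "Y", +1),
    "Left": RobotCommand("move", "Y", -1),
    "Up": RobotCommand("move", "Z", +1),
    "Down": RobotCommand("move", "Z", -1),
    "Roll Right": RobotCommand("roll", None, +1),
    "Roll Left": RobotCommand("roll", None, -1),
    "Open Grip": RobotCommand("grip", None, +1),
    "Close Grip": RobotCommand("grip", None, -1),
    "Stop": RobotCommand("stop", None, 0),
    "Home": RobotCommand("home", None, 0),
}


def command_for(gesture: str) -> RobotCommand:
    """Look up the robot command for a recognized gesture name."""
    try:
        return GESTURE_COMMANDS[gesture]
    except KeyError:
        raise ValueError(f"unknown gesture: {gesture!r}") from None
```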
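Fig 3. Visual Gesture Recognition Language
The Fuzzy C-Means (FCM) clustering algorithm is an easily understood
and computationally fast algorithm described mathematically in [5]. Given
a set of n data patterns, it minimizes a weighted within-group sum of squared error objective function by iteratively updating the degrees of membership and the cluster centers.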
In the proposed methodology, the FCM algorithm is provided with a training
set of candidate gestures. Each gesture is represented by a feature vector extracted
from the gesture image. The set of feature vectors is clustered for subsequent
use in a recognition system. Once the clusters have been created, they are labeled
(i.e., assigned a linguistic description, which is the class or gesture associated
with each cluster). This must be done manually, since the clustering itself is unsupervised.
The cluster center is a prototype feature vector for its cluster.
Preprocessing of the captured gesture image includes a number
of image processing operations such as thresholding followed by morphological
erosion and dilation type filters until a black and white hand object is segmented
from the background. This is followed by constructing a bounding box around
the segmented hand. A feature vector of the image with 13 parameters is created.
The first feature is the aspect ratio of the bounding box. The last 12 features
are block gray mean values calculated from a 3 by 4 block partition of the
image. The block gray mean values represent the average brightness of each block
in the image. Fig. 4 illustrates a typical user gesture (a), its block mean
grayscale values (b), and the resultant feature vector.
Feature Vector = [57 176 52 2 2 68 249 171 16 3 13 253 188]
Fig 4. Illustration of a Feature Vector
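The feature construction described above can be sketched in a few lines of dependency-free Python. This is an illustration under stated assumptions, not the authors' implementation: the hand is taken to be brighter than the background, the morphological filtering step is omitted, the 3 by 4 partition is applied to the bounding-box crop as 3 rows by 4 columns, and the aspect ratio is width over height without the scaling that the integer values in Fig. 4 suggest. The names `bounding_box` and `feature_vector` are hypothetical.

```python
from typing import List, Sequence, Tuple

Image = Sequence[Sequence[int]]  # rows of gray levels in 0..255


def bounding_box(img: Image, threshold: int) -> Tuple[int, int, int, int]:
    """Return inclusive (top, bottom, left, right) of pixels above threshold."""
    rows = [r for r, row in enumerate(img) if any(p > threshold for p in row)]
    cols = [c for c in range(len(img[0]))
            if any(row[c] > threshold for row in img)]
    if not rows:
        raise ValueError("no pixels above threshold")
    return rows[0], rows[-1], cols[0], cols[-1]


def feature_vector(img: Image, threshold: int = 128,
                   n_rows: int = 3, n_cols: int = 4) -> List[float]:
    """Aspect ratio followed by n_rows * n_cols block gray means."""
    top, bottom, left, right = bounding_box(img, threshold)
    height = bottom - top + 1
    width = right - left + 1
    features = [width / height]  # assumed definition of the aspect ratio
    for br in range(n_rows):
        r0 = top + br * height // n_rows
        r1 = top + (br + 1) * height // n_rows
        for bc in range(n_cols):
            c0 = left + bc * width // n_cols
            c1 = left + (bc + 1) * width // n_cols
            block = [img[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            # A block can be empty when the box is smaller than the grid.
            features.append(sum(block) / len(block) if block else 0.0)
    return features
```

With the default 3 by 4 grid the function returns 13 values, matching the length of the feature vector in Fig. 4.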
The training stage involves running the Fuzzy C-Means algorithm on a set of exemplar
hand gestures. During the training process, several hand gesture images are
inserted into a database which includes all possible gestures. Variations are
incorporated by slightly varying the hand configuration for each gesture. Variations
in lighting are handled by preprocessing the image to reduce their effect. At
least 25 samples for each of the 12 hand gestures in the language are taken
to construct the training set. Every image gets an identification number and
a feature vector as described in section 3.3. This information is also inserted
into the database.
During the FCM clustering process,
for each training image, a membership vector is built and stored in the database.
The membership vector computed from (2) uses the actual training image feature
vector. A cost function is built to estimate how close to optimal the system is. After
classification of all the images in the training set, equation (1) was used
to empirically find the best value of the parameter m.
Gestures performed by a user are classified using the highest membership value.
In our case, if $x_{k'}$ is the feature vector of the current
hand gesture image, its distance to each of the cluster centers is determined and used to calculate the membership values.
Fig 5. Successful Recognition
An experiment was designed to test the recognition system. Twenty trials
of each gesture were presented to the gesture recognition system. Two measures
of performance, based on a distinction between classification and recognition,
were used in the evaluation. A gesture $k'$ whose entire set of membership values,
$\{u_{ik'};\ \forall i = 1, \ldots, c\}$, is below the given
threshold value $t$ is said to be unclassified. A gesture $k'$ with at least one membership value above the threshold value $t$
is said to be recognized as belonging to classification $i'$ if $u_{i'k'} = \max\{u_{ik'};\ \forall i = 1, \ldots, c\}$. We can now define the two performance measures as:
(a) Recognition Rate - The ratio of unclassified gestures
to classified gestures.
(b) Recognition Accuracy - The percent of the classified
gestures recognized correctly.
Results indicate a recognition rate of 0.96, and a recognition
accuracy of 100%. Unclassified gestures could be attributed either to the recognition
system or to the user's lack of training in forming the correct hand
configuration for the intended gesture. All classified gestures were recognized
correctly. Gestures 5 (Right) and 6 (Left) were the most common unclassified
gestures, while gestures 4 (Back), 9 (Open Grip) and 11 (Home) were rarely unclassified.
This information is useful in the redesign of the gesture vocabulary.
A case study demonstration was performed in which an operator using hand
gestures controls the remote robot to perform a task in real time. The task
was to push a yellow wooden cube, located on top of a pile, into a container
adjacent to it (Fig. 6).
Fig 6. A255 Robot, Plastic Cup Structure, and Yellow Wooden Box
An inexperienced operator performed ten
identical experiments and the performance time for each was recorded. The
system is fast and accurate, and owing to the simple hand gesture language developed,
task completion time fell quickly with practice. As can be seen
in Fig. 7, standard times were reached after four to six trials.
Fig 7. Learning Curve of the Hand Gesture System
This project described the design, implementation
and testing of a telerobotic gesture-based user interface system using visual
recognition. Two aspects of the problem have been examined: the technical aspects
of visual recognition of hand gestures in a laboratory environment, and the
usability of an interface implemented in a remote client location. Experimental
results showed that the system satisfies the requirements for a robust and
user-friendly input device. The Fuzzy C-Means algorithm provided enough speed and
sufficient reliability to perform the desired tasks. The case study demonstrated
the importance of latency for telerobotic systems. Although gestures were recognized
quickly and sent in packet form, successful execution of the commands could
not be verified until the image of the robot environment was received at the
user interface. This resulted in an overlapping effect: new
gestures were sent before complete information on the present robot position was received.
Future research will be directed toward solving this problem.
This project was supported by the Ministry
of Defense MAFAT Grant No. 2647 and partially supported by the Paul Ivanier
Center for Robotics Research & Production Management, Ben-Gurion University
of the Negev.
[1] Katkere A., Hunter E., Kuramura D., Schlenzig J., Moezzi S. and Jain R. "ROBOGEST:
Telepresence Using Hand Gestures". Technical report VCL-94-104,
Visual Computing Laboratory, University of California, San Diego. 1994.
[2] Kortenkamp D., Huber E. and Bonasso R. P. "Recognizing and Interpreting
Gestures on a Mobile Robot". In AAAI96, 1996.
[3] Pavlovic V., Sharma R. and Huang T. "Visual Interpretation
of Hand Gestures for Human Computer Interaction: A Review". IEEE PAMI,
Vol. 19, pp. 677-695. 1997.
[4] Triesch J. and Malsburg C. V. D. "A Gesture Interface
for Human-Robot Interaction". Proc. of 3rd IEEE Intl. Conf. on Automatic
Face and Gesture Recognition, pp. 546-551. 1998.
[5] Bezdek J. C. "Cluster Validity with
Fuzzy Sets". Cybernetics, Vol. 3, No. 3, pp. 58-73. 1973.
2. System Architecture
3. Gesture Classification
3.1 Hand Gesture Language

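3.2 Fuzzy C-Means Clustering
Given a set of $n$ data patterns $X = \{x_1, \ldots, x_k, \ldots, x_n\}$,
the algorithm minimizes the weighted within-group sum of squared error objective
function $J(U, V)$:
\[ J(U, V) = \sum_{k=1}^{n} \sum_{i=1}^{c} (u_{ik})^{m} \, d_{ik}^{2} \tag{1} \]
where $x_k$ is the $k$-th $p$-dimensional data vector, $v_i$ is the prototype of the center of cluster $i$, $u_{ik}$ is the degree of membership of $x_k$ in the $i$-th cluster, $m$ is a weighting exponent on each fuzzy membership, $d_{ik}$ is a distance measure between data pattern $x_k$ and cluster center $v_i$, $n$ is the number of data patterns and $c$ is the number of clusters. The objective function $J(U, V)$ is
minimized via an iterative process in which the degrees of membership $u_{ik}$ and
the cluster centers $v_i$ are updated:
\[ u_{ik} = \frac{1}{\sum_{j=1}^{c} \left( d_{ik} / d_{jk} \right)^{2/(m-1)}} \tag{2} \]
\[ v_i = \frac{\sum_{k=1}^{n} (u_{ik})^{m} \, x_k}{\sum_{k=1}^{n} (u_{ik})^{m}} \tag{3} \]
where the memberships $u_{ik}$ satisfy:
\[ u_{ik} \in [0, 1], \qquad \sum_{i=1}^{c} u_{ik} = 1 \ \ \forall k, \qquad 0 < \sum_{k=1}^{n} u_{ik} < n \ \ \forall i \tag{4} \]
Here $v_i$ is a prototype feature vector for cluster $i$, $x_k$ is
the feature vector of the $k$-th exemplar gesture in the training set, $u_{ik}$ is the "degree of belonging" (membership value) of the $k$-th feature
vector to cluster $i$, $c$ is the number of gestures in the gesture
set as well as the number of clusters, and $n$ is the number of images in the
training set.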
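A minimal, dependency-free sketch of the standard FCM iteration follows: it alternates the membership update and the center update until the centers stop moving. The Euclidean distance, the default `m = 2.0`, the initialization from randomly sampled data points and the stopping tolerance are all assumptions of this sketch, since the paper does not state them. The function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def memberships(x: Vector, centers: Sequence[Vector], m: float) -> List[float]:
    """Membership of x in each cluster, computed from distances to the centers."""
    d = [math.dist(x, v) for v in centers]
    # A point lying exactly on one or more centers is shared among them.
    zeros = [i for i, di in enumerate(d) if di == 0.0]
    if zeros:
        return [1.0 / len(zeros) if i in zeros else 0.0 for i in range(len(d))]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dj) ** exponent for dj in d) for di in d]


def fuzzy_c_means(data: Sequence[Vector], c: int, m: float = 2.0,
                  max_iter: int = 100, tol: float = 1e-6,
                  seed: int = 0) -> Tuple[List[List[float]], List[List[float]]]:
    """Return (centers, u) where u[k][i] is the membership of data[k] in cluster i."""
    rng = random.Random(seed)
    # Assumed initialization: c distinct training vectors chosen at random.
    centers = [list(x) for x in rng.sample(list(data), c)]
    dims = len(data[0])
    for _ in range(max_iter):
        u = [memberships(x, centers, m) for x in data]
        new_centers = []
        for i in range(c):
            w = [row[i] ** m for row in u]  # fuzzified weights for cluster i
            total = sum(w)
            new_centers.append(
                [sum(wk * x[j] for wk, x in zip(w, data)) / total
                 for j in range(dims)])
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:  # assumed stopping rule: centers have stopped moving
            break
    return centers, [memberships(x, centers, m) for x in data]
```

For the gesture system, `data` would hold the 13-element training feature vectors and `c` would be 12. Each cluster would then be labeled manually with its gesture.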
3.3 Preprocessing and Feature Extraction


3.4 Training Stage
3.5 Classification
The distance from the feature vector $x_{k'}$ of the current gesture image to each of the cluster centers $v_i$ is determined and used in (2) to calculate the membership values
$\{u_{ik'};\ \forall i = 1, \ldots, c\}$. The gesture is classified
by finding $u_{i'k'} = \max\{u_{ik'};\ \forall i = 1, \ldots, c\}$. A further test is made before recognition is established.
This test depends on a recognition threshold $t$. If
$u_{i'k'} \ge t$, the gesture is
recognized as belonging to classification $i'$; otherwise it is said to be unclassified,
as all the membership values are too low. A recognition threshold of $t = 0.75$ was empirically determined to provide the best performance. Fig. 5 shows
the “development interface” used for this test. Note the visualization of the
membership values as a bar chart, whose peak value in this case corresponds
to a successfully recognized "stop" gesture.
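The decision rule can be sketched as follows: compute the membership of the new feature vector in every cluster, take the largest, and accept it only if it reaches the threshold. The Euclidean distance and the default `m = 2.0` are assumptions (the paper tunes `m` empirically without reporting it). The threshold default of 0.75 is the value given in the text. The function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]


def membership_values(x: Vector, centers: Sequence[Vector],
                      m: float = 2.0) -> List[float]:
    """Membership of x in each cluster from its distances to the centers."""
    d = [math.dist(x, v) for v in centers]
    zeros = [i for i, di in enumerate(d) if di == 0.0]
    if zeros:  # x coincides with one or more cluster centers
        return [1.0 / len(zeros) if i in zeros else 0.0 for i in range(len(d))]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dj) ** exponent for dj in d) for di in d]


def classify(x: Vector, centers: Sequence[Vector], labels: Sequence[str],
             m: float = 2.0,
             threshold: float = 0.75) -> Tuple[Optional[str], List[float]]:
    """Return (label, memberships); label is None when the gesture is unclassified."""
    u = membership_values(x, centers, m)
    best = max(range(len(u)), key=u.__getitem__)
    if u[best] >= threshold:
        return labels[best], u
    return None, u
```

Returning the full membership list alongside the label makes it straightforward to draw a bar chart like the one in the development interface.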

4. Testing and Task Validation
4.1 Testing the Recognition System
4.2 Case Study
5. Conclusions
Acknowledgements
References