# Multitouch Gesture Generation and Recognition Techniques - Essay Example

**Abstract:** A huge number of users rely on smart phones to communicate with each other. A smart phone user is exposed to various threats when using the phone for communication. These threats can disrupt the operation of the smart phone and transmit or modify user data [1]. Applications must therefore guarantee the privacy and integrity of the information. Single-touch mobile security does not perform well enough to protect confidential data, so we move towards multitouch mobile security. In computing, multi-touch is a technology that enables a surface to recognize the presence of more than one point of contact with the touch screen [2]. Multiple touch points can be used to authenticate a user before granting access to confidential data on a mobile phone. We present our study of biometric gestures that authenticate a user through multitouch finger points for greater security [1].

**Keywords:** Multitouch, biometric gesture, authentication, security, smart phone, finger tracking, Android operating system.

**Introduction**

Today’s IT administrators face the troublesome task of managing the countless mobile devices that connect to enterprise networks every day. Mobile security has become increasingly important as the number of devices in operation, and the uses to which they are put, have expanded worldwide. The problem is compounded within the enterprise, where the ongoing trend toward user-owned devices results in more and more employee-owned devices connecting to the corporate network. Authentication is the process in which the credentials provided are compared to those on file in a database of valid users’ information. If the credentials match, the process is completed and the user is granted authorization to access the system. The permissions and folders returned define both the environment the user sees and the way the user can interact with it, including the level of access and other rights such as the amount of allocated storage space and other services [1].

The most common computer authentication method uses alphanumeric usernames and text-based passwords. This method has been shown to have some disadvantages. For example, users tend to pick passwords that can be easily guessed, while passwords that are hard to guess are often hard to remember. To address this problem, some researchers have developed authentication techniques that use multitouch biometric gestures as passwords.

Multi-touch, in a computing environment, is an interface technology that enables input gestures at multiple points on the surface of a device. Although it is most commonly used with touch screens on handheld devices, multi-touch has been used for other surfaces as well, including touch pads, tables and walls [2].

In other words, multi-touch refers to the capability of a touch screen (or a touchpad) to recognize two or more points of contact on the surface simultaneously. The constant tracking of the multiple points allows the interface to recognize gestures, which enable advanced functionality such as pinch-to-zoom. Gesture recognition is the task of interpreting human gestures via mathematical algorithms. Gestures can originate from any bodily motion but commonly originate from the face or hand; the identification and recognition of posture and human behaviours is also a subject of gesture recognition techniques.

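We used the Equal Error Rate (EER) to measure accuracy. This is the rate at which the False Acceptance Rate (FAR) and the False Rejection Rate (FRR) are equal. To find out whether using multiple gestures would improve the system’s performance, we combined the scores of 2 different gestures from the same user in the same order and evaluated the EER of the combined gestures, where FAR and FRR follow their standard definitions:

$$\mathrm{FAR} = \frac{\text{number of impostor attempts accepted}}{\text{total number of impostor attempts}}$$

$$\mathrm{FRR} = \frac{\text{number of genuine attempts rejected}}{\text{total number of genuine attempts}}$$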
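As an illustration of how these error rates can be estimated from matching scores, the following minimal sketch scans candidate thresholds and reports the point where FAR and FRR are closest. It assumes that higher scores mean a better match; the score lists and the function names `far_frr` and `equal_error_rate` are hypothetical and not taken from the cited work.

```python
def far_frr(genuine, impostor, threshold):
    """Return (FAR, FRR) when every score >= threshold is accepted."""
    false_accepts = sum(1 for s in impostor if s >= threshold)
    false_rejects = sum(1 for s in genuine if s < threshold)
    return false_accepts / len(impostor), false_rejects / len(genuine)


def equal_error_rate(genuine, impostor):
    """Approximate the EER by trying every observed score as a threshold."""
    best = None
    for threshold in sorted(set(genuine) | set(impostor)):
        far, frr = far_frr(genuine, impostor, threshold)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # Keep the threshold where FAR and FRR are closest to each other.
            best = (gap, (far + frr) / 2, threshold)
    return best[1], best[2]


# Hypothetical similarity scores (illustrative values only).
genuine_scores = [0.9, 0.8, 0.75, 0.6, 0.4]
impostor_scores = [0.7, 0.5, 0.3, 0.2, 0.1]
eer, eer_threshold = equal_error_rate(genuine_scores, impostor_scores)
```

With real data the two curves rarely cross exactly at an observed score, so the value returned is an approximation of the EER.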

**Developing a Gesture Authentication Technique**

Biometric systems are an effective way to authenticate valid users in mobile authentication, generally based on the “something they are” property [2]. The goal of biometric identification is the automatic verification of the identity of a living person by means of distinctive gestures that only that person possesses.

Figure 1: Multitouch behavior

The biometric authentication system has two phases: an **enrollment phase** and an **authentication phase**. A new user must first record secret hand signs with the system during the enrollment phase. The user performs hand signs of their own discreet choice, with sufficient space for hand movement during registration.

**Gesture Taxonomy [1]**

1. Parallel: All fingertips move in the same direction during the gesture. For example, a swipe during which all 5 fingers move from left to right across the screen.

2. Closed: All fingertips move inward toward the center of the hand. For example, a pinch gesture.

3. Opened: All fingertips move outward from the center of the hand. For example, a reverse pinch gesture.

4. Circular: All fingertips rotate around the center of the hand. For example, a clockwise or counterclockwise rotation [1].
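The four categories above can be told apart from the start and end position of each fingertip. The sketch below is one possible heuristic, not the method of [1]: it assumes the center of the hand can be approximated by the centroid of the starting touch points, and the function name `classify_gesture` and the threshold `parallel_threshold` are hypothetical.

```python
import math


def classify_gesture(starts, ends, parallel_threshold=0.9):
    """Label a multitouch gesture as parallel, closed, opened or circular."""
    n = len(starts)
    # Assumption: the hand center is the centroid of the starting points.
    cx = sum(x for x, _ in starts) / n
    cy = sum(y for _, y in starts) / n
    sum_ux = sum_uy = radial = tangential = 0.0
    for (sx, sy), (ex, ey) in zip(starts, ends):
        dx, dy = ex - sx, ey - sy          # fingertip displacement
        rx, ry = sx - cx, sy - cy          # vector from hand center to finger
        d_len = math.hypot(dx, dy) or 1.0  # avoid division by zero
        r_len = math.hypot(rx, ry) or 1.0
        sum_ux += dx / d_len
        sum_uy += dy / d_len
        radial += (dx * rx + dy * ry) / r_len      # outward (+) or inward (-)
        tangential += (rx * dy - ry * dx) / r_len  # rotation around the center
    # If all unit displacements agree, their mean has length close to 1.
    if math.hypot(sum_ux, sum_uy) / n >= parallel_threshold:
        return "parallel"
    if abs(tangential) > abs(radial):
        return "circular"
    return "closed" if radial < 0 else "opened"


# Four fingertips placed around the origin (illustrative coordinates).
starts = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
c, s = math.cos(math.radians(30)), math.sin(math.radians(30))
swipe = [(x + 1.0, y) for x, y in starts]
pinch = [(0.5 * x, 0.5 * y) for x, y in starts]
spread = [(1.5 * x, 1.5 * y) for x, y in starts]
rotation = [(x * c - y * s, x * s + y * c) for x, y in starts]
```

Only the first and last sample of each touch are used here; a real recognizer would examine the whole trajectory.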

Figure 1: Single touch

**Matching Touch Sequences to Specific Fingers:**

Hidden Markov Models [3]

Hidden Markov Models (HMMs) are statistical models and the simplest form of dynamic Bayesian network, in which the system being modelled is a Markov process with unobserved (hidden) states. An HMM is a collection of finite states connected by transitions, much like a Bayesian network. Each state has two sets of probabilities: transition probabilities and an output probability distribution. The parameters of the model are determined from training data [4][5].

Figure 2: Hidden Markov Models

hidden states, as well as N dimensional observable symbols.

Figure 3: Multitouch Movement

The conventional HMM is expressed as follows [4]. The HMM is a mathematical tool for modelling signals and objects that have a temporal structure and follow a Markov process. An HMM can be described compactly as $\lambda = (A, B, \pi)$, where:

Figure 4: Conventional Hidden Markov Model

$A = \{a_{ij}\}$: the state transition matrix

$$a_{ij} = P[q_{t+1} = s_j \mid q_t = s_i], \quad 1 \le i, j \le N$$

$B = \{b_j(k)\}$: the observation symbol probability distribution

$$b_j(k) = P[O_t = v_k \mid q_t = s_j], \quad 1 \le j \le N, \; 1 \le k \le M$$

$\pi = \{\pi_i\}$: the initial state distribution

$$\pi_i = P[q_1 = s_i]$$

Set of states: $S = \{s_1, s_2, \ldots, s_N\}$

State at time $t$: $q_t$

Set of symbols: $V = \{v_1, v_2, \ldots, v_M\}$

Given the observation sequence $O_1^T = O_1 O_2 \ldots O_T$ and a model $\lambda = (A, B, \pi)$, the central problem is how to efficiently compute $P(O \mid \lambda)$, i.e., the probability of the observation sequence given the model.

Now let us consider the following two stages:

Training: based on the input data sequences $\{O\}$, we calculate and adjust $\lambda = \lambda^*$ to maximize the likelihood $P(O \mid \lambda)$.

Recognizing: based on $\lambda^* = (A^*, B^*, \pi^*)$ for each class, we assign the class for which the likelihood $P(O \mid \lambda^*)$ is maximized.
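A common way to compute the likelihood of an observation sequence under a discrete HMM is the forward algorithm. The sketch below shows it together with the recognition step (pick the class whose model gives the highest likelihood). The two toy models, their parameter values, and the names `forward_likelihood` and `recognize` are assumptions made for illustration; training is not shown.

```python
def forward_likelihood(A, B, pi, observations):
    """Forward algorithm: P(O | lambda) for a discrete HMM lambda = (A, B, pi)."""
    n_states = len(pi)
    # alpha[i] = P(O_1..O_t, q_t = s_i), initialised for t = 1.
    alpha = [pi[i] * B[i][observations[0]] for i in range(n_states)]
    for symbol in observations[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n_states)) * B[j][symbol]
            for j in range(n_states)
        ]
    return sum(alpha)


def recognize(models, observations):
    """Return the class whose model maximises the likelihood of the sequence."""
    return max(models, key=lambda name: forward_likelihood(*models[name], observations))


# Hypothetical two-state models over the symbol alphabet {0, 1}.
A = [[0.7, 0.3], [0.4, 0.6]]
pi = [0.6, 0.4]
B_pinch = [[0.9, 0.1], [0.8, 0.2]]   # mostly emits symbol 0
B_swipe = [[0.1, 0.9], [0.2, 0.8]]   # mostly emits symbol 1
models = {"pinch": (A, B_pinch, pi), "swipe": (A, B_swipe, pi)}

label = recognize(models, [0, 0, 1, 0])
```

For long sequences the probabilities underflow, so practical implementations scale `alpha` at each step or work with log-probabilities.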

The observation symbol probability distribution $P[O_t = v_k \mid q_t = s_j]$ can be defined over discrete symbols or continuous variables. If the observations are discrete symbols, it can be represented as a matrix:

$$B(i, k) = P(O_t = k \mid q_t = s_i)$$

If the observations are vectors in $\mathbb{R}^L$, it is common to represent $P[O_t \mid q_t]$ as a Gaussian:

$$P[O_t = y \mid q_t = s_i] = \mathcal{N}(y; \mu_i, \Sigma_i)$$

$$\mathcal{N}(y; \mu, \Sigma) = \frac{1}{(2\pi)^{L/2} |\Sigma|^{1/2}} \exp\left[-\frac{1}{2}(y - \mu)^T \Sigma^{-1} (y - \mu)\right]$$

A more flexible representation is a mixture of $M$ Gaussians:

$$P[O_t = y \mid q_t = s_i] = \sum_{m=1}^{M} P(M_t = m \mid q_t = s_i) \, \mathcal{N}(y; \mu_{m,i}, \Sigma_{m,i})$$

where $M_t$ is a hidden variable that specifies which mixture component to use and $P(M_t = m \mid q_t = s_i) = C(i, m)$ is the conditional prior weight of each mixture component. In our approach, we implement both continuous and discrete output distributions, for the 1st and 2nd HMM stages respectively [3][6].
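The mixture density for a single state can be evaluated as in the sketch below. To keep the example free of matrix inversion it assumes diagonal covariance matrices, which is a simplification of the general $\Sigma$ in the formula; the function names and the numeric values are hypothetical.

```python
import math


def gaussian_diag(y, mean, var):
    """N(y; mu, Sigma) for a diagonal Sigma given as a list of variances."""
    dims = len(y)
    log_det = sum(math.log(v) for v in var)  # log |Sigma|
    mahalanobis = sum((yi - mi) ** 2 / vi for yi, mi, vi in zip(y, mean, var))
    return math.exp(-0.5 * (dims * math.log(2 * math.pi) + log_det + mahalanobis))


def mixture_emission(y, weights, means, variances):
    """P[O_t = y | q_t = s_i] as a weighted sum of Gaussian components."""
    return sum(
        w * gaussian_diag(y, m, v)
        for w, m, v in zip(weights, means, variances)
    )


# One state with two components; the weights C(i, m) must sum to 1.
weights = [0.3, 0.7]
means = [[0.0, 0.0], [2.0, 2.0]]
variances = [[1.0, 1.0], [0.5, 0.5]]
density_near_second = mixture_emission([2.0, 2.0], weights, means, variances)
density_far_away = mixture_emission([10.0, 10.0], weights, means, variances)
```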

Dynamic Time Warping

Dynamic Time Warping (DTW), introduced by Sakoe and Chiba in 1978, is an algorithm that compares two sequences that may vary in time or speed. For example, if two video clips of different people walking a particular path were compared, the DTW algorithm would detect the similarities in the walking pattern despite differences in walking speed, accelerations or decelerations [3][7].

Figure 4: Dynamic time warping

The algorithm begins with a set of template streams describing each gesture available in the system database. An input gesture must be compared against every stored template, which results in high computation time and hence limits recognition speed. Additionally, storing many templates for each gesture is costly in space on a resource-constrained device.
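The template-matching idea described above can be sketched with the classical DTW recurrence and a nearest-template classifier. This is a minimal illustration that assumes each gesture is a list of 2-D points compared with Euclidean distance; the template data and the names `dtw_distance` and `nearest_template` are hypothetical.

```python
import math


def dtw_distance(a, b):
    """Classical DTW cost between two sequences of points."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            # Extend the cheapest of the three admissible predecessor cells.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def nearest_template(sample, templates):
    """Return the label whose closest template has the smallest DTW cost."""
    return min(
        templates,
        key=lambda label: min(dtw_distance(sample, t) for t in templates[label]),
    )


templates = {
    "swipe_right": [[(0, 0), (1, 0), (2, 0)]],
    "swipe_up": [[(0, 0), (0, 1), (0, 2)]],
}
# The same rightward swipe performed more slowly (repeated samples).
slow_swipe = [(0, 0), (0, 0), (1, 0), (1, 0), (2, 0)]
label = nearest_template(slow_swipe, templates)
```

Every classification runs one DTW comparison per stored template, which is the computation-time and storage cost noted above.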

Consider a training set of $N$ sequences $\{S_1, S_2, \ldots, S_N\}$, where each $S_g$ represents a sample of the same gesture class. Each sequence $S_g$ is composed of a set of feature vectors at each time $t$, $S_g = \{s_1^g, \ldots, s_{L_g}^g\}$, for a certain gesture category, where $L_g$ is the length in frames of sequence $S_g$. Let us assume that the sequences are ordered according to their length, so that $L_{g-1} \le L_g \le L_{g+1}$, $\forall g \in [2, \ldots, N-1]$; the median length sequence is then $\bar{S} = S_{\lceil N/2 \rceil}$.

This sequence $\bar{S}$ is used as a reference, and the remaining sequences are aligned with it using classical Dynamic Time Warping with Euclidean distance [3][4], in order to remove the temporal deformations of different samples from the same gesture category. Hence, after the alignment process, all sequences have length $L_{\lceil N/2 \rceil}$. We define the set of warped sequences as $\tilde{S} = \{\tilde{S}_1, \tilde{S}_2, \ldots, \tilde{S}_N\}$ [3].

**Input:** A gesture $C = \{c_1, \ldots, c_m\}$ with corresponding GMM model $\lambda = \{\lambda_1, \ldots, \lambda_m\}$, its similarity threshold value $\mu$, and the testing sequence $Q = \{q_1, \ldots, q_\infty\}$. A cost matrix $M$ of size $m \times \infty$ is defined, where $N(x)$, $x = (i, t)$, is the set of three upper-left neighbor locations of $x$ in $M$.

**Output:** Warping path $W$ of the detected gesture, if any [4].

    // Initialization
    for i = 1:m do
        for j = 1:∞ do
            M(i,j) = ∞
        end
    end
    for j = 1:∞ do
        M(0,j) = 0
    end
    // Process the test sequence frame by frame
    for t = 0:∞ do
        for i = 1:m do
            x = (i,t)
            M(x) = D(q_t, λ_i) + min_{x' ∈ N(x)} M(x')
        end
        // A gesture ends at frame t when the accumulated cost drops below μ
        if M(m,t) < μ then
            W = {argmin_{x' ∈ N(x)} M(x')}
            return
        end
    end
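The detection loop above can be sketched in Python as follows. This is not the GMM-based method itself: the probability-based cost $D(q_t, \lambda_i)$ is replaced here by a plain Euclidean distance to a template frame, and instead of the full warping path the sketch returns only the first and last frame of the detected gesture. The function name `detect_gesture`, the template, and the stream are hypothetical.

```python
import math


def detect_gesture(template, stream, threshold):
    """Return (start, end) frame indices of the first detection, or None."""
    m = len(template)
    # prev[i] holds M(i, t-1); row 0 is the free starting row, M(0, t) = 0.
    prev = [0.0] + [math.inf] * m
    prev_start = [0] + [None] * m
    for t, frame in enumerate(stream):
        cur = [0.0] + [math.inf] * m
        cur_start = [t] + [None] * m  # a path leaving row 0 starts at frame t
        for i in range(1, m + 1):
            # Three upper-left neighbours of (i, t): (i-1, t), (i-1, t-1), (i, t-1).
            neighbours = [
                (cur[i - 1], cur_start[i - 1]),
                (prev[i - 1], prev_start[i - 1]),
                (prev[i], prev_start[i]),
            ]
            best_cost, best_start = min(neighbours, key=lambda c: c[0])
            # Assumption: Euclidean distance stands in for D(q_t, lambda_i).
            cur[i] = math.dist(frame, template[i - 1]) + best_cost
            cur_start[i] = best_start
        if cur[m] < threshold:
            return cur_start[m], t
        prev, prev_start = cur, cur_start
    return None


template = [(0, 0), (1, 0), (2, 0)]
stream = [(5, 5), (5, 5), (0, 0), (1, 0), (1, 0), (2, 0), (5, 5)]
detection = detect_gesture(template, stream, threshold=0.5)
```

Only two columns of the cost matrix are kept in memory, which is what allows the test sequence to be an unbounded stream.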

Artificial Neural Networks

Artificial Neural Networks (ANNs) are weighted, directed graphs in which the nodes are artificial neurons and the directed edges are the connections between them. The most common ANN structure is the feed-forward Multi-Layer Perceptron. Feed-forward means that signals only travel one way through the network [4][8].

For input pattern $p$, the $i$-th input layer node holds $x_{p,i}$.

Net input to j-th node in hidden layer:

Now Output of j-th node in hidden layer:

Then Net input to k-th node in output layer:

Finally Output of k-th node in output layer:

Network error for p:
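In the standard multi-layer perceptron formulation, the five quantities listed above can be written as follows. The notation is assumed here rather than taken from the cited sources: $w_{j,i}$ and $v_{k,j}$ are the input-to-hidden and hidden-to-output weights, $f$ is the activation function (for example a sigmoid), $t_{p,k}$ is the target value for output node $k$, and bias terms are omitted for brevity.

```latex
\begin{align}
  \mathrm{net}^{h}_{p,j} &= \sum_{i} w_{j,i}\, x_{p,i}
      && \text{net input to hidden node } j \\
  y_{p,j} &= f\!\left(\mathrm{net}^{h}_{p,j}\right)
      && \text{output of hidden node } j \\
  \mathrm{net}^{o}_{p,k} &= \sum_{j} v_{k,j}\, y_{p,j}
      && \text{net input to output node } k \\
  o_{p,k} &= f\!\left(\mathrm{net}^{o}_{p,k}\right)
      && \text{output of output node } k \\
  E_{p} &= \frac{1}{2} \sum_{k} \left(t_{p,k} - o_{p,k}\right)^{2}
      && \text{network error for pattern } p
\end{align}
```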

Neurons are arranged in layers, with the outputs of each neuron in one layer connected to the inputs of the neurons in the next layer. Finally, the output layer neurons are assigned a value. Each output layer neuron represents a particular class of gesture, and the record is assigned to whichever class’s neuron has the highest value. During training, the gesture class for each neuron in the output layer is known, and the nodes can be assigned the “correct” value.
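A forward pass through such a network, followed by the "highest output wins" rule, might look like the sketch below. It assumes one hidden layer, sigmoid activations and no bias terms; the weights, labels and function names are hypothetical, and training (for example by backpropagation) is not shown.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights):
    """Outputs of one layer: each row of weights feeds one neuron."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in weights]


def predict_gesture(x, hidden_weights, output_weights, labels):
    """Feed x forward and return the class of the strongest output neuron."""
    hidden = layer(x, hidden_weights)
    outputs = layer(hidden, output_weights)
    best = max(range(len(outputs)), key=outputs.__getitem__)
    return labels[best], outputs


def pattern_error(outputs, targets):
    """Half the summed squared error between outputs and target values."""
    return 0.5 * sum((t - o) ** 2 for t, o in zip(targets, outputs))


# Hypothetical 2-2-2 network with hand-picked weights.
hidden_weights = [[1.0, 0.0], [0.0, 1.0]]
output_weights = [[2.0, -2.0], [-2.0, 2.0]]
labels = ["pinch", "swipe"]
label, outputs = predict_gesture([3.0, -3.0], hidden_weights, output_weights, labels)
```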

**Critical Analysis**

A critical analysis based on the results reported in [3] is presented in this section. ANNs, HMMs, and DTW algorithms were implemented on a mobile phone and measured according to recognition speed, accuracy and the time needed to train [3]. Since Bayesian Networks are a superclass of HMMs, which have been tweaked towards gesture classification, they are not considered. Overall, DTW gives the best performance: it matches HMMs in accuracy while being faster and requiring no training, and it outperforms ANNs on all three measures. These results are summarized below:

Table 1: Comparison between different algorithms [3]

| No. | Algorithm | Recognition Speed | Accuracy | Training Time |
|-----|-----------|-------------------|----------|---------------|
| 1 | HMMs | 10.5 ms | 95.25% | Long |
| 2 | ANNs | 23 ms | 90% | Medium |
| 3 | DTW | 8 ms | 95.25% | No training |

**Finger Tracking:**

First, the finger tracking parameters need to be adjusted, which requires activating calibration in the corresponding tab of the on-screen display [5][9].

**a. Projection Signatures:**

Projection signatures are computed directly on the thresholded binary image of the hand [5]. The core of this algorithm consists of adding the binary pixels row by row along a given direction (the vertical in this case). Previous knowledge of the hand angle is therefore required. A low-pass filter is applied to the signature (row sums) in order to reduce the high-frequency variations that create many local maxima and cause the problem of multiple positives (more than one detection per fingertip). The five maxima thereby obtained correspond to the positions of the five fingers.
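The steps of this algorithm can be sketched as below. The example assumes the hand is oriented so that each finger lies roughly along image rows, uses a moving average as the low-pass filter, and works on a hand-written signature; these choices and the function names are illustrative assumptions rather than the EyesWeb implementation.

```python
def projection_signature(binary_image):
    """Sum the hand pixels (1s) of each row of a binary image."""
    return [sum(row) for row in binary_image]


def smooth(signal, window=3):
    """Moving average, used here as a simple low-pass filter."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        segment = signal[max(0, i - half): i + half + 1]
        out.append(sum(segment) / len(segment))
    return out


def local_maxima(signal):
    """Indices that are higher than the previous sample and not lower than the next."""
    return [
        i for i in range(1, len(signal) - 1)
        if signal[i] > signal[i - 1] and signal[i] >= signal[i + 1]
    ]


tiny_image = [[0, 1, 1], [1, 1, 1], [0, 0, 0]]
tiny_signature = projection_signature(tiny_image)

# A noisy signature for two fingers: each finger produces two spurious peaks.
raw_signature = [0, 2, 6, 5, 6, 2, 1, 2, 7, 6, 7, 2, 0]
raw_peaks = local_maxima(raw_signature)
filtered_peaks = local_maxima(smooth(raw_signature))
```

On this signature the unfiltered data yields four maxima for two fingers, while the smoothed signature yields one maximum per finger.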

**b. Geometric Properties**:

The second algorithm is based on the geometric properties and, as shown on line 3 of figure 5, uses a contour image of the hand on which a reference point is set. This point can be determined either by finding the centre of mass of the contour (barycenter or centroid) or by fixing a point on the wrist [6].

Figure 5: Hand Movement

Euclidean distances from that point to every contour point are then computed, with the five resulting maxima assumed to correspond to the finger ends [5]. The minima can be used to determine the intersections between fingers (finger valleys). The geometric algorithm also required filtering in order to reduce the problem of multiple positives.
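The distance-profile idea can be illustrated as follows. The sketch assumes the contour is an ordered, closed list of points and uses the centroid as the reference point; the star-shaped contour stands in for a hand with five extended fingers, and the function names are hypothetical. No filtering is applied, so on a real contour the multiple-positives problem mentioned above would still need to be handled.

```python
import math


def centroid(points):
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def distance_profile(contour, reference):
    """Euclidean distance from the reference point to every contour point."""
    return [math.dist(p, reference) for p in contour]


def circular_extrema(profile):
    """Local maxima (finger ends) and minima (valleys) of a closed profile."""
    n = len(profile)
    peaks, valleys = [], []
    for i in range(n):
        prev, cur, nxt = profile[i - 1], profile[i], profile[(i + 1) % n]
        if cur > prev and cur >= nxt:
            peaks.append(i)
        elif cur < prev and cur <= nxt:
            valleys.append(i)
    return peaks, valleys


# Synthetic contour: radius alternates between 2 (finger end) and 1 (valley).
contour = [
    ((2 if k % 2 == 0 else 1) * math.cos(math.radians(36 * k)),
     (2 if k % 2 == 0 else 1) * math.sin(math.radians(36 * k)))
    for k in range(10)
]
profile = distance_profile(contour, centroid(contour))
finger_ends, finger_valleys = circular_extrema(profile)
```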

**c. Circular Hough Transform:**

The circular Hough transform is applied to the contour image of the hand, but could equally be performed on an edge image with a complex background, provided no other elements of the image exhibit the circular shape of the fingertip radius. This can be done efficiently for finger ends by eliminating points that are found outside the contour image. The drawback is that the set of discarded points contains a mix of finger valleys and false positives that cannot be sorted easily [5].

**d. Color Markers:**

While the three previous algorithms rely only on the hand characteristics to find and track the fingers, the marker algorithm tracks color markers attached to the main joints of the fingers. Each color is tracked individually using colour segmentation and filtering [5].

This permits the identification of the different hand segments. The marker colors should therefore be easy to track and should not affect the threshold, edge or contour image of the hand. Respecting these constraints makes it possible to apply all algorithms to the same video images and therefore to compare each algorithm’s degree of accuracy and precision with respect to the markers [5].

**Comparisons:**

| Properties | Projection Signature | Geometric Properties | Circular Hough Transform | Color Markers |
|------------|----------------------|----------------------|--------------------------|---------------|
| Locates fingers | Good | Good | Good | Good |
| Locates fingertips | Poor | Normal | Normal | Good |
| Locates finger ends and valleys | Poor | Good | Good | Good |
| Works with complex background | Poor | Good | Normal | Good |
| Precision | Good | Good | Good | Good |
| Accuracy | Poor | Good | Good | Good |

Table 2: Comparison between different techniques [5]

All the presented algorithms have succeeded, to varying degrees, in detecting each finger. The projection signatures algorithm can only roughly identify a finger, whereas the circular Hough transform and geometric properties algorithms can find both finger intersections and finger end points. It is important to note that when fingers are folded, the end points do not correspond to the fingertips [5].

**Conclusion:**

We have outlined three prominent methods for gesture recognition on smart phones. Artificial Neural Networks, Dynamic Time Warping and Hidden Markov Models were optimized, tested on resource-constrained devices (in this instance, cellular phones), and compared against each other in terms of accuracy and computational performance. ANNs proved to have the slowest computational performance due to the large size of the neural network. HMMs performed better, but the DTW algorithm proved to be the fastest, with comparable recognition accuracy. Unlike HMMs and ANNs, DTW also did not require training.

**References**

[1] Kalyani Devidas Deshmane, “Android Software based Multi-touch Gestures Recognition for Secure Biometric Modality”

[2] N. Sae-Bae, N. Memon, K. Isbister, and K. Ahmed, “Multitouch gesture-based authentication,” IEEE Trans. Inf. Forensics Security, vol. 9, no. 4, pp. 568-582, Apr. 2014

[3] Daniel Wood, “Methods for Multi-touch Gesture Recognition”

[4]

[5] Anne-Marie Burns and Barbara Mazzarino, “Finger Tracking Methods Using EyesWeb”

[6]

[7] “Probability-based Dynamic Time Warping and Bag-of-Visual-and-Depth-Words for Human Gesture Recognition”

[8]

[9] http://whatis.techtarget.com/definition/gesture-recognition

Prof. Ramdas Pandurang Bagawade

Miss Pournima Akash Chavan, BE Computer, pursuing degree in PES’s College of Engineering Phaltan.

Miss Kajal Kantilal Jadhav, BE Computer, pursuing degree in PES’s College of Engineering Phaltan.