
Training Support Vector Machines: an Application to Face Detection

Edgar Osuna†* Robert Freund* Federico Girosi†

†Center for Biological and Computational Learning and *Operations Research Center

Massachusetts Institute of Technology, Cambridge, MA 02139, U.S.A.

Abstract

We investigate the application of Support Vector Machines (SVMs) in computer vision. SVM is a learning technique developed by V. Vapnik and his team (AT&T Bell Labs.) that can be seen as a new method for training polynomial, neural network, or Radial Basis Functions classifiers. The decision surfaces are found by solving a linearly constrained quadratic programming problem. This optimization problem is challenging because the quadratic form is completely dense and the memory requirements grow with the square of the number of data points. We present a decomposition algorithm that guarantees global optimality, and can be used to train SVMs over very large data sets. The main idea behind the decomposition is the iterative solution of sub-problems and the evaluation of optimality conditions which are used both to generate improved iterative values, and also to establish the stopping criteria for the algorithm. We present experimental results of our implementation of SVM, and demonstrate the feasibility of our approach on a face detection problem that involves a data set of 50,000 data points.

1 Introduction

In recent years problems such as object detection or image classification have received an increasing amount of attention in the computer vision community. In many cases these problems involve "concepts" (like "face" or "people") that cannot be expressed in terms of a small and meaningful set of features, and the only feasible approach is to learn the solution from a set of examples. The complexity of these problems is often such that an extremely large set of examples is needed in order to learn the task with the desired accuracy. Moreover, since it is not known what the relevant features of the problem are, the data points usually belong to some high-dimensional space (for example, an image may be represented by its grey level values, eventually filtered with a bank of filters, or by a dense vector field that puts it in correspondence with a certain prototypical image). Therefore, there is a need for general purpose pattern recognition techniques that can handle large data sets (10^5 - 10^6 data points) in high dimensional spaces (10^2 - 10^3). In this paper we concentrate on the Support Vector Machine (SVM), a pattern classification algorithm recently developed by V. Vapnik and his team at AT&T Bell Labs.


[1, 3, 4, 13]. SVM can be seen as a new way to train polynomial, neural network, or Radial Basis Functions classifiers. While most of the techniques used to train the above mentioned classifiers are based on the idea of minimizing the training error, which is usually called the empirical risk, SVMs operate on another induction principle, called structural risk minimization, which minimizes an upper bound on the generalization error. From the implementation point of view, training a SVM is equivalent to solving a linearly constrained Quadratic Programming (QP) problem in a number of variables equal to the number of data points. This problem is challenging when the size of the data set becomes larger than a few thousands. In this paper we show that a large scale QP problem of the type posed by SVM can be solved by a decomposition algorithm: the original problem is replaced by a sequence of smaller problems that is proved to converge to the global optimum. In order to show the applicability of our approach we used SVM as the core classification algorithm in a face detection system. The problem that we have to solve involves training a classifier to discriminate between face and non-face patterns, using a data set of 50,000 points. The plan of the paper is as follows: in the rest of this section we briefly introduce the SVM algorithm and its geometrical interpretation. In section 2 we present our solution to the problem of training a SVM and our decomposition algorithm. In section 3 we present our application to the face detection problem, and in section 4 we summarize our results.

1.1 Support Vector Machines

In this section we briefly sketch the SVM algorithm and its motivation. A more detailed description of SVM can be found in [13] (chapter 5) and [4].

We start from the simple case of two linearly separable classes. We assume that we have a data set D = {(x_i, y_i)}_{i=1}^ℓ of labeled examples, where y_i ∈ {-1, 1}, and we wish to determine, among the infinite number of linear classifiers that separate the data, which one will have the smallest generalization error. Intuitively, a good choice is the hyperplane that leaves the maximum margin between the two classes, where the margin is defined as the sum of the distances of the hyperplane from the closest point of the two classes (see figure 1).

Figure 1. (a) A separating hyperplane with small margin. (b) A separating hyperplane with larger margin. A better generalization capability is expected from (b).

If the two classes are non-separable we can still look for the hyperplane that maximizes the margin and that minimizes a quantity proportional to the number of misclassification errors. The trade off between margin and misclassification error is controlled by a positive constant C that has to be chosen beforehand. In this case it can be shown that the solution to this problem is a linear classifier f(x) = sign(Σ_{i=1}^ℓ λ_i y_i x^T x_i + b) whose coefficients λ_i are the solution of the following QP problem:

$$
\begin{array}{ll}
\text{Minimize} & W(\Lambda) = -\Lambda^T \mathbf{1} + \tfrac{1}{2}\,\Lambda^T D \Lambda \\
\text{subject to} & \Lambda^T y = 0 \\
& \Lambda - C\mathbf{1} \le 0 \\
& -\Lambda \le 0
\end{array}
\qquad (1)
$$

where (Λ)_i = λ_i, (1)_i = 1 and D_{ij} = y_i y_j x_i^T x_j. It turns out that only a small number of coefficients λ_i are different from zero, and since every coefficient corresponds to a particular data point, this means that the solution is determined by the data points associated to the non-zero coefficients. These data points, which are called support vectors, are the only ones which are relevant for the solution of the problem: all the other data points could be deleted from the data set and the same solution would be obtained. Intuitively, the support vectors are the data points that lie at the border between the two classes. Their number is usually small, and Vapnik showed that it is proportional to the generalization error of the classifier.
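As a concrete illustration of problem (1) — a sketch added here for clarity, not the authors' implementation — the dual can be handed to any generic quadratic programming solver. The snippet below assumes the CVXOPT `solvers.qp` interface and hypothetical arrays `X` (ℓ × d data matrix) and `y` (labels in {-1, +1}):

```python
import numpy as np
from cvxopt import matrix, solvers

def train_linear_svm(X, y, C):
    """Solve (1): minimize -1'L + (1/2) L'DL  s.t.  y'L = 0 and 0 <= L <= C*1."""
    ell = len(y)
    D = ((y[:, None] * y[None, :]) * (X @ X.T)).astype(float)  # D_ij = y_i y_j x_i' x_j
    P, q = matrix(D), matrix(-np.ones(ell))
    G = matrix(np.vstack([-np.eye(ell), np.eye(ell)]))          # stacks -L <= 0 and L <= C*1
    h = matrix(np.hstack([np.zeros(ell), C * np.ones(ell)]))
    A, b = matrix(y.reshape(1, -1).astype(float)), matrix(0.0)
    lambdas = np.ravel(solvers.qp(P, q, G, h, A, b)['x'])
    # Recover the hyperplane; b is averaged over the margin support vectors
    # (0 < lambda_i < C), for which y_i (w'x_i + b) = 1 must hold.
    w = ((lambdas * y)[:, None] * X).sum(axis=0)
    on_margin = (lambdas > 1e-6) & (lambdas < C - 1e-6)
    b0 = float(np.mean(y[on_margin] - X[on_margin] @ w))
    return lambdas, w, b0
```

Note that this direct formulation stores the full ℓ × ℓ matrix D, which is exactly the memory bottleneck that motivates the decomposition algorithm of section 2.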

Since it is unlikely that any real life problem can actually be solved by a linear classifier, the technique has to be extended in order to allow for non-linear decision surfaces. This is easily done by projecting the original set of variables x in a higher dimensional feature space: x ∈ R^d → z(x) ≡ (φ_1(x), ..., φ_n(x)) ∈ R^n, and by formulating the linear classification problem in the feature space. The solution will have the form f(x) = sign(Σ_{i=1}^ℓ λ_i y_i z^T(x) z(x_i) + b), and therefore will be nonlinear in the original input variables. One has to face at this point two problems: 1) the choice of the features φ_i(x), which should be done in a way that leads to a "rich" class of decision surfaces; 2) the computation of the scalar product z^T(x) z(x_i), which can be computationally prohibitive if the number of features n is very large (for example, in the case in which one wants the feature space to span the set of polynomials in d variables, the number of features n is exponential in d). A possible solution to these problems consists in letting n go to infinity and making the following choice:

$$ z(x) \equiv \left(\sqrt{a_1}\,\phi_1(x),\; \sqrt{a_2}\,\phi_2(x),\; \ldots\right) $$

where a_i and φ_i are the eigenvalues and eigenfunctions of an integral operator whose kernel K(x, y) is a positive definite symmetric function. With this choice the scalar product in the feature space becomes particularly simple, because

$$ z^T(x)\, z(y) = \sum_{i=1}^{\infty} a_i\, \phi_i(x)\, \phi_i(y) = K(x, y) $$

where the last equality comes from the Mercer-Hilbert-Schmidt theorem for positive definite functions (see [9], pp. 242-246).
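As a concrete illustration of this construction (a worked example added here for clarity, using the degree-2 polynomial kernel that also appears in Table 1 below), expanding the kernel in two input variables exhibits an explicit finite-dimensional feature map whose ordinary scalar product reproduces K:

$$
(x^T y + 1)^2 = 1 + 2x_1 y_1 + 2x_2 y_2 + 2x_1 x_2\, y_1 y_2 + x_1^2 y_1^2 + x_2^2 y_2^2 = z^T(x)\, z(y),
\qquad
z(x) = \bigl(1,\ \sqrt{2}\,x_1,\ \sqrt{2}\,x_2,\ \sqrt{2}\,x_1 x_2,\ x_1^2,\ x_2^2\bigr).
$$

For higher degrees and input dimensions the same expansion holds, but the number of monomial features grows combinatorially, which is exactly why evaluating K directly is preferable to computing z explicitly.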

The QP problem that has to be solved now is exactly the same as in eq. (1), with the exception that the matrix D has now elements D_{ij} = y_i y_j K(x_i, x_j). As a result of this choice, the SVM classifier has the form: f(x) = sign(Σ_{i=1}^ℓ λ_i y_i K(x, x_i) + b). In table 1 we list some choices of the kernel function proposed by Vapnik: notice how they lead to well known classifiers, whose decision surfaces are known to have good approximation properties.

Kernel Function                          Type of Classifier
K(x, x_i) = exp(-||x - x_i||^2)          Gaussian RBF
K(x, x_i) = (x^T x_i + 1)^d              Polynomial of degree d
K(x, x_i) = tanh(x^T x_i - Θ)            Multi Layer Perceptron

Table 1. Some possible kernel functions and the type of decision surface they define.
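To make the kernel classifiers of Table 1 concrete, the following sketch (illustrative code added here, not the authors' implementation; all names are hypothetical) evaluates f(x) = sign(Σ_i λ_i y_i K(x, x_i) + b) for the three kernels listed above:

```python
import numpy as np

# The three kernel functions of Table 1 (d and theta are user-chosen parameters).
def gaussian_rbf(x, xi):
    return np.exp(-np.sum((x - xi) ** 2))

def polynomial(x, xi, d=2):
    return (np.dot(x, xi) + 1.0) ** d

def mlp(x, xi, theta=1.0):
    return np.tanh(np.dot(x, xi) - theta)

def svm_decision(x, support_x, support_y, lambdas, b, kernel=polynomial):
    """f(x) = sign(sum_i lambda_i y_i K(x, x_i) + b); only the support vectors
    (lambda_i > 0) contribute, so the sum runs over them alone."""
    s = sum(l * y * kernel(x, xi)
            for l, y, xi in zip(lambdas, support_y, support_x))
    return np.sign(s + b)
```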

2 Training a Support Vector Machine

As mentioned before, training a SVM using large data sets (above ≈ 5,000 samples) is a very difficult problem to approach without some kind of problem decomposition. To give an idea of the memory requirements, an application like the one described in section 3 involves 50,000 training samples, and this amounts to a quadratic form whose matrix D has 2.5 · 10^9 entries that would need, using an 8-byte floating point representation, 20 Gigabytes of memory.

In order to solve the training problem efficiently, we take explicit advantage of the geometrical interpretation introduced in Section 1.1, in particular, the expectation that the number of support vectors will be very small, and therefore that many of the components of Λ will be zero.

In order to decompose the original problem, one can think of solving iteratively the system given by (1), but keeping fixed at zero level those components λ_i associated with data points that are not support vectors, and therefore only optimizing over a reduced set of variables. To convert the previous description into an algorithm we need to specify:

1. Optimality Conditions: These conditions allow us to decide computationally if the original problem has been solved optimally at a particular iteration. Section 2.1 states and proves optimality conditions for the QP given by (1).


2. Strategy for Improvement: If a particular solution is not optimal, this strategy defines a way to improve the cost function and is frequently associated with variables that violate the optimality conditions. This strategy will be stated in section 2.2.

After presenting optimality conditions and a strategy for improving the cost function, section 2.3 introduces a decomposition algorithm that can be used to solve large database training problems, and section 2.4 reports some computational results obtained with its implementation.

2.1 Optimality Conditions

The QP problem we have to solve is the following:

$$
\begin{array}{llll}
\text{Minimize} & W(\Lambda) = -\Lambda^T \mathbf{1} + \tfrac{1}{2}\,\Lambda^T D \Lambda & & \\
\text{subject to} & \Lambda^T y = 0 & & (\mu) \\
& \Lambda - C\mathbf{1} \le 0 & & (\Upsilon) \\
& -\Lambda \le 0 & & (\Pi)
\end{array}
\qquad (2)
$$

where μ, Υ^T = (υ_1, ..., υ_ℓ) and Π^T = (π_1, ..., π_ℓ) are the associated Kuhn-Tucker multipliers. Since D is a positive semi-definite matrix (the kernel function used is positive definite), and the constraints in (2) are linear, the Kuhn-Tucker (KT) conditions are necessary and sufficient for optimality. The KT conditions are as follows:

$$
\begin{array}{l}
\nabla W(\Lambda) + \mu y + \Upsilon - \Pi = 0 \\
\Upsilon^T(\Lambda - C\mathbf{1}) = 0 \\
\Pi^T \Lambda = 0 \\
\Upsilon \ge 0 \\
\Pi \ge 0 \\
\Lambda^T y = 0 \\
\Lambda - C\mathbf{1} \le 0 \\
-\Lambda \le 0
\end{array}
\qquad (3)
$$

In order to derive further algebraic expressions from the optimality conditions (3), we assume the existence of some λ_i such that 0 < λ_i < C, and consider the three possible values that each component of Λ can have:

1. Case: 0 < λ_i < C

From the first three equations of the KT conditions we have:

$$ (D\Lambda)_i - 1 + \mu y_i = 0 \qquad (4) $$

Using the results in [4] and [13] one can easily show that when λ_i is strictly between 0 and C the following equality holds:

$$ y_i \left( \sum_{j=1}^{\ell} \lambda_j y_j K(x_i, x_j) + b \right) = 1 \qquad (5) $$

Noticing that

$$ (D\Lambda)_i = \sum_{j=1}^{\ell} \lambda_j y_j y_i K(x_i, x_j) = y_i \sum_{j=1}^{\ell} \lambda_j y_j K(x_i, x_j) $$

and combining this expression with (5) and (4), we immediately obtain that μ = b.

2. Case: λ_i = C

By defining

$$ g(x_i) = \sum_{j=1}^{\ell} \lambda_j y_j K(x_i, x_j) + b \qquad (6) $$

and noticing that

$$ (D\Lambda)_i = y_i \sum_{j=1}^{\ell} \lambda_j y_j K(x_i, x_j) = y_i \left( g(x_i) - b \right) $$

we conclude that

$$ y_i\, g(x_i) \le 1 \qquad (7) $$

(where we have used the fact that μ = b and required the KT multiplier υ_i to be nonnegative).

3. Case: λ_i = 0

By applying a similar algebraic manipulation to the one described for case 2, we obtain

$$ y_i\, g(x_i) \ge 1 \qquad (8) $$
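The three cases above translate directly into the computational optimality test used by the decomposition algorithm. The sketch below (added for illustration, with hypothetical array names and a numerical tolerance eps) evaluates g(x_i) for every point and reports the indices that violate (5), (7) or (8):

```python
import numpy as np

def kt_violations(lambdas, y, K, b, C, eps=1e-3):
    """K is the ell x ell kernel matrix K[i, j] = K(x_i, x_j).
    Checks, for each i:
        0 < lambda_i < C  ->  y_i g(x_i) = 1      (eq. 5)
        lambda_i = C      ->  y_i g(x_i) <= 1     (eq. 7)
        lambda_i = 0      ->  y_i g(x_i) >= 1     (eq. 8)"""
    g = K @ (lambdas * y) + b        # g(x_i) for all data points at once
    margins = y * g
    violated = []
    for i, (lam, m) in enumerate(zip(lambdas, margins)):
        if lam <= eps and m < 1 - eps:                      # case lambda_i = 0
            violated.append(i)
        elif lam >= C - eps and m > 1 + eps:                # case lambda_i = C
            violated.append(i)
        elif eps < lam < C - eps and abs(m - 1) > eps:      # case 0 < lambda_i < C
            violated.append(i)
    return violated
```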

2.2 Strategy for Improvement

The optimality conditions derived in the previous section are essential in order to devise a decomposition strategy that takes advantage of the expectation that most λ_i's will be zero, and that guarantees that at every iteration the objective function is improved. In order to accomplish this goal we partition the index set in two sets B and N in such a way that the optimality conditions hold in the subproblem defined only for the variables in the set B, which is called the working set. Then we decompose Λ in two vectors Λ_B and Λ_N and set Λ_N = 0. Using this decomposition the following statements are clearly true:

• We can replace λ_i = 0, i ∈ B, with λ_j = 0, j ∈ N, without changing the cost function or the feasibility of both the subproblem and the original problem.

• After such a replacement, the new subproblem is optimal if and only if y_j g(x_j) ≥ 1. This follows from equation (8) and the assumption that the subproblem was optimal before the replacement was done.

The previous statements lead to the following proposition:

Proposition 2.1 Given an optimal solution of a subproblem defined on B, the operation of replacing λ_i = 0, i ∈ B, with λ_j = 0, j ∈ N, satisfying y_j g(x_j) < 1 generates a new subproblem that, when optimized, yields a strict improvement of the objective function.

A detailed proof of this proposition can be found in [8].


Suppose we can define a fixed-size working set B, such that |B| ≤ ℓ, and it is big enough to contain all support vectors (λ_i > 0), but small enough such that the computer can handle it and optimize it using some solver. Then the decomposition algorithm can be stated as follows:

2.3 The Decomposition Algorithm

1. Arbitrarily choose |B| points from the data set.

2. Solve the subproblem defined by the variables in B.

3. While there exists some j ∈ N, such that g(x_j) y_j < 1, replace λ_i = 0, i ∈ B, with λ_j = 0, j ∈ N, and solve the new subproblem.
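A compact sketch of this loop is given below (added for illustration only; `solve_subproblem` stands for any QP solver restricted to the working-set variables, and the bookkeeping is deliberately simplified — in particular, the working set is assumed large enough to hold all support vectors, as required above):

```python
import numpy as np

def decomposition_train(K, y, C, working_size, solve_subproblem, eps=1e-3):
    """Decomposition algorithm of section 2.3.
    K: ell x ell kernel matrix, y: labels in {-1, +1}."""
    ell = len(y)
    lambdas = np.zeros(ell)
    B = list(range(working_size))              # 1. arbitrarily chosen working set
    while True:
        lambdas[B], b = solve_subproblem(K, y, C, B, lambdas)   # 2. optimize over B
        g = K @ (lambdas * y) + b
        in_B = set(B)
        violators = [j for j in range(ell)
                     if j not in in_B and y[j] * g[j] < 1 - eps]  # 3. KT check on N
        if not violators:
            break                              # no violators left: global optimum
        zeros_in_B = [i for i in B if lambdas[i] <= eps]
        for i, j in zip(zeros_in_B, violators):    # swap zero multipliers for violators
            B[B.index(i)] = j
    return lambdas, b
```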

Notice that, according to Proposition 2.1, this algorithm will strictly improve the objective function at each iteration and therefore will not cycle. Since the objective function is bounded (W(Λ) is convex quadratic and the feasible region is bounded), the algorithm must converge to the global optimal solution in a finite number of iterations. Figure 2 gives a geometric interpretation of the way the decomposition algorithm allows the redefinition of the separating surface by adding points that violate the optimality conditions.

Figure 2. (a) A sub-optimal solution where the non-filled points have λ = 0 but are violating the optimality conditions by being inside the ±1 area. (b) The decision surface is redefined. Since no points with λ = 0 are inside the ±1 area, the solution is optimal. Notice that the size of the margin has decreased, and the shape of the decision surface has changed.

2.4 Implementation and Results

We have implemented the decomposition algorithm using MINOS 5.4 as the solver of the sub-problems. For information on MINOS 5.4 see [7]. The computational results that we present in this section have been obtained using real data from our Face Detection System, which is described in Section 3.

Figures 3a and 3b show the training time and the number of support vectors obtained when training the system with 5,000, 10,000, 20,000, 30,000, 40,000, 49,000, and 50,000 data points. The discontinuity in the graphs between 49,000 and 50,000 data points is due to the fact that the last 1,000 data points were collected in the last phase of bootstrapping of the Face Detection System (see section 3.2). This means that the last 1,000 data points are points which were misclassified by the previous version of the classifier, which was already quite accurate, and therefore points likely to be on the border between the two classes and therefore very hard to classify.

Figure 3. (a) Number of Support Vectors vs. number of samples; (b) Training Time on a SPARCstation-20 vs. Number of Support Vectors.

The memory requirements of this technique are quadratic in the size of the working set B. For the 50,000 points data set we used a working set of 1,200 variables, that ended up using only 25 Mb of RAM. However, a working set of 2,800 variables will use approximately 128 Mb of RAM. Therefore, the current technique can deal with problems with less than 2,800 support vectors (actually, we empirically found that the working set size should be about 20% larger than the number of support vectors). In order to overcome this limitation we are implementing an extension of the decomposition algorithm that lets us deal with very large numbers of support vectors (say 100,000).

3 SVM Application: Face Detection

This section introduces a Support Vector Machine application for detecting vertically oriented and unoccluded frontal views of human faces in grey level images. It handles faces over a wide range of scales and works under different lighting conditions, even with moderately strong shadows.

The face detection problem can be defined as follows: given as input an arbitrary image, which could be a digitized video signal or a scanned photograph, determine whether or not there are any human faces in the image, and if there are, return an encoding of their location.

Face detection as a computer vision task has many applications. It has direct relevance to the face recognition problem, because the first important step of a fully automatic human face recognizer is usually locating faces in an unknown image. Face detection also has potential application in human-computer interfaces, surveillance systems, census systems, etc.

From the standpoint of this paper, face detection is interesting because it is an example of a natural and challenging problem for demonstrating and testing the potentials of Support Vector Machines. There are many other object classes and phenomena in the real world that share similar characteristics, for example, tumor anomalies in MRI scans, structural defects in manufactured parts, etc. A successful and general methodology for finding faces using SVMs should generalize well for other spatially well-defined pattern and feature detection problems.

It is important to remark that face detection, like most object detection problems, is a difficult task due to the significant pattern variations that are hard to parameterize analytically. Some common sources of pattern variations are facial appearance, expression, presence or absence of common structural features like glasses or a moustache, light source distribution, shadows, etc.

The system works by scanning an image for face-like patterns at many possible scales and uses a SVM as its core classification algorithm to determine the appropriate class (face/non-face).

3.1 Previous Systems

The problem of face detection has been approached with different techniques in the last few years. These techniques include Neural Networks [2, 10, 12], detection of face features and use of geometrical constraints [14], density estimation of the training data [6], labeled graphs [5], and clustering and distribution-based modeling [11].

Out of all these previous works, the results of Sung and Poggio [11] and Rowley et al. [10] reflect systems with very high detection rates and low false positive rates.

Sung and Poggio use clustering and distance metrics to model the distribution of the face and non-face manifold, and a Neural Network to classify a new pattern given the measurements. The key to the quality of their result is the clustering and use of combined Mahalanobis and Euclidean metrics to measure the distance from a new pattern to the clusters. Other important features of their approach are the use of non-face clusters, and the use of a bootstrapping technique to collect important non-face patterns. One drawback of this technique is that it does not provide a principled way to choose some important free parameters, like the number of clusters it uses.

Similarly, Rowley et al. have used problem information in the design of a retinally connected Neural Network that is trained to classify face and non-face patterns. Their approach relies on training several NNs emphasizing subsets of the training data, in order to obtain different sets of weights. Then, different schemes of arbitration between them are used in order to reach a final answer.

Our approach to face detection with SVM uses no prior information in order to obtain the decision surface, so that this technique could be used to detect other kinds of objects in digital images.

3.2 The SVM Face Detection System

This system detects faces by exhaustively scanning an image for face-like patterns at many possible scales, by dividing the original image into overlapping sub-images and classifying them using a SVM to determine the appropriate class (face/non-face). Multiple scales are handled by examining windows taken from scaled versions of the original image. More specifically, this system works as follows:

1. A database of face and non-face 19 x 19 = 361 pixel patterns, assigned to classes +1 and -1 respectively, is used to train a SVM with a 2nd-degree polynomial as kernel function and an upper bound C = 200.

2. In order to compensate for certain sources of image variation, some preprocessing of the data is performed (an illustrative sketch of these steps appears after this list):

• Masking: A binary pixel mask is used to remove some pixels close to the boundary of the window pattern, allowing a reduction of the dimensionality of the input space to 283. This step is important in the reduction of background patterns that introduce unnecessary noise in the training process.

• Illumination gradient correction: A best-fit brightness plane is subtracted from the unmasked window pixel values, allowing reduction of light and heavy shadows.

• Histogram equalization: A histogram equalization is performed over the patterns in order to compensate for differences in illumination brightness, different camera response curves, etc.

3. Once a decision surface has been obtained through training, the run-time system is used over images that do not contain faces, and misclassifications are stored so they can be used as negative examples in subsequent training phases. Images of landscapes, trees, buildings, rocks, etc., are good sources of false positives due to the many different textured patterns they contain. This bootstrapping step, which was successfully used by Sung and Poggio [11], is very important in the context of a face detector that learns from examples because:

• Although negative examples are abundant, negative examples that are useful from a learning point of view are very difficult to characterize and define.

• The two classes, face and non-face, are not equally complex, since the non-face class is broader and richer, and therefore needs more examples in order to get an accurate definition that separates it from the face class. Figure 4 shows an image used for bootstrapping with some misclassifications, that were later used as non-face examples.

4. After training the SVM, we incorporate it as the classifier in a run-time system very similar to the one used by Sung and Poggio [11] that performs the following operations (see the scanning sketch after this list):

• Re-scale the input image several times.

• Cut 19x19 window patterns out of the scaled image.

• Preprocess the window using masking, light correction and histogram equalization.

• Classify the pattern using the SVM.

• If the pattern is a face, draw a rectangle around it in the output image.
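The following sketch (added for illustration only; the mask, the trained `svm_classify` decision function, and the crude resizing routine are simplified stand-ins, not the authors' code) shows how the preprocessing of step 2 and the scanning loop of step 4 fit together:

```python
import numpy as np

WINDOW = 19                                   # 19 x 19 = 361 raw pixels per pattern

def fit_brightness_plane(window, mask):
    """Least-squares plane a*x + b*y + c fitted over the unmasked pixels."""
    ys, xs = np.nonzero(mask)
    A = np.stack([xs, ys, np.ones_like(xs)], axis=1).astype(float)
    coeffs, *_ = np.linalg.lstsq(A, window[ys, xs].astype(float), rcond=None)
    yy, xx = np.mgrid[0:window.shape[0], 0:window.shape[1]]
    return coeffs[0] * xx + coeffs[1] * yy + coeffs[2]

def histogram_equalize(window):
    """Standard histogram equalization of an 8-bit grey-level window."""
    w8 = np.clip(window, 0, 255).astype(np.uint8)
    hist, _ = np.histogram(w8, bins=256, range=(0, 256))
    cdf = hist.cumsum()
    return (255.0 * cdf / cdf[-1])[w8]

def preprocess(window, mask):
    """Masking + illumination gradient correction + histogram equalization;
    returns the 283-dimensional vector fed to the SVM."""
    corrected = window.astype(float) - fit_brightness_plane(window, mask)
    corrected = 255.0 * (corrected - corrected.min()) / max(np.ptp(corrected), 1e-9)
    return histogram_equalize(corrected)[mask.astype(bool)]

def resize(image, shape):
    """Nearest-neighbour resize (placeholder for a proper image pyramid)."""
    rows = (np.arange(shape[0]) * image.shape[0] / shape[0]).astype(int)
    cols = (np.arange(shape[1]) * image.shape[1] / shape[1]).astype(int)
    return image[np.ix_(rows, cols)]

def detect_faces(image, mask, svm_classify, scales=(1.0, 0.8, 0.64, 0.5), step=2):
    """Exhaustive multi-scale scan; returns (row, col, size) in original coordinates."""
    detections = []
    for s in scales:
        h, w = int(image.shape[0] * s), int(image.shape[1] * s)
        scaled = resize(image, (h, w))
        for top in range(0, h - WINDOW + 1, step):
            for left in range(0, w - WINDOW + 1, step):
                x = preprocess(scaled[top:top + WINDOW, left:left + WINDOW], mask)
                if svm_classify(x) > 0:       # +1 means "face"
                    detections.append((int(top / s), int(left / s), int(WINDOW / s)))
    return detections
```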

3.2.1 Experimental Results

To test the run-time system, we used two sets of images. The set A contained 313 high-quality images with one face per image. The set B contained 23 images of mixed quality, with a total of 155 faces. Both sets were tested using our system and the one by Sung and Poggio [11]. In order to give true meaning to the number of false positives obtained, it is important to state that set A involved 4,669,960 pattern windows, while set B involved 5,383,682. Table 2 shows a comparison between the two systems. At run-time the SVM system is approximately 30 times faster than the system of Sung and Poggio. One reason for that is the use of a technique introduced by C. Burges [3] that allows to replace a large number of support vectors with a much smaller number of points (which are not necessarily data points), and therefore to speed up the run time considerably.

Figure 4. Some false detections obtained with the first version of the system. These false positives were later used as non-face examples in the training process.

In figure 5 we report the result of our system on some test images. Notice that the system is able to handle, up to a small degree, non-frontal views of faces. However, since the database does not contain any example of occluded faces, the system usually misses partially covered faces, like the ones in the bottom picture of figure 5. The system can also deal with some degree of rotation in the image plane, since the data base contains a number of "virtual" faces that were obtained by rotating some face examples of up to 10 degrees.

In figure 6 we report some of the support vectors we obtained, both for face and non-face patterns. We represent images as points in a fictitious two dimensional space and draw an arbitrary boundary between the two classes. Notice how we have placed the support vectors at the classification boundary, accordingly with their geometrical interpretation. Notice also how the non-face support vectors are not just random non-face patterns, but are non-face patterns that are quite similar to faces.

Table 2. Performance of the SVM face detection system on Test Set A and Test Set B. (Of the table entries, only the 74.2% detection rate reported for Test Set B is recoverable here.)

Figure 5. Results from our Face Detection system.


Figure 6. In this picture circles represent faces and squares represent non-faces. On the border between the two classes we represented some of the support vectors found by our system. Notice how some of the non-face support vectors are very similar to faces.

4 Summary and Conclusions

In this paper we have presented a novel decomposition algorithm that can be used to train Support Vector Machines on large data sets (say 50,000 data points). The current version of the algorithm can deal with about 2,500 support vectors on a machine with 128 Mb of RAM, but an implementation of the technique currently under development will be able to deal with a much larger number of support vectors (say about 100,000) using less memory. We demonstrated the applicability of SVM by embedding SVM in a face detection system which performs comparably to other state-of-the-art systems.

There are several reasons for which we have been investigating the use of SVMs. Among them, the fact that SVMs are very well founded from the mathematical point of view, being an approximate implementation of the Structural Risk Minimization induction principle. The only free parameters of SVMs are the positive constant C and the parameter associated to the kernel K (in our case the degree of the polynomial). Since the expected value of the ratio between the number of support vectors and the total number of data points is an upper bound on the generalization error, the number of support vectors gives us an immediate estimate of the difficulty of the problem. SVMs handle very well high dimensional input vectors, and therefore their use seems to be appropriate in computer vision problems in which it is not clear what the features are, allowing the user to represent the image as a (possibly large) vector of grey levels.¹

¹ This paper describes research done within the Center for Biological and Computational Learning in the Department of Brain and Cognitive Sciences and at the AI Lab. at MIT. This research is sponsored by a grant from NSF under contract ASC-9217041 (this award includes funds from ARPA provided under the HPCC program), by a grant from ARPA/ONR under contract N00014-92-J-1879 and by a MURI grant under contract [...]. Additional support is provided by Daimler-Benz, Sumitomo Metal Industries, and Siemens AG. Edgar Osuna was supported by Fundación Gran Mariscal de Ayacucho.

Acknowledgements

The authors would like to thank Tomaso Poggio, Vladimir Vapnik, Michael Oren and Constantine Papageorgiou for useful comments and discussion.

References

[1] B. E. Boser, I. M. Guyon, and V. N. Vapnik. A training algorithm for optimal margin classifiers. In Proc. 5th ACM Workshop on Computational Learning Theory, pages 144-152, Pittsburgh, PA, July 1992.

[2] G. Burel and D. Carel. Detection and localization of faces on digital images. Pattern Recognition Letters, 15:963-967, 1994.

[3] C. J. C. Burges. Simplified support vector decision rules. In International Conference on Machine Learning, pages 71-77, 1996.

[4] C. Cortes and V. Vapnik. Support vector networks. Machine Learning, 20:1-25, 1995.

[5] N. Krüger, M. Pötzsch, and C. v.d. Malsburg. Determination of face position and pose with learned representation based on labeled graphs. Technical Report 96-03, Ruhr-Universität, January 1996.

[6] B. Moghaddam and A. Pentland. Probabilistic visual learning for object detection. Technical Report 326, MIT Media Laboratory, June 1995.

[7] B. Murtagh and M. Saunders. Large-scale linearly constrained optimization. Mathematical Programming, 14:41-72, 1978.

[8] E. Osuna, R. Freund, and F. Girosi. Support vector machines: Training and applications. A.I. Memo 1602, MIT A.I. Lab., 1997.

[9] F. Riesz and B. Sz.-Nagy. Functional Analysis. Ungar, New York, 1955.

[10] H. Rowley, S. Baluja, and T. Kanade. Human face detection in visual scenes. Technical Report CMU-CS-95-158R, School of Computer Science, Carnegie Mellon University, November 1995.

[11] K. Sung and T. Poggio. Example-based Learning for View-based Human Face Detection. A.I. Memo 1521, MIT A.I. Lab., December 1994.

[12] R. Vaillant, C. Monrocq, and Y. Le Cun. Original approach for the localisation of objects in images. IEEE Proc. Vis. Image Signal Process., 141(4), August 1994.

[13] V. Vapnik. The Nature of Statistical Learning Theory. Springer, New York, 1995.

[14] G. Yang and T. Huang. Human face detection in a complex background. Pattern Recognition, 27:53-63, 1994.


