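The following list of computer vision test data sets and source code sites comes from the appendix of the book Computer Vision: Algorithms and Applications. I have tidied it up and added a few annotations.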

Computer Vision: Algorithms and Applications

Richard Szeliski


C.1 Data sets



(/rbf/CVonline ), (/ ), and Computer Vision online (/ ) have pointers to more up-to-date data sets and software.



CUReT: Columbia-Utrecht Reflectance and Texture Database,

/CAVE/software/curet/ (Dana, van Ginneken, Nayar et al. 1999).

Middlebury Color Datasets: registered color images taken by different cameras to study how they transform gamuts and colors, /color/data/ (Chakrabarti, Scharstein, and Zickler).



Middlebury test datasets for evaluating MRF minimization/inference algorithms,

/MRF/results/ (Szeliski, Zabih, Scharstein et al. 2008).


Affine Covariant Features database for evaluating feature detector and descriptor matching quality and repeatability,

(Mikolajczyk and Schmid 2005; Mikolajczyk, Tuytelaars, Schmid et al. 2005).

Database of matched image patches for learning and feature descriptor evaluation,

(Winder and Brown 2007; Hua, Brown, and Winder 2007).


Berkeley Segmentation Dataset and Benchmark of 1000 images labeled by 30 humans, along with an evaluation,

/Research/Projects/CS/vision/grouping/segbench/ (Martin,

Fowlkes, Tal et al. 2001).

Weizmann segmentation evaluation database of 100 grayscale images with ground truth


/~vision/Seg Evaluation DB/

(Alpert, Galun, Basri et al. 2007).


The Middlebury optic flow evaluation Web site,


(Baker, Scharstein, Lewis et al. 2009).

The Human-Assisted Motion Annotation database,

/celiu/motionAnnotation/ (Liu, Freeman, Adelson et al. 2008)


High Dynamic Range radiance maps, /Research/HDR/ (Debevec and Malik 1997).

Alpha matting evaluation Web site, / (Rhemann, Rother, Wang

et al. 2009).

Chapter 11: Stereo correspondence

Middlebury Stereo Datasets and Evaluation, /stereo/ (Scharstein

and Szeliski 2002).

Stereo Classification and Performance Evaluation of different aggregation costs for stereo matching, /spe/ (Tombari, Mattoccia, Di Stefano et al. 2008).

Middlebury Multi-View Stereo Datasets,

/mview/data/ (Seitz, Curless, Diebel et al. 2006).

Multi-view and Oxford Colleges building reconstructions,

/~vgg/data/ .

Multi-View Stereo Datasets, /data/strechamvs/ (Strecha, Fransens,

and Van Gool 2006).

Multi-View Evaluation, /~strecha/multiview/ (Strecha, von Hansen,

Van Gool et al. 2008).


HumanEva: synchronized video and motion capture dataset for evaluation of articulated human motion, /humaneva/ (Sigal, Balan, and Black 2010).


The (New) Stanford Light Field Archive, /

(Wilburn, Joshi, Vaish et al. 2005).

Virtual Viewpoint Video: multi-viewpoint video with per-frame depth maps,

/en-us/um/redmond/groups/ivm/vvv/ (Zitnick, Kang, Uyttendaele et al. 2004).



Buffy pose classes, /~vgg/data/ buffy pose classes/ and Buffy stickmen V2.1, /~vgg/data/stickmen/ (Ferrari, Marin-Jimenez, and Zisserman 2009; Eichner and Ferrari 2009).

H3D database of pose/joint annotated photographs of humans,

/~lbourdev/h3d/ (Bourdev and Malik 2009).

Action Recognition Datasets, /projects/vision/action, has pointers to several datasets for action and activity recognition, as well as some papers. The human action database at /cvap/actions/ contains more action sequences.

C.2 Software

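One of the best sources for computer vision algorithms is the Open Source Computer Vision Library (OpenCV) (/wiki/), which was developed by Gary Bradski and his colleagues at Intel and is now maintained and extended at Willow Garage (Bradski and Kaehler 2008). A partial list of the available functions, taken from /documentation/cpp/, follows: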

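image processing and transforms (filtering, morphology, pyramids);

geometric image transformations (rotations, resizing);

miscellaneous image transforms (Fourier transforms, distance transforms);

segmentation (watershed, mean shift);

feature detection (Canny, Harris, Hough, MSER, SURF);

motion analysis and object tracking (Lucas–Kanade, mean shift);

machine learning (k nearest neighbors, support vector machines, decision trees, boosting, random trees, expectation-maximization, and neural networks).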

The Intel Performance Primitives (IPP) library, /en-us/intel-ipp/.


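The Image Processing Toolbox in MATLAB, /products/image/, contains routines for spatial transformations (rotations, resizing), normalized cross-correlation, image analysis and statistics (edges, Hough transform), image enhancement (adaptive histogram equalization, median filtering), image restoration (deblurring), linear filtering (convolution), image transforms (Fourier, discrete cosine transform), and morphological operations (connected components, distance transforms).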


VXL (C++ Libraries for Computer Vision Research and Implementation),


LTI-Lib 2 (/palvarado/ltilib-2/homepage/ ).

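Image editing and viewing packages, such as Windows Live Photo Gallery, iPhoto, Picasa, GIMP, and IrfanView, can be useful for performing common processing tasks, converting formats, and viewing your results. They can also serve as interesting reference implementations of image processing algorithms such as tone correction and denoising.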

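There are also some software packages and infrastructure that can be helpful for building real-time video processing demos. Vision on Tap (/ ) provides a Web service that processes your webcam video in real time (Chiu and Raskar 2009). VideoMan (VideoManager, /) can be useful for getting real-time video-based demos and applications running. You can also use imread in MATLAB to read directly from any URL, such as a webcam.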



matlabPyrTools: MATLAB source code for Laplacian pyramids, QMF/wavelets, and steerable pyramids, /~lcv/ (Simoncelli and Adelson 1990a; Simoncelli, Freeman, Adelson et al. 1992).

BLS-GSM image denoising, /~javier/denoise/ (Portilla, Strela, Wainwright et al. 2003).

Fast bilateral filtering code, /jiawen/#code (Chen, Paris, and Durand 2007).

C++ implementation of the fast distance transform algorithm,

/~pff/dt/ (Felzenszwalb and Huttenlocher 2004a).

GREYC’s Magic Image Converter, including image restoration software using regularization and anisotropic diffusion, / (Tschumperlé and Deriche 2005).


VLFeat, an open and portable library of computer vision algorithms,

/ (Vedaldi and Fulkerson 2008).

SiftGPU: A GPU Implementation of Scale Invariant Feature Transform (SIFT),


/~ccwu/siftgpu/ (Wu 2010).

SURF: Speeded Up Robust Features, /~surf/ (Bay, Tuytelaars, and Van Gool 2006).

FAST corner detection, /~er258/work/ (Rosten and Drummond 2005, 2006).

Linux binaries for affine region detectors and descriptors, as well as MATLAB files to

compute repeatability and matching scores,


Kanade–Lucas–Tomasi feature trackers: KLT, /~stb/klt/ (Shi and

Tomasi 1994);

GPU-KLT, /~cmzach/ (Zach, Gallup, and Frahm 2008);

Lucas–Kanade 20 Years On, /projects/project (Baker and

Matthews 2004).



(Felzenszwalb and Huttenlocher 2004b).

EDISON, edge detection and image segmentation,


(Meer and Georgescu 2001; Comaniciu and Meer 2002).

Normalized cuts segmentation including intervening contours,


(Shi and Malik 2000; Malik, Belongie, Leung et al. 2001).

Segmentation by weighted aggregation (SWA),

/~vision/SWA (Alpert, Galun, Basri et al. 2007).


Non-iterative PnP algorithm (Moreno-Noguer, Lepetit, and Fua 2007).

Tsai Camera Calibration Software,

/~rgw/ (Tsai 1987).

Easy Camera Calibration Toolkit,

/en-us/um/people/zhang/ Calib/ (Zhang 2000).

Camera Calibration Toolbox for MATLAB,

/bouguetj/calib doc/ ; a C version is included in OpenCV.

MATLAB functions for multiple view geometry,

/~vgg/hzbook/code/ (Hartley and Zisserman 2004).


SBA: A generic sparse bundle adjustment C/C++ package based on the Levenberg–Marquardt algorithm, /~lourakis/sba/ (Lourakis and Argyros 2009).

Simple sparse bundle adjustment (SSBA), /~cmzach/ .

Bundler, structure from motion for unordered image collections,

/bundler/ (Snavely, Seitz, and Szeliski 2006).


Optical flow software, /~black/ (Black and Anandan 1996).

Optical flow using total variation and conjugate gradient descent, /celiu/OpticalFlow/ (Liu 2009).

TV-L1 optical flow on the GPU, /~cmzach/

(Zach, Pock, and Bischof 2007a).

elastix: a toolbox for rigid and nonrigid registration of images,

/ (Klein, Staring, and Pluim 2007).

Deformable image registration using discrete optimization,


(Glocker, Komodakis, Tziritas et al. 2008).


Microsoft Research Image Compositing Editor for stitching images,

/en-us/um/redmond/groups/ivm/ice/ .


HDRShop software for combining bracketed exposures into high-dynamic range

radiance images, /graphics/HDRShop/.

Super-resolution code,

/~vgg/software/SR/ (Pickup 2007; Pickup, Capel, Roberts et al. 2007,



StereoMatcher, standalone C++ stereo matching code,

/stereo/code/ (Scharstein and Szeliski 2002).

Patch-based multi-view stereo software (PMVS Version 2),

/software/pmvs/ (Furukawa and Ponce 2011).


Scanalyze: a system for aligning and merging range data,

/software/scanalyze/ (Curless and Levoy 1996).

MeshLab: software for processing, editing, and visualizing unstructured 3D triangular

meshes, /.

VRML viewers (various) are also a good way to visualize texture-mapped 3D models.

Section 12.6.4: Whole body modeling and tracking

Bayesian 3D person tracking, /~black/ (Sidenbladh, Black, and Fleet 2000; Sidenbladh and Black 2003).

HumanEva: baseline code for the tracking of articulated human motion,

/humaneva/ (Sigal, Balan, and Black 2010).

Section 14.1.1: Face detection

Sample face detection code and evaluation tools,


Section 14.1.2: Pedestrian detection

A simple object detector with boosting,


(Hastie, Tibshirani, and Friedman 2001; Torralba, Murphy, and Freeman 2007).

Discriminatively trained deformable part models,

/~pff/latent/ (Felzenszwalb, Girshick, McAllester et al. 2010).

Upper-body detector,

/~vgg/software/UpperBody/ (Ferrari, Marin-Jimenez, and Zisserman


2D articulated human pose estimation software,

/~calvin/articulated_human_pose_estimation_code/ (Eichner and

Ferrari 2009).

Section 14.2.2: Active appearance and 3D shape models

AAMtools: An active appearance modeling toolbox,

/software/AAMtools/ (Papandreou and Maragos 2008).

Section 14.3: Instance recognition

FASTANN and FASTCLUSTER for approximate k-means (AKM),

/~vgg/software/ (Philbin, Chum, Isard et al. 2007).

Feature matching using fast approximate nearest neighbors,

/~mariusm//FLANN/FLANN (Muja and Lowe 2009).

Section 14.4.1: Bag of words

Two bag of words classifiers, /fergus/iccv2005/

(Fei-Fei and Perona 2005; Sivic, Russell, Efros et al. 2005).

Bag of features and hierarchical k-means, / (Nistér and Stewénius 2006; Nowak, Jurie, and Triggs 2006).

Section 14.4.2: Part-based models

A simple parts and structure object detector,


(Fischler and Elschlager 1973; Felzenszwalb and Huttenlocher 2005).

Section 14.5.1: Machine learning software

Support vector machines (SVM) software (

/SVM )


SVMlight / ;

LIBSVM, /~cjlin/libsvm/ (Fan, Chen, and Lin 2005);

LIBLINEAR, /~cjlin/liblinear/ (Fan, Chang, Hsieh et al. 2008).

Kernel Machines: links to SVM, Gaussian processes, boosting, and other machine

learning algorithms, /software .

Multiple kernels for image classification,


(Varma and Ray 2007; Vedaldi, Gulshan, Varma et al. 2009).

Appendix A.1–A.2: Matrix decompositions and linear least squares

BLAS (Basic Linear Algebra Subprograms),

/blas/ (Blackford, Demmel, Dongarra et al. 2002).

LAPACK (Linear Algebra PACKage),

/lapack/ (Anderson, Bai, Bischof et al. 1999).

GotoBLAS, /tacc-projects/.

ATLAS (Automatically Tuned Linear Algebra Software),

/ (Demmel, Dongarra, Eijkhout et al. 2005).

Intel Math Kernel Library (MKL), /en-us/intel-mkl/.

AMD CoreMath Library (ACML),

/cpu/Libraries/acml/Pages/ .

Robust PCA code, /~ftorre/papers/

(De la Torre and Black 2003).

Appendix A.3: Non-linear least squares

MINPACK, /minpack/.

levmar: Levenberg–Marquardt nonlinear least squares algorithms,

/~lourakis/levmar/ (Madsen, Nielsen, and Tingleff 2004).

Appendix A.4–A.5: Direct and iterative sparse matrix solvers

SuiteSparse (various reordering algorithms, CHOLMOD) and SuiteSparse

QR, /research/sparse/SuiteSparse/ (Davis 2006, 2008).

PARDISO (iterative and sparse direct solution), /.

TAUCS (sparse direct, iterative, out of core, preconditioners),

/~stoledo/taucs/ .

HSL Mathematical Software Library, / .

Templates for the solution of linear systems,

/linalg/html templates/ (Barrett, Berry, Chan et al. 1994).

Download the PDF for instructions on how to get the software.

ITSOL, MIQR, and other sparse solvers,

/~saad/software/ (Saad 2003).

ILUPACK, /~bolle/ilupack/ .

Appendix B: Bayesian modeling and inference

Middlebury source code for MRF minimization,

/MRF/code/ (Szeliski, Zabih, Scharstein et al. 2008).

C++ code for efficient belief propagation for early vision,

/~pff/bp/ (Felzenszwalb and Huttenlocher 2006).

FastPD MRF optimization code,

/~komod/FastPD (Komodakis and Tziritas 2007a; Komodakis, Tziritas, and Paragios 2008).

Algorithm C.1 C algorithm for Gaussian random noise generation, using the Box–Muller transform.

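#include <math.h>
#include <stdlib.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif // M_PI

// Return a uniformly distributed random number in [0, 1].
double urand()
{
    return ((double) rand()) / ((double) RAND_MAX);
}

// Return a pair of independent zero-mean, unit-variance Gaussian samples.
void grand(double& g1, double& g2)
{
    double n1 = urand();
    double n2 = urand();
    double x1 = n1 + (n1 == 0); /* guard against log(0) */
    double sqlogn1 = sqrt(-2.0 * log(x1));
    double angl = (2.0 * M_PI) * n2;
    g1 = sqlogn1 * cos(angl);
    g2 = sqlogn1 * sin(angl);
}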


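Gaussian noise generation. Many basic software packages come with a uniform random noise generator (e.g., the rand() routine in Unix), but not all have a Gaussian random noise generator. To compute a normally distributed random variable, you can use the Box–Muller transform (Box and Muller 1958), whose C code is given in Algorithm C.1. Note that this routine returns pairs of random variables. Alternative methods for generating Gaussian random numbers are given by Thomas, Luk, Leong et al.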
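As a usage sketch, the fragment below applies the Box–Muller transform to add zero-mean Gaussian noise to a buffer of samples, such as a flattened image. The names box_muller and add_gaussian_noise, the sigma parameter, and the way rand() is shifted into (0, 1] are assumptions made for illustration and do not come from the book.

```cpp
#include <cmath>
#include <cstdlib>
#include <vector>

// Box-Muller on explicit uniform inputs u1 in (0, 1] and u2 in [0, 1):
// produces two independent zero-mean, unit-variance Gaussian samples.
void box_muller(double u1, double u2, double& g1, double& g2)
{
    const double two_pi = 6.283185307179586;
    double radius = std::sqrt(-2.0 * std::log(u1));
    double angle = two_pi * u2;
    g1 = radius * std::cos(angle);
    g2 = radius * std::sin(angle);
}

// Hypothetical helper: add zero-mean Gaussian noise with standard
// deviation sigma to every element of samples, two elements per draw.
void add_gaussian_noise(std::vector<double>& samples, double sigma)
{
    for (std::size_t i = 0; i < samples.size(); i += 2) {
        // Shift rand() into (0, 1] so that log() never sees zero.
        double u1 = (std::rand() + 1.0) / (RAND_MAX + 1.0);
        double u2 = std::rand() / (RAND_MAX + 1.0);
        double g1, g2;
        box_muller(u1, u2, g1, g2);
        samples[i] += sigma * g1;
        if (i + 1 < samples.size()) samples[i + 1] += sigma * g2;
    }
}
```

Passing the uniform inputs explicitly keeps box_muller deterministic and easy to check, while add_gaussian_noise uses both outputs of each draw so that no sample is wasted.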



For each (non-negative) label value, consider the bits as being split among the three color channels, e.g., for a 9-bit value,



Figure 8.16 shows an example of such a pseudocolor mapping.
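The idea of splitting a label's bits among the three color channels can be sketched as follows. The surviving text does not spell out the details, so the channel order (bit 0 to red, bit 1 to green, bit 2 to blue, then repeating) and the choice to place the lowest label bits in the highest channel bits, so that consecutive labels receive clearly different colors, are assumptions of this sketch. The names Rgb and pseudocolor are hypothetical.

```cpp
#include <cstdint>

struct Rgb { std::uint8_t r, g, b; };

// Map a non-negative label to a pseudocolor using its low 24 bits.
// Label bits are dealt out round-robin: bit 0 -> R, bit 1 -> G,
// bit 2 -> B, bit 3 -> R, and so on.
// ASSUMPTION: the k-th bit collected for a channel is written to channel
// bit (7 - k), so low-order label bits change the color the most.
Rgb pseudocolor(std::uint32_t label)
{
    std::uint8_t c[3] = {0, 0, 0};
    for (int i = 0; i < 24; ++i) {
        if (label & (1u << i)) {
            int channel = i % 3;  // 0 = R, 1 = G, 2 = B
            int k = i / 3;        // index of this bit within its channel
            c[channel] |= static_cast<std::uint8_t>(0x80u >> k);
        }
    }
    return Rgb{c[0], c[1], c[2]};
}
```

Under these assumptions, label 1 maps to (128, 0, 0), label 2 to (0, 128, 0), and label 8 to (64, 0, 0). For a fixed label range, the mapping could equally be precomputed into a lookup table.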


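The advent of programmable GPUs, with capabilities such as pixel shaders and compute shaders, has led to the development of fast computer vision algorithms for real-time applications such as segmentation, tracking, stereo, and motion estimation (Pock, Unger, Cremers et al. 2008; Vineet and Narayanan 2008; Zach, Gallup, and Frahm 2008). A good source for learning about such algorithms is the CVPR 2008 workshop on Visual Computer Vision on GPUs, /~jmf/Workshop_on_Computer_Vision_on_ , whose papers can be found on the CVPR 2008 proceedings DVD. Additional sources for GPU algorithms include the GPGPU Web site and workshops, / , and the OpenVIDIA Web site.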


C.3 Slides and lectures



UW 455: Undergraduate Computer Vision,

/education/courses/455/ .

UW 576: Graduate Computer Vision,

/education/courses/576 .

Stanford CS223B: Introduction to Computer Vision,

/teaching/cs223b/ .

MIT 6.869: Advances in Computer Vision,

/torralba/courses/6.869/ .

Berkeley CS 280: Computer Vision, /~trevor/

UNC COMP 776: Computer Vision, /~lazebnik/spring10 .

Middlebury CS 453: Computer Vision,

/~schar/courses/cs453-s10/ .

Related courses have also been taught on the topic of Computational Photography, e.g.,

CMU 15-463: Computational Photography, /courses/15-463/.

MIT 6.815/6.865: Advanced Computational Photography,


Stanford CS 448A: Computational photography on cell phones,

/courses/cs448a-10/ .

SIGGRAPH courses on Computational Photography,

/~raskar/photo/ .

There is also an excellent set of on-line lectures on a range of computer vision topics, such as belief propagation and graph cuts, at the UW-MSR Course of Vision Algorithms.


C.4 Bibliography

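All of the references for this book are available on the book's Web site. A more comprehensive, partially annotated bibliography of nearly all computer vision publications is maintained by Keith Price at /Vision-Notes/bibliography/ .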

There is also a searchable computer graphics bibliography at /publications/bibliography/ . Additional good sources for technical papers are Google Scholar and CiteSeerX.

