Introduction
Key Challenges
- the face appearance space is extremely high dimensional
- we generally have access to only a sparse sampling of this space
- the mapping of each image to pose, expression, and other parameters is not generally known a priori
Key Issues
- define the edge weights in the graph
- create a compelling, stabilized output sequence
Method
Automatic alignment and pose estimation
- Run a face detector
- Apply a fiducial points detector
- Use a pre-labeled 3D template model to estimate pose and warp the image to a frontal view for a more consistent computation of similarity
The face graph
Distance between faces
Local Binary Pattern (LBP) Histograms
Divide an image into a grid of cells
Convert each pixel in a cell into a code that encodes the relative brightness pattern in a square neighborhood around the pixel
Each neighbor is assigned a 1 if it is brighter than the center pixel and a 0 if it is darker
The pattern of 1’s and 0’s defines a per-pixel binary code
The per-cell histogram of these codes defines the descriptor of a cell
Calculate a separate set of descriptors for the eyes, mouth, and hair regions
A descriptor for a region is a concatenation of participating cells’ descriptors
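The LBP steps above can be sketched as follows. This is a minimal sketch, not the paper's implementation: it assumes a grayscale image stored as a 2D NumPy array, a 3x3 neighborhood (8 neighbors), strict "brighter than" comparison, and an arbitrary grid size. The names `lbp_codes` and `cell_histograms` are hypothetical.

```python
import numpy as np

def lbp_codes(gray):
    """Return an 8-bit LBP code for every interior pixel of a 2D grayscale array."""
    gray = np.asarray(gray, dtype=np.float64)
    h, w = gray.shape
    center = gray[1:-1, 1:-1]
    # The 8 neighbors of a 3x3 square, visited clockwise from the top-left.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        neighbor = gray[1 + dy : h - 1 + dy, 1 + dx : w - 1 + dx]
        # Assumption: the bit is 1 only when the neighbor is strictly brighter.
        codes += (neighbor > center).astype(np.uint8) * np.uint8(1 << bit)
    return codes

def cell_histograms(codes, grid=(4, 4)):
    """Split the code image into grid cells and concatenate the per-cell histograms.

    Assumes the code image has at least as many rows and columns as the grid.
    """
    row_groups = np.array_split(np.arange(codes.shape[0]), grid[0])
    col_groups = np.array_split(np.arange(codes.shape[1]), grid[1])
    hists = []
    for r in row_groups:
        for c in col_groups:
            cell = codes[r[0] : r[-1] + 1, c[0] : c[-1] + 1]
            hist = np.bincount(cell.ravel(), minlength=256).astype(np.float64)
            hists.append(hist / hist.sum())  # normalize each cell histogram
    return np.concatenate(hists)
```

Under these assumptions, a region descriptor (eyes, mouth, or hair) could be obtained by calling `cell_histograms` on the codes cropped to that region.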
The distance between two face images $i,j$, denoted $d_{ij}$, is defined as the $\chi^2$-distance between the corresponding descriptors, normalized by a logistic function $L(d)=(1+e^{-\gamma(d-\mu)/\sigma})^{-1}$ (with $\gamma=\ln(99)$)
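A small sketch of this distance, under stated assumptions: the $\chi^2$-distance is written in one common form (conventions differ by a constant factor), and $\mu$ and $\sigma$, which these notes do not define, are treated as given parameters. With $\gamma=\ln(99)$ the logistic maps $d=\mu$ to 0.5 and $d=\mu+\sigma$ to 0.99. The function names are hypothetical.

```python
import math

GAMMA = math.log(99)

def chi2_distance(p, q):
    """Chi-square distance between two histograms (one common form)."""
    # Bins that are empty in both histograms contribute nothing.
    return sum((a - b) ** 2 / (a + b) for a, b in zip(p, q) if a + b > 0)

def logistic(d, mu, sigma):
    """L(d) = 1 / (1 + exp(-gamma * (d - mu) / sigma))."""
    return 1.0 / (1.0 + math.exp(-GAMMA * (d - mu) / sigma))

def normalized_distance(p, q, mu, sigma):
    """Chi-square distance squashed into (0, 1) by the logistic function."""
    return logistic(chi2_distance(p, q), mu, sigma)
```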
Appearance distance function
$$
D_{appear}(i,j)= 1 - (1-\lambda^md_{ij}^m)(1-\lambda^ed_{ij}^e)(1-\lambda^hd_{ij}^h)
$$
$d^{m,e,h}$: the LBP histogram distances restricted to the mouth, eyes, and hair regions, respectively
$\lambda^{m,e,h}$: the corresponding weights for these regions, fixed to $\lambda^m=0.8,\lambda^e=\lambda^h=0.1$ in this paper
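As a worked illustration of the appearance distance, the sketch below evaluates the formula directly. It assumes the three region distances are already normalized to $[0,1]$; the function name is hypothetical.

```python
def appearance_distance(d_mouth, d_eyes, d_hair, lam_m=0.8, lam_e=0.1, lam_h=0.1):
    """D_appear = 1 - (1 - lam_m*d_m)(1 - lam_e*d_e)(1 - lam_h*d_h)."""
    return 1.0 - (
        (1.0 - lam_m * d_mouth) * (1.0 - lam_e * d_eyes) * (1.0 - lam_h * d_hair)
    )
```

With these weights, a maximal mouth distance alone gives $D_{appear}=0.8$, while maximal eye or hair distance alone gives only $0.1$, so the mouth region dominates the comparison.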
Differences in pose (yaw and pitch) and time (when timestamps are available), $D_{yaw}, D_{pitch}, D_{time}$, are measured by the $L_2$ distance followed by the logistic function $L(d)$
The face graph
Each edge $(i,j)$ is assigned a weight $D(i,j)$ defined as
$$
D(i,j)=\left[1-\prod_{s\in\{appear,\,yaw,\,pitch,\,time\}}(1-D_s(i,j))\right]^\alpha
$$
$\alpha$: an exponent that nonlinearly scales the distances
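The edge weight can be sketched as below, assuming each $D_s(i,j)$ is already normalized to $[0,1]$. The value of $\alpha$ is not given in these notes, so the default here is only a placeholder, and the function name and dictionary keys are hypothetical.

```python
import math

def edge_weight(distances, alpha=1.0):
    """D(i,j) = [1 - prod_s (1 - D_s(i,j))] ** alpha.

    distances: mapping from term name (e.g. "appear", "yaw", "pitch", "time")
    to a normalized distance in [0, 1].
    """
    product = math.prod(1.0 - d for d in distances.values())
    return (1.0 - product) ** alpha
```

Because the terms are passed as a mapping, the time term can simply be left out when timestamps are unavailable. Any single term equal to 1 drives the weight to 1, and the weight is 0 only when all terms are 0.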
Two ways of finding paths
- Look for the shortest paths using Dijkstra’s algorithm
- Produce a smooth path of arbitrary length by taking walks on the graph
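The first option can be sketched with a textbook Dijkstra search over the face graph. This is a generic illustration rather than the paper's code: it assumes the graph is stored as a dictionary of dictionaries with nonnegative edge weights $D(i,j)$, and that node identifiers are mutually comparable (for example, integers or strings). The function name is hypothetical.

```python
import heapq

def shortest_path(graph, source, target):
    """Return the minimum-weight node sequence from source to target, or None.

    graph: dict mapping node -> dict of neighbor -> nonnegative edge weight.
    """
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue  # stale heap entry
        visited.add(u)
        if u == target:
            break
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return None  # target unreachable from source
    # Walk the predecessor links back from the target.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]
```

The returned node sequence would correspond to the ordered list of photos to show between the source and target faces.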
The Cross dissolve
$$
I_{out}(t)=(1-t)I_{in_1}+tI_{in_2}
$$
$I_{in_1}, I_{in_2}$: the input images
$I_{out}$: the output sequence
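A minimal sketch of the cross dissolve, assuming the two input images are already aligned, have the same shape, and that $t$ is sampled uniformly in $[0,1]$. The function name and the `num_frames` parameter are hypothetical.

```python
import numpy as np

def cross_dissolve(img_a, img_b, num_frames):
    """Return frames I_out(t) = (1 - t) * img_a + t * img_b for t in [0, 1]."""
    a = np.asarray(img_a, dtype=np.float64)
    b = np.asarray(img_b, dtype=np.float64)
    if a.shape != b.shape:
        raise ValueError("input images must have the same shape")
    # Uniformly spaced blend parameters, including both endpoints.
    ts = np.linspace(0.0, 1.0, num_frames)
    return [(1.0 - t) * a + t * b for t in ts]
```

The first frame equals the first input and the last frame equals the second, so consecutive dissolves along a path can be chained without visible jumps at the joins.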