Overview
Paper walkthrough: On the Detection of Digital Face Manipulation (CVPR 2020)
I. Contributions
- Built a comprehensive fake-face dataset comprising 0.8M real faces and 1.8M fake faces generated by a variety of methods.
- Proposed a novel attention-based layer that improves classification performance and produces attention maps highlighting the manipulated facial regions.
- Proposed a new metric, IINC (Inverse Intersection Non-Containment), for evaluating attention maps; it yields more consistent evaluations than existing metrics.
II. Attention Mechanism
1. Characteristics
- Parallel computation
- Captures relationships across the whole sequence
- Parameter sharing
Further reading: "综述—图像处理中的注意力机制" (A survey of attention mechanisms in image processing, CSDN blog by xys430381_1).
2. Approaches
- Spatial domain (over the H×W dimensions; used in this paper, see the sketch after this list)
- Channel domain (over the channel dimension)
- Mixed domain
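A minimal sketch of the spatial-domain idea (shapes are illustrative, not the paper's exact layer): a single-channel H×W map re-weights every spatial position of a feature map by broadcasting over the channel dimension.

import torch

feat = torch.randn(1, 728, 19, 19)   # feature map F: (B, C, H, W)
att = torch.rand(1, 1, 19, 19)       # spatial attention map in [0, 1]
out = feat * att                     # broadcasts over the 728 channels
print(out.shape)                     # torch.Size([1, 728, 19, 19])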
III. Architecture
1. Backbone: Xception
Xception is a backbone network, placed at the front of many models to extract the feature map F. It consists of three main modules: the Entry flow, the Middle flow, and the Exit flow.
- Entry flow: two ordinary convolution layers at the start, followed by three separable-convolution blocks. Each block contains two separable convolutions: the first adjusts the channel count, and the block output is summed with a residual branch.
- Middle flow: a single separable-convolution block repeated eight times; each block contains three separable convolutions.
- Exit flow: one separable-convolution block followed by the closing operations: two separable convolutions, global average pooling, a fully connected layer, and so on.
- Attention layer: the paper states, "We convert Xception-Net into our model by inserting the attention-based layer between Block 4 and Block 5 of the middle flow, and then fine-tune on DFFD training set." That is, the attention layer sits between the 4th and 5th blocks of the middle flow.
Implementation:
import sys
import torch
import torch.nn as nn
import torch.nn.functional as F

class SeparableConv2d(nn.Module):  # separable convolution (depthwise + pointwise)
    def __init__(self, c_in, c_out, ks, stride=1, padding=0, dilation=1, bias=False):
        super(SeparableConv2d, self).__init__()
        # depthwise: groups=c_in gives each input channel its own k*k filter
        self.c = nn.Conv2d(c_in, c_in, ks, stride, padding, dilation, groups=c_in, bias=bias)
        # pointwise: 1x1 convolution mixes channels and maps c_in -> c_out
        self.pointwise = nn.Conv2d(c_in, c_out, 1, 1, 0, 1, 1, bias=bias)

    def forward(self, x):
        x = self.c(x)
        x = self.pointwise(x)
        return x
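To see why this factorization is used, compare parameter counts for a 3×3 convolution at the middle-flow width (a quick check using the class above; the numbers follow directly from 728·728·9 versus 728·9 + 728·728):

std = nn.Conv2d(728, 728, 3, padding=1, bias=False)
sep = SeparableConv2d(728, 728, 3, padding=1, bias=False)
n_std = sum(p.numel() for p in std.parameters())
n_sep = sum(p.numel() for p in sep.parameters())
print(n_std, n_sep)  # 4769856 536536, roughly a 9x reduction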
class Block(nn.Module):  # basic Xception block
    def __init__(self, c_in, c_out, reps, stride=1, start_with_relu=True, grow_first=True):
        """
        c_in:            number of input channels
        c_out:           number of output channels
        reps:            number of SeparableConv2d layers in the block
        stride:          convolution stride
        start_with_relu: whether the block begins with a ReLU (False only for
                         the first block, whose input was just activated)
        grow_first:      whether the channel count grows at the first conv of
                         the block (False in the exit flow, where it grows last)
        """
        super(Block, self).__init__()
        self.skip = None
        self.skip_bn = None
        if c_out != c_in or stride != 1:  # the residual branch needs a projection
            self.skip = nn.Conv2d(c_in, c_out, 1, stride=stride, bias=False)
            self.skip_bn = nn.BatchNorm2d(c_out)
        self.relu = nn.ReLU(inplace=True)
        rep = []
        c = c_in
        # if channels grow first, the first conv adjusts the channel count
        if grow_first:
            rep.append(self.relu)
            rep.append(SeparableConv2d(c_in, c_out, 3, stride=1, padding=1, bias=False))
            rep.append(nn.BatchNorm2d(c_out))
            c = c_out
        # intermediate convs keep the channel count unchanged
        for i in range(reps - 1):
            rep.append(self.relu)
            rep.append(SeparableConv2d(c, c, 3, stride=1, padding=1, bias=False))
            rep.append(nn.BatchNorm2d(c))
        # only the last conv of an exit-flow block grows the channel count
        if not grow_first:
            rep.append(self.relu)
            rep.append(SeparableConv2d(c_in, c_out, 3, stride=1, padding=1, bias=False))
            rep.append(nn.BatchNorm2d(c_out))
        # the very first block drops the leading ReLU (already applied upstream)
        if not start_with_relu:
            rep = rep[1:]
        else:
            rep[0] = nn.ReLU(inplace=False)
        if stride != 1:
            rep.append(nn.MaxPool2d(3, stride, 1))
        self.rep = nn.Sequential(*rep)

    def forward(self, inp):
        x = self.rep(inp)
        if self.skip is not None:
            y = self.skip(inp)
            y = self.skip_bn(y)
        else:
            y = inp
        x += y
        return x
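A quick shape sanity check (input size taken from the entry flow of the network below): the first entry-flow block grows 64 channels to 128 and halves the 147×147 resolution through its stride-2 max-pool, matched by the stride-2 skip projection.

blk = Block(64, 128, 2, 2, start_with_relu=False, grow_first=True)
out = blk(torch.randn(1, 64, 147, 147))
print(out.shape)  # torch.Size([1, 128, 74, 74])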
# backbone network (the attention layer is inserted inside it)
class Xception(nn.Module):
    """
    Xception optimized for the ImageNet dataset, as specified in
    https://arxiv.org/pdf/1610.02357.pdf
    """
    def __init__(self, maptype, templates, num_classes=1000):
        super(Xception, self).__init__()
        self.num_classes = num_classes
        # Entry flow
        self.conv1 = nn.Conv2d(3, 32, 3, 2, 0, bias=False)
        self.bn1 = nn.BatchNorm2d(32)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(32, 64, 3, bias=False)
        self.bn2 = nn.BatchNorm2d(64)
        self.block1 = Block(64, 128, 2, 2, start_with_relu=False, grow_first=True)
        self.block2 = Block(128, 256, 2, 2, start_with_relu=True, grow_first=True)
        self.block3 = Block(256, 728, 2, 2, start_with_relu=True, grow_first=True)
        # Middle flow
        self.block4 = Block(728, 728, 3, 1, start_with_relu=True, grow_first=True)
        self.block5 = Block(728, 728, 3, 1, start_with_relu=True, grow_first=True)
        self.block6 = Block(728, 728, 3, 1, start_with_relu=True, grow_first=True)
        self.block7 = Block(728, 728, 3, 1, start_with_relu=True, grow_first=True)
        self.block8 = Block(728, 728, 3, 1, start_with_relu=True, grow_first=True)
        self.block9 = Block(728, 728, 3, 1, start_with_relu=True, grow_first=True)
        self.block10 = Block(728, 728, 3, 1, start_with_relu=True, grow_first=True)
        self.block11 = Block(728, 728, 3, 1, start_with_relu=True, grow_first=True)
        # Exit flow
        self.block12 = Block(728, 1024, 2, 2, start_with_relu=True, grow_first=False)
        self.conv3 = SeparableConv2d(1024, 1536, 3, 1, 1)
        self.bn3 = nn.BatchNorm2d(1536)
        self.conv4 = SeparableConv2d(1536, 2048, 3, 1, 1)
        self.bn4 = nn.BatchNorm2d(2048)
        self.last_linear = nn.Linear(2048, num_classes)
        if maptype == 'none':
            # no attention: a pass-through that leaves the features unchanged
            self.map = lambda x: (1, None)
        elif maptype == 'reg':
            self.map = RegressionMap(728)
        elif maptype == 'tmp':
            self.map = TemplateMap(728, templates)
        elif maptype == 'pca_tmp':
            # the PCA map is parameterized by the templates, not a channel count
            self.map = PCATemplateMap(templates)
        else:
            print('Unknown map type: `{0}`'.format(maptype))
            sys.exit()
    def features(self, input):
        # Entry flow
        x = self.conv1(input)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.conv2(x)
        x = self.bn2(x)
        x = self.relu(x)
        x = self.block1(x)
        x = self.block2(x)
        x = self.block3(x)
        # Middle flow
        x = self.block4(x)
        x = self.block5(x)
        x = self.block6(x)
        x = self.block7(x)
        # attention-based layer, between the 4th and 5th middle-flow blocks
        mask, vec = self.map(x)
        x = x * mask
        x = self.block8(x)
        x = self.block9(x)
        x = self.block10(x)
        x = self.block11(x)
        # Exit flow
        x = self.block12(x)
        x = self.conv3(x)
        x = self.bn3(x)
        x = self.relu(x)
        x = self.conv4(x)
        x = self.bn4(x)
        return x, mask, vec

    def logits(self, features):
        x = self.relu(features)
        x = F.adaptive_avg_pool2d(x, (1, 1))
        x = x.view(x.size(0), -1)
        x = self.last_linear(x)
        return x

    def forward(self, input):
        x, mask, vec = self.features(input)
        x = self.logits(x)
        return x, mask, vec
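A hedged usage sketch (it assumes the map classes defined in the next subsection; with the standard 299×299 Xception input, the middle-flow feature map, and hence the attention map, comes out at 19×19):

model = Xception(maptype='reg', templates=None, num_classes=2).eval()
logits, mask, vec = model(torch.randn(1, 3, 299, 299))
print(logits.shape, mask.shape)  # torch.Size([1, 2]) torch.Size([1, 1, 19, 19])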
2. Attention Layer
The paper presents two ways of computing the attention map and compares them against each other: 1. MAM Map, 2. Reg. Map.
1. MAM Map
MAM stands for Manipulation Appearance Model, a structure proposed by the authors. It uses a separable-convolution block and a fully connected layer to estimate the per-region weights $\alpha$. The attention map is then $M_{att} = \overline{M} + A \cdot \alpha$, where $\overline{M}$ and $A$ are extracted via principal component analysis (PCA), described later.
# MAM Map
class TemplateMap(nn.Module):
    def __init__(self, c_in, templates):
        super(TemplateMap, self).__init__()
        self.c = Block(c_in, 364, 2, 2, start_with_relu=True, grow_first=False)
        self.l = nn.Linear(364, 10)
        self.relu = nn.ReLU(inplace=True)
        self.templates = templates

    def forward(self, x):
        # separable-convolution block
        v = self.c(x)
        v = self.relu(v)
        # adaptive average pooling
        v = F.adaptive_avg_pool2d(v, (1, 1))
        v = v.view(v.size(0), -1)
        # fully connected layer predicts the 10 template weights
        v = self.l(v)
        # linear combination of the 10 templates gives the attention map
        mask = torch.mm(v, self.templates.reshape(10, 361))
        mask = mask.reshape(x.shape[0], 1, 19, 19)
        return mask, v
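The templates argument is expected to hold the 10 PCA basis maps at the 19×19 attention resolution (flattened to 10×361 inside forward). A hedged check with random stand-in templates:

templates = torch.randn(10, 19, 19)  # stand-ins for the PCA basis maps
tmap = TemplateMap(728, templates)
mask, v = tmap(torch.randn(2, 728, 19, 19))
print(mask.shape, v.shape)  # torch.Size([2, 1, 19, 19]) torch.Size([2, 10])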
2. Reg. Map
This variant uses direct regression: the map is obtained from a single separable convolution followed by a Sigmoid function.
# Reg. Map
class RegressionMap(nn.Module):
    def __init__(self, c_in):
        super(RegressionMap, self).__init__()
        self.c = SeparableConv2d(c_in, 1, 3, stride=1, padding=1, bias=False)
        self.s = nn.Sigmoid()

    def forward(self, x):
        # separable convolution regresses a single-channel map
        mask = self.c(x)
        # sigmoid squashes it into [0, 1]
        mask = self.s(mask)
        return mask, None
3. PCA
As noted above, $M_{att} = \overline{M} + A \cdot \alpha$, where $\overline{M}$ and $A$ are extracted via principal component analysis (PCA). The underlying data are 100 real manipulation masks computed with FaceApp, from which 10 principal components are extracted (visualized in the paper).
# PCA
class PCATemplateMap(nn.Module):
    def __init__(self, templates):
        super(PCATemplateMap, self).__init__()
        self.templates = templates

    def forward(self, x):
        # flatten the spatial dims: (B, C, H, W) -> (B, H*W, C)
        fe = x.view(x.shape[0], x.shape[1], x.shape[2] * x.shape[3])
        fe = torch.transpose(fe, 1, 2)
        # mean over channels, then covariance between spatial positions
        mu = torch.mean(fe, 2, keepdim=True)
        fea_diff = fe - mu
        cov_fea = torch.bmm(fea_diff, torch.transpose(fea_diff, 1, 2))
        # project the covariance into the 10-dim template space: D = B.Sigma.B^T
        B = self.templates.reshape(1, 10, 361).repeat(x.shape[0], 1, 1)
        D = torch.bmm(torch.bmm(B, cov_fea), torch.transpose(B, 1, 2))
        # eigendecomposition (eigh replaces the deprecated symeig; ascending order)
        eigen_value, eigen_vector = torch.linalg.eigh(D)
        # keep the eigenvector of the largest eigenvalue as the template weights
        index = torch.tensor([9], device=x.device)
        eigen = torch.index_select(eigen_vector, 2, index)
        v = eigen.squeeze(-1)
        mask = torch.mm(v, self.templates.reshape(10, 361))
        mask = mask.reshape(x.shape[0], 1, 19, 19)
        return mask, v
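In words: the forward pass measures the covariance between spatial positions of the feature map, projects it into template space as $D = B \Sigma B^T$, and takes the eigenvector of $D$ with the largest eigenvalue as the template weights $v$. A hedged check with random stand-ins:

pmap = PCATemplateMap(torch.randn(10, 19, 19))
mask, v = pmap(torch.randn(2, 728, 19, 19))
print(mask.shape, v.shape)  # torch.Size([2, 1, 19, 19]) torch.Size([2, 10])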
3. Loss Function
Samples fall into three supervision scenarios: supervised, weakly supervised, and unsupervised.
The total loss always takes the form:
$$\mathcal{L} = \mathcal{L}_{classifier} + \lambda \cdot \mathcal{L}_{map}$$
The map loss is therefore defined case by case.
supervised: $\mathcal{L}_{map} = \| M_{att} - M_{gt} \|$
$M_{gt}$ is the ground-truth manipulation mask: an all-zero map for real images and an all-one map for entirely synthesized images.
weakly supervised:
$$\mathcal{L}_{map} = \begin{cases} |\mathrm{Sigmoid}(M_{att}) - 0| & \mathrm{if\ real} \\ |\max(\mathrm{Sigmoid}(M_{att})) - 0.75| & \mathrm{if\ fake} \end{cases}$$
For real images this loss keeps the attention map from activating at all; for fake images it keeps the peak activation of the map sufficiently high.
unsupervised: $\lambda = 0$
The total loss then reduces to the classification loss alone. A sketch covering all three cases follows.
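Below is a hedged sketch of how the three cases could be wired together; the function and its defaults are mine, not the authors' training code, and it assumes labels with 0 = real, 1 = fake and a mask of shape (B, 1, H, W):

def total_loss(logits, mask, labels, gt_mask=None, mode='supervised', lam=10.0):
    # classification term; lam is an illustrative weight, not the paper's value
    l_cls = F.cross_entropy(logits, labels)
    if mode == 'unsupervised':      # lambda = 0: classification loss only
        return l_cls
    if mode == 'supervised':        # pixel-wise norm against the ground-truth mask
        l_map = torch.abs(mask - gt_mask).mean()
    else:                           # weakly supervised
        m = torch.sigmoid(mask).flatten(1)
        real_term = m.abs().mean(dim=1)                 # |Sigmoid(M_att) - 0|
        fake_term = (m.max(dim=1).values - 0.75).abs()  # |max(Sigmoid(M_att)) - 0.75|
        l_map = torch.where(labels == 0, real_term, fake_term).mean()
    return l_cls + lam * l_map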
IV. Other Details
1. Dataset
The paper constructs a dataset containing both real and fake images.
The real images come mainly from the FFHQ and CelebA datasets.
The fake images come from several sources:
- Identity swap and expression manipulation: FaceForensics++
- Attribute manipulation: images produced with FaceApp and with StarGAN
- Entire face synthesis: pre-trained PGGAN and StyleGAN
2. A New Metric: IINC
"IINC improves upon other metrics by measuring the non-overlap ratio of both maps, rather than their combined overlap, as in IoU." That is, unlike the traditional IoU, IINC accounts for the non-overlapping parts of the two maps in addition to their overlap.
$$IINC = \frac{1}{3 - |U|} \cdot \begin{cases} 0 & \mathrm{if}\ \overline{M_{gt}} = 0\ \mathrm{and}\ \overline{M_{att}} = 0 \\ 1 & \mathrm{if}\ \overline{M_{gt}} = 0\ \mathrm{xor}\ \overline{M_{att}} = 0 \\ 2 - \frac{|I|}{|M_{att}|} - \frac{|I|}{|M_{gt}|} & \mathrm{otherwise} \end{cases}$$
Here U and I carry the same meaning as in IoU: the Union and the Intersection of the two maps.
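A hedged implementation under one reading of the formula (masks assumed to lie in [0, 1]; $|\cdot|$ taken as the mean over pixels, which leaves the two ratio terms unchanged):

def iinc(m_att, m_gt):
    inter = torch.min(m_att, m_gt)   # intersection map I
    union = torch.max(m_att, m_gt)   # union map U
    gt_empty, att_empty = m_gt.mean() == 0, m_att.mean() == 0
    if gt_empty and att_empty:
        core = torch.tensor(0.)      # both maps empty: perfect agreement
    elif gt_empty != att_empty:      # exactly one map empty (the xor case)
        core = torch.tensor(1.)
    else:                            # mutual non-containment of the two maps
        core = 2 - inter.mean() / m_att.mean() - inter.mean() / m_gt.mean()
    return (core / (3 - union.mean())).item()

print(iinc(torch.ones(19, 19), torch.ones(19, 19)))  # 0.0 for identical maps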
V. Links
code
dataset
2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)