Overview
Paper walkthrough: On the Detection of Digital Face Manipulation (CVPR 2020)
I. Contributions
- Built a comprehensive fake-face dataset comprising 0.8M real faces and 1.8M fake faces generated by a variety of methods.
- Proposed a novel attention-based layer that improves classification performance and produces attention maps highlighting the manipulated facial regions.
- Proposed a new metric, IINC (Inverse Intersection Non-Containment), for evaluating attention maps; it yields more consistent evaluations than existing metrics.
II. Attention Mechanism
1. Characteristics
- Parallel computation
- Captures relationships across the whole sequence
- Parameter sharing
Further reading: "综述—图像处理中的注意力机制" (A survey of attention mechanisms in image processing, CSDN blog by xys430381_1).
2. Approaches
- Spatial domain (over the H×W dimensions; used in this paper, see the sketch after this list)
- Channel domain (over the channel dimension)
- Mixed domain
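A minimal sketch of the spatial-domain idea (shapes are illustrative, not the paper's exact layer): a single-channel H×W map re-weights every spatial position of a feature map by broadcasting over the channel dimension.

import torch

feat = torch.randn(1, 728, 19, 19)   # feature map F: (B, C, H, W)
att = torch.rand(1, 1, 19, 19)       # spatial attention map in [0, 1]
out = feat * att                     # broadcasts over the 728 channels
print(out.shape)                     # torch.Size([1, 728, 19, 19])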
III. Architecture
1. Backbone: Xception
Xception is a backbone network, placed at the front of many models to extract the feature map F. It consists of three main modules: the Entry flow, the Middle flow, and the Exit flow.
- Entry flow: two ordinary convolution layers at the start, followed by three separable-convolution blocks. Each block contains two separable convolutions: the first adjusts the channel count, and the block output is summed with a residual branch.
- Middle flow: a single separable-convolution block repeated eight times; each block contains three separable convolutions.
- Exit flow: one separable-convolution block followed by the closing operations: two separable convolutions, global average pooling, a fully connected layer, and so on.
- Attention layer: the paper states, "We convert Xception-Net into our model by inserting the attention-based layer between Block 4 and Block 5 of the middle flow, and then fine-tune on DFFD training set." That is, the attention layer sits between the 4th and 5th blocks of the middle flow.
Implementation:
import sys
import torch
import torch.nn as nn
import torch.nn.functional as F

class SeparableConv2d(nn.Module):  # separable convolution (depthwise + pointwise)
    def __init__(self, c_in, c_out, ks, stride=1, padding=0, dilation=1, bias=False):
        super(SeparableConv2d, self).__init__()
        # depthwise: groups=c_in gives each input channel its own k*k filter
        self.c = nn.Conv2d(c_in, c_in, ks, stride, padding, dilation, groups=c_in, bias=bias)
        # pointwise: 1x1 convolution mixes channels and maps c_in -> c_out
        self.pointwise = nn.Conv2d(c_in, c_out, 1, 1, 0, 1, 1, bias=bias)

    def forward(self, x):
        x = self.c(x)
        x = self.pointwise(x)
        return x
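To see why this factorization is used, compare parameter counts for a 3×3 convolution at the middle-flow width (a quick check using the class above; the numbers follow directly from 728·728·9 versus 728·9 + 728·728):

std = nn.Conv2d(728, 728, 3, padding=1, bias=False)
sep = SeparableConv2d(728, 728, 3, padding=1, bias=False)
n_std = sum(p.numel() for p in std.parameters())
n_sep = sum(p.numel() for p in sep.parameters())
print(n_std, n_sep)  # 4769856 536536, roughly a 9x reduction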
class Block(nn.Module):  # basic Xception block
    def __init__(self, c_in, c_out, reps, stride=1, start_with_relu=True, grow_first=True):
        """
        c_in:            number of input channels
        c_out:           number of output channels
        reps:            number of SeparableConv2d layers in the block
        stride:          convolution stride
        start_with_relu: whether the block begins with a ReLU (False only for
                         the first block, whose input was just activated)
        grow_first:      whether the channel count grows at the first conv of
                         the block (False in the exit flow, where it grows last)
        """
        super(Block, self).__init__()
        self.skip = None
        self.skip_bn = None
        if c_out != c_in or stride != 1:  # the residual branch needs a projection
            self.skip = nn.Conv2d(c_in, c_out, 1, stride=stride, bias=False)
            self.skip_bn = nn.BatchNorm2d(c_out)
        self.relu = nn.ReLU(inplace=True)
        rep = []
        c = c_in
        # if channels grow first, the first conv adjusts the channel count
        if grow_first:
            rep.append(self.relu)
            rep.append(SeparableConv2d(c_in, c_out, 3, stride=1, padding=1, bias=False))
            rep.append(nn.BatchNorm2d(c_out))
            c = c_out
        # intermediate convs keep the channel count unchanged
        for i in range(reps - 1):
            rep.append(self.relu)
            rep.append(SeparableConv2d(c, c, 3, stride=1, padding=1, bias=False))
            rep.append(nn.BatchNorm2d(c))
        # only the last conv of an exit-flow block grows the channel count
        if not grow_first:
            rep.append(self.relu)
            rep.append(SeparableConv2d(c_in, c_out, 3, stride=1, padding=1, bias=False))
            rep.append(nn.BatchNorm2d(c_out))
        # the very first block drops the leading ReLU (already applied upstream)
        if not start_with_relu:
            rep = rep[1:]
        else:
            rep[0] = nn.ReLU(inplace=False)
        if stride != 1:
            rep.append(nn.MaxPool2d(3, stride, 1))
        self.rep = nn.Sequential(*rep)

    def forward(self, inp):
        x = self.rep(inp)
        if self.skip is not None:
            y = self.skip(inp)
            y = self.skip_bn(y)
        else:
            y = inp
        x += y
        return x
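A quick shape sanity check (input size taken from the entry flow of the network below): the first entry-flow block grows 64 channels to 128 and halves the 147×147 resolution through its stride-2 max-pool, matched by the stride-2 skip projection.

blk = Block(64, 128, 2, 2, start_with_relu=False, grow_first=True)
out = blk(torch.randn(1, 64, 147, 147))
print(out.shape)  # torch.Size([1, 128, 74, 74])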
# backbone network (the attention layer is inserted inside it)
class Xception(nn.Module):
    """
    Xception optimized for the ImageNet dataset, as specified in
    https://arxiv.org/pdf/1610.02357.pdf
    """
    def __init__(self, maptype, templates, num_classes=1000):
        super(Xception, self).__init__()
        self.num_classes = num_classes
        # Entry flow
        self.conv1 = nn.Conv2d(3, 32, 3, 2, 0, bias=False)
        self.bn1 = nn.BatchNorm2d(32)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(32, 64, 3, bias=False)
        self.bn2 = nn.BatchNorm2d(64)
        self.block1 = Block(64, 128, 2, 2, start_with_relu=False, grow_first=True)
        self.block2 = Block(128, 256, 2, 2, start_with_relu=True, grow_first=True)
        self.block3 = Block(256, 728, 2, 2, start_with_relu=True, grow_first=True)
        # Middle flow
        self.block4 = Block(728, 728, 3, 1, start_with_relu=True, grow_first=True)
        self.block5 = Block(728, 728, 3, 1, start_with_relu=True, grow_first=True)
        self.block6 = Block(728, 728, 3, 1, start_with_relu=True, grow_first=True)
        self.block7 = Block(728, 728, 3, 1, start_with_relu=True, grow_first=True)
        self.block8 = Block(728, 728, 3, 1, start_with_relu=True, grow_first=True)
        self.block9 = Block(728, 728, 3, 1, start_with_relu=True, grow_first=True)
        self.block10 = Block(728, 728, 3, 1, start_with_relu=True, grow_first=True)
        self.block11 = Block(728, 728, 3, 1, start_with_relu=True, grow_first=True)
        # Exit flow
        self.block12 = Block(728, 1024, 2, 2, start_with_relu=True, grow_first=False)
        self.conv3 = SeparableConv2d(1024, 1536, 3, 1, 1)
        self.bn3 = nn.BatchNorm2d(1536)
        self.conv4 = SeparableConv2d(1536, 2048, 3, 1, 1)
        self.bn4 = nn.BatchNorm2d(2048)
        self.last_linear = nn.Linear(2048, num_classes)
        if maptype == 'none':
            # no attention: a pass-through that leaves the features unchanged
            self.map = lambda x: (1, None)
        elif maptype == 'reg':
            self.map = RegressionMap(728)
        elif maptype == 'tmp':
            self.map = TemplateMap(728, templates)
        elif maptype == 'pca_tmp':
            # the PCA map is parameterized by the templates, not a channel count
            self.map = PCATemplateMap(templates)
        else:
            print('Unknown map type: `{0}`'.format(maptype))
            sys.exit()
    def features(self, input):
        # Entry flow
        x = self.conv1(input)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.conv2(x)
        x = self.bn2(x)
        x = self.relu(x)
        x = self.block1(x)
        x = self.block2(x)
        x = self.block3(x)
        # Middle flow
        x = self.block4(x)
        x = self.block5(x)
        x = self.block6(x)
        x = self.block7(x)
        # attention-based layer, between the 4th and 5th middle-flow blocks
        mask, vec = self.map(x)
        x = x * mask
        x = self.block8(x)
        x = self.block9(x)
        x = self.block10(x)
        x = self.block11(x)
        # Exit flow
        x = self.block12(x)
        x = self.conv3(x)
        x = self.bn3(x)
        x = self.relu(x)
        x = self.conv4(x)
        x = self.bn4(x)
        return x, mask, vec

    def logits(self, features):
        x = self.relu(features)
        x = F.adaptive_avg_pool2d(x, (1, 1))
        x = x.view(x.size(0), -1)
        x = self.last_linear(x)
        return x

    def forward(self, input):
        x, mask, vec = self.features(input)
        x = self.logits(x)
        return x, mask, vec
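A hedged usage sketch (it assumes the map classes defined in the next subsection; with the standard 299×299 Xception input, the middle-flow feature map, and hence the attention map, comes out at 19×19):

model = Xception(maptype='reg', templates=None, num_classes=2).eval()
logits, mask, vec = model(torch.randn(1, 3, 299, 299))
print(logits.shape, mask.shape)  # torch.Size([1, 2]) torch.Size([1, 1, 19, 19])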
2. Attention Layer
The paper presents two ways of computing the attention map and compares them against each other: 1. MAM Map, 2. Reg. Map.
1. MAM Map
MAM stands for Manipulation Appearance Model, a structure proposed by the authors. It uses a separable-convolution block and a fully connected layer to estimate the per-region weights $\alpha$. The attention map is then $M_{att} = \overline{M} + A \cdot \alpha$, where $\overline{M}$ and $A$ are extracted via principal component analysis (PCA), described later.
# MAM Map
class TemplateMap(nn.Module):
    def __init__(self, c_in, templates):
        super(TemplateMap, self).__init__()
        self.c = Block(c_in, 364, 2, 2, start_with_relu=True, grow_first=False)
        self.l = nn.Linear(364, 10)
        self.relu = nn.ReLU(inplace=True)
        self.templates = templates

    def forward(self, x):
        # separable-convolution block
        v = self.c(x)
        v = self.relu(v)
        # adaptive average pooling
        v = F.adaptive_avg_pool2d(v, (1, 1))
        v = v.view(v.size(0), -1)
        # fully connected layer predicts the 10 template weights
        v = self.l(v)
        # linear combination of the 10 templates gives the attention map
        mask = torch.mm(v, self.templates.reshape(10, 361))
        mask = mask.reshape(x.shape[0], 1, 19, 19)
        return mask, v
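The templates argument is expected to hold the 10 PCA basis maps at the 19×19 attention resolution (flattened to 10×361 inside forward). A hedged check with random stand-in templates:

templates = torch.randn(10, 19, 19)  # stand-ins for the PCA basis maps
tmap = TemplateMap(728, templates)
mask, v = tmap(torch.randn(2, 728, 19, 19))
print(mask.shape, v.shape)  # torch.Size([2, 1, 19, 19]) torch.Size([2, 10])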
2. Reg. Map
This variant uses direct regression: the map is obtained from a single separable convolution followed by a Sigmoid function.
# Reg. Map
class RegressionMap(nn.Module):
    def __init__(self, c_in):
        super(RegressionMap, self).__init__()
        self.c = SeparableConv2d(c_in, 1, 3, stride=1, padding=1, bias=False)
        self.s = nn.Sigmoid()

    def forward(self, x):
        # separable convolution regresses a single-channel map
        mask = self.c(x)
        # sigmoid squashes it into [0, 1]
        mask = self.s(mask)
        return mask, None
3. PCA
As noted above, $M_{att} = \overline{M} + A \cdot \alpha$, where $\overline{M}$ and $A$ are extracted via principal component analysis (PCA). The underlying data are 100 real manipulation masks computed with FaceApp, from which 10 principal components are extracted (visualized in the paper).
# PCA
class PCATemplateMap(nn.Module):
    def __init__(self, templates):
        super(PCATemplateMap, self).__init__()
        self.templates = templates

    def forward(self, x):
        # flatten the spatial dims: (B, C, H, W) -> (B, H*W, C)
        fe = x.view(x.shape[0], x.shape[1], x.shape[2] * x.shape[3])
        fe = torch.transpose(fe, 1, 2)
        # mean over channels, then covariance between spatial positions
        mu = torch.mean(fe, 2, keepdim=True)
        fea_diff = fe - mu
        cov_fea = torch.bmm(fea_diff, torch.transpose(fea_diff, 1, 2))
        # project the covariance into the 10-dim template space: D = B.Sigma.B^T
        B = self.templates.reshape(1, 10, 361).repeat(x.shape[0], 1, 1)
        D = torch.bmm(torch.bmm(B, cov_fea), torch.transpose(B, 1, 2))
        # eigendecomposition (eigh replaces the deprecated symeig; ascending order)
        eigen_value, eigen_vector = torch.linalg.eigh(D)
        # keep the eigenvector of the largest eigenvalue as the template weights
        index = torch.tensor([9], device=x.device)
        eigen = torch.index_select(eigen_vector, 2, index)
        v = eigen.squeeze(-1)
        mask = torch.mm(v, self.templates.reshape(10, 361))
        mask = mask.reshape(x.shape[0], 1, 19, 19)
        return mask, v
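In words: the forward pass measures the covariance between spatial positions of the feature map, projects it into template space as $D = B \Sigma B^T$, and takes the eigenvector of $D$ with the largest eigenvalue as the template weights $v$. A hedged check with random stand-ins:

pmap = PCATemplateMap(torch.randn(10, 19, 19))
mask, v = pmap(torch.randn(2, 728, 19, 19))
print(mask.shape, v.shape)  # torch.Size([2, 1, 19, 19]) torch.Size([2, 10])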
3. Loss Function
Samples fall into three supervision scenarios: supervised, weakly supervised, and unsupervised.
The total loss always takes the form:
$$\mathcal{L} = \mathcal{L}_{classifier} + \lambda \cdot \mathcal{L}_{map}$$
The map loss is therefore defined case by case.
supervised: $\mathcal{L}_{map} = \| M_{att} - M_{gt} \|$
$M_{gt}$ is the ground-truth manipulation mask: an all-zero map for real images and an all-one map for entirely synthesized images.
weakly supervised:
$$\mathcal{L}_{map} = \begin{cases} |\mathrm{Sigmoid}(M_{att}) - 0| & \mathrm{if\ real} \\ |\max(\mathrm{Sigmoid}(M_{att})) - 0.75| & \mathrm{if\ fake} \end{cases}$$
For real images this loss keeps the attention map from activating at all; for fake images it keeps the peak activation of the map sufficiently high.
unsupervised: $\lambda = 0$
The total loss then reduces to the classification loss alone. A sketch covering all three cases follows.
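Below is a hedged sketch of how the three cases could be wired together; the function and its defaults are mine, not the authors' training code, and it assumes labels with 0 = real, 1 = fake and a mask of shape (B, 1, H, W):

def total_loss(logits, mask, labels, gt_mask=None, mode='supervised', lam=10.0):
    # classification term; lam is an illustrative weight, not the paper's value
    l_cls = F.cross_entropy(logits, labels)
    if mode == 'unsupervised':      # lambda = 0: classification loss only
        return l_cls
    if mode == 'supervised':        # pixel-wise norm against the ground-truth mask
        l_map = torch.abs(mask - gt_mask).mean()
    else:                           # weakly supervised
        m = torch.sigmoid(mask).flatten(1)
        real_term = m.abs().mean(dim=1)                 # |Sigmoid(M_att) - 0|
        fake_term = (m.max(dim=1).values - 0.75).abs()  # |max(Sigmoid(M_att)) - 0.75|
        l_map = torch.where(labels == 0, real_term, fake_term).mean()
    return l_cls + lam * l_map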
IV. Other Details
1. Dataset
The paper constructs a dataset containing both real and fake images.
The real images come mainly from the FFHQ and CelebA datasets.
The fake images come from several sources:
- Identity swap and expression manipulation: FaceForensics++
- Attribute manipulation: images produced with FaceApp and with StarGAN
- Entire face synthesis: pre-trained PGGAN and StyleGAN
2. A New Metric: IINC
"IINC improves upon other metrics by measuring the non-overlap ratio of both maps, rather than their combined overlap, as in IoU." That is, unlike the traditional IoU, IINC accounts for the non-overlapping parts of the two maps in addition to their overlap.
$$IINC = \frac{1}{3 - |U|} \cdot \begin{cases} 0 & \mathrm{if}\ \overline{M_{gt}} = 0\ \mathrm{and}\ \overline{M_{att}} = 0 \\ 1 & \mathrm{if}\ \overline{M_{gt}} = 0\ \mathrm{xor}\ \overline{M_{att}} = 0 \\ 2 - \frac{|I|}{|M_{att}|} - \frac{|I|}{|M_{gt}|} & \mathrm{otherwise} \end{cases}$$
Here U and I carry the same meaning as in IoU: the Union and the Intersection of the two maps.
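A hedged implementation under one reading of the formula (masks assumed to lie in [0, 1]; $|\cdot|$ taken as the mean over pixels, which leaves the two ratio terms unchanged):

def iinc(m_att, m_gt):
    inter = torch.min(m_att, m_gt)   # intersection map I
    union = torch.max(m_att, m_gt)   # union map U
    gt_empty, att_empty = m_gt.mean() == 0, m_att.mean() == 0
    if gt_empty and att_empty:
        core = torch.tensor(0.)      # both maps empty: perfect agreement
    elif gt_empty != att_empty:      # exactly one map empty (the xor case)
        core = torch.tensor(1.)
    else:                            # mutual non-containment of the two maps
        core = 2 - inter.mean() / m_att.mean() - inter.mean() / m_gt.mean()
    return (core / (3 - union.mean())).item()

print(iinc(torch.ones(19, 19), torch.ones(19, 19)))  # 0.0 for identical maps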
V. Links
code
dataset
2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)