TensorRT 5.1.5.0 in Practice: Custom Ops in onnx-tensorrt


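I am 羞涩纸飞机, a blogger on 靠谱客 (kaopuke). This article, collected during recent development work, covers custom ops in onnx-tensorrt with TensorRT 5.1.5.0: extending a custom op in the PyTorch-to-ONNX step, extending it in the ONNX-to-TensorRT step, and finally testing whether the two pipelines fit together. I found it useful and am sharing it here for reference.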

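Overview

Contents

Extending a custom op in the PyTorch-to-ONNX step
- Workflow
- Details
  - Defining a custom PyTorch op
  - Adding symbolic
Extending a custom op in the ONNX-to-TensorRT step
- Workflow
- Details
  - The InstanceNormalization custom op in detail
    - InstanceNormalizationPlugin.hpp
    - InstanceNormalizationPlugin.cpp
  - Understanding builtin_op_importers.cpp
    - Workflow
    - Reading network structure parameters, weights, and bias
Final test: do the two pipelines fit together?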

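When a model contains a custom op, that op has to be extended in both conversion steps: PyTorch to ONNX, and ONNX to TensorRT.

Extending a custom op in the PyTorch-to-ONNX step

Workflow

As an example, we define a custom op called nonentity (functionally it is just a fully connected layer, i.e. a Linear operation).

1. Define a custom PyTorch op, i.e. extend PyTorch.
2. Add a symbolic function to the custom op's logic so that torch.onnx can recognize it.

Details

Defining a custom PyTorch op

Define a custom PyTorch op, i.e. extend PyTorch. For details, see "Pytorch1.1.0 入门自定义op（python）" (getting started with custom ops in PyTorch 1.1.0, Python).

Adding symbolic

Add a symbolic() function to the custom op's Function class. The complete custom op then looks like this.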

from torch.autograd import Function


class LinearFunction(Function):
    # beta and alpha serve no real purpose here. They only demonstrate that a
    # custom op can pass network parameters through the torch -> onnx export.
    @staticmethod
    def symbolic(g, self, mat1, mat2, beta, alpha):
        # Arguments follow forward(): self = input, mat1 = weight, mat2 = bias.
        # The "_f" suffix marks beta and alpha as float attributes.
        return g.op("nonentity", self, mat1, mat2,
                    beta_f=beta, alpha_f=alpha)

    @staticmethod
    def forward(ctx, input, weight, bias=None, beta_f=1.0, alpha_f=1.0):
        ctx.save_for_backward(input, weight, bias)
        ctx.beta = beta_f
        ctx.alpha = alpha_f
        output = input.mm(weight.t())
        if bias is not None:
            output += bias.unsqueeze(0).expand_as(output)
        return output

    @staticmethod
    def backward(ctx, grad_output):
        # saved_tensors replaces the deprecated saved_variables.
        input, weight, bias = ctx.saved_tensors
        grad_input = grad_weight = grad_bias = None
        if ctx.needs_input_grad[0]:
            grad_input = grad_output.mm(weight)
        if ctx.needs_input_grad[1]:
            grad_weight = grad_output.t().mm(input)
        if bias is not None and ctx.needs_input_grad[2]:
            grad_bias = grad_output.sum(0)
        # One gradient per forward() argument; beta_f and alpha_f get None.
        return grad_input, grad_weight, grad_bias, None, None
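The backward formulas above can be sanity-checked without PyTorch. The following is a minimal pure-Python sketch, not part of the original article: the helper names (matmul, transpose, linear_forward, linear_backward, loss) are made up for this illustration. It repeats the same forward and backward math on small lists and compares one weight gradient against a central finite difference, using sum(output) as the loss.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def linear_forward(x, w, b):
    """output = x @ w.T + b, the same math as LinearFunction.forward."""
    out = matmul(x, transpose(w))
    return [[v + bj for v, bj in zip(row, b)] for row in out]


def linear_backward(x, w, grad_out):
    """The same formulas as LinearFunction.backward."""
    grad_x = matmul(grad_out, w)                   # grad_output.mm(weight)
    grad_w = matmul(transpose(grad_out), x)        # grad_output.t().mm(input)
    grad_b = [sum(col) for col in zip(*grad_out)]  # grad_output.sum(0)
    return grad_x, grad_w, grad_b


def loss(x, w, b):
    return sum(v for row in linear_forward(x, w, b) for v in row)


x = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]  # batch of 2, 3 input features
w = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]  # 2 output features
b = [0.01, 0.02]

# d(loss)/d(output) is all ones because loss = sum(output).
grad_out = [[1.0, 1.0], [1.0, 1.0]]
grad_x, grad_w, grad_b = linear_backward(x, w, grad_out)

# Central finite difference for the single weight w[0][1].
eps = 1e-4
w_plus = [row[:] for row in w]
w_minus = [row[:] for row in w]
w_plus[0][1] += eps
w_minus[0][1] -= eps
numeric = (loss(x, w_plus, b) - loss(x, w_minus, b)) / (2 * eps)

print(grad_w[0][1], numeric)
```

The analytic and numeric values should agree closely. The same kind of check is available for the real Function through torch.autograd.gradcheck.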

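symbolic can be thought of as specifying how the op is written into the exported graph during the PyTorch-to-ONNX conversion.
For background, see this write-up of torch.onnx:
( https://segmentfault.com/p/1210000018097701/read )
Put simply, we are creating our own non-standard, non-ATen ONNX operator (op). In my code the corresponding symbolic is:

def symbolic(g, self, mat1, mat2, beta, alpha):
    return g.op("nonentity", self, mat1, mat2,
                beta_f=beta, alpha_f=alpha)

The corresponding part of the exported ONNX graph looks like this: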

...
%19 : Float(64, 64, 3, 3) = onnx::MaxPool[kernel_shape=[2, 2], pads=[0, 0, 0, 0], strides=[2, 2]](%18), scope: Net_LinearFunction/Sequential[conv3]/MaxPool2d[2]
%20 : Float(64, 576) = onnx::Flatten[axis=1](%19), scope: Net_LinearFunction
%21 : Float(64, 128) = onnx::nonentity[alpha=1.3, beta=1.2](%20, %dense.0.weight, %dense.0.bias), scope: Net_LinearFunction/Sequential[dense]/Linear[0]
%22 : Float(64, 128) = onnx::Relu(%21), scope: Net_LinearFunction/Sequential[dense]/ReLU[1]
%23 : Float(64, 10) = onnx::nonentity[alpha=1.33, beta=1.22](%22, %dense.2.weight, %dense.2.bias), scope: Net_LinearFunction/Sequential[dense]/Linear[2]
return (%23)

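Both %21 and %23 are computed by my custom op, "nonentity". The values in "[]" are the network parameters (attributes); the values in "()" are the inputs: the previous layer's output, followed by the weight and bias.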
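To make the "[]" versus "()" distinction concrete, here is a small sketch that splits one line of the printed graph into op name, attributes, and inputs. It is only a text parser for the dump format shown in this article; the regular expression and the names NODE_RE, split_top_level, and parse_node are assumptions for illustration, not part of torch.onnx.

```python
import re

# Matches lines such as:
#   %21 : Float(64, 128) = onnx::nonentity[alpha=1.3, beta=1.2](%20, ...), scope: ...
NODE_RE = re.compile(
    r"^(?P<out>%\w+) : \w+\((?P<shape>[^)]*)\) = onnx::(?P<op>\w+)"
    r"(?:\[(?P<attrs>.*)\])?\((?P<inputs>.*?)\), scope: (?P<scope>.*)$"
)


def split_top_level(text):
    """Split on commas that are not nested inside [...]."""
    parts, depth, current = [], 0, ""
    for ch in text:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append(current.strip())
            current = ""
        else:
            current += ch
    if current.strip():
        parts.append(current.strip())
    return parts


def parse_node(line):
    match = NODE_RE.match(line.strip())
    if match is None:
        raise ValueError("not a graph node line: " + line)
    attrs = {}
    for item in split_top_level(match.group("attrs") or ""):
        key, value = item.split("=", 1)
        attrs[key] = value  # kept as text, e.g. "1.3" or "[2, 2]"
    inputs = [name.strip() for name in match.group("inputs").split(",")]
    return {"output": match.group("out"), "op": match.group("op"),
            "attrs": attrs, "inputs": inputs}


custom = parse_node(
    "%21 : Float(64, 128) = onnx::nonentity[alpha=1.3, beta=1.2]"
    "(%20, %dense.0.weight, %dense.0.bias), "
    "scope: Net_LinearFunction/Sequential[dense]/Linear[0]"
)
pool = parse_node(
    "%19 : Float(64, 64, 3, 3) = onnx::MaxPool[kernel_shape=[2, 2], "
    "pads=[0, 0, 0, 0], strides=[2, 2]](%18), "
    "scope: Net_LinearFunction/Sequential[conv3]/MaxPool2d[2]"
)
print(custom["op"], custom["attrs"], custom["inputs"])
print(pool["op"], pool["attrs"], pool["inputs"])
```

For the nonentity node, the attributes are alpha and beta, and the three inputs are the previous output %20 plus the weight and bias tensors.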

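Extending a custom op in the ONNX-to-TensorRT step

Workflow

Again, the example is a custom op called nonentity.

1. Download the official onnx-tensorrt source code.
2. Following InstanceNormalizationPlugin.cpp/.hpp, write your own nonentity.hpp and nonentity.cpp. (FancyActivation, ResizeNearest, and others are also official examples of custom ops and contain the op logic.)
3. In builtin_op_importers.cpp, use DEFINE_BUILTIN_OP_IMPORTER to add an importer for the newly registered op.
4. In CMakeLists.txt, add your nonentity.cpp under set(IMPORTER_SOURCES... .
5. Rebuild your onnx-tensorrt following the official instructions.

Then test with the ONNX file exported earlier. The nonentity layer is now read correctly, and the output looks like this: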

boyun@boyun-MS-7B90:~/workspace/onnx-tensorrt-master$ onnx2trt ./onnx/customer_op_FC.onnx -v
----------------------------------------------------------------
Input filename:   ./onnx/customer_op_FC.onnx
ONNX IR version:  0.0.4
Opset version:    9
Producer name:    pytorch
Producer version: 1.1
Domain:
Model version:    0
Doc string:
----------------------------------------------------------------
WARNING: ONNX model has a newer ir_version (0.0.4) than this parser was built against (0.0.3).
Parsing model
[2019-08-13 06:15:48 INFO] 11:Conv -> (32, 28, 28)
[2019-08-13 06:15:48 INFO] 12:Relu -> (32, 28, 28)
[2019-08-13 06:15:48 INFO] 13:MaxPool -> (32, 14, 14)
[2019-08-13 06:15:48 INFO] 14:Conv -> (64, 14, 14)
[2019-08-13 06:15:48 INFO] 15:Relu -> (64, 14, 14)
[2019-08-13 06:15:48 INFO] 16:MaxPool -> (64, 7, 7)
[2019-08-13 06:15:48 INFO] 17:Conv -> (64, 7, 7)
[2019-08-13 06:15:48 INFO] 18:Relu -> (64, 7, 7)
[2019-08-13 06:15:48 INFO] 19:MaxPool -> (64, 3, 3)
[2019-08-13 06:15:48 INFO] 20:Flatten -> (576)
[2019-08-13 06:15:48 INFO] 21:nonentity -> (576)
[2019-08-13 06:15:48 INFO] 22:Relu -> (576)
[2019-08-13 06:15:48 INFO] 23:nonentity -> (576)
All done

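Because I implemented the internal logic of nonentity as Normalize, the dimension stays unchanged (576) from that point on, which also shows that the layer logic itself was picked up.

Details

Custom ops in onnx-tensorrt are written against IPluginV2 (NvInfer.h).

The InstanceNormalization custom op in detail

InstanceNormalizationPlugin.hpp

The header has three parts: a namespace, InstanceNormalizationPlugin, and InstanceNormalizationPluginCreator.

1. namespace
Mainly defines the plugin's version and the plugin's name.
2. InstanceNormalizationPlugin
Inherits from onnx2trt::PluginV2, which in turn inherits from nvinfer1::IPluginV2 in NvInfer.h.
For the structure of IPluginV2, see the blog post "IPluginV2".
3. InstanceNormalizationPluginCreator
Inherits from nvinfer1::IPluginCreator in NvInfer.h. This base class works together with the PluginV2Ext class to register and use a custom op (layer). See "IPluginCreator" for an introduction.

Note that item 2 inherits from nvinfer1::IPluginV2, while item 3 is described as working with nvinfer1::IPluginV2Ext. The reason is that 5.1.x.x added several new methods compared with 5.0.x.x. They live in IPluginV2Ext, which inherits from IPluginV2, and NVIDIA supports using the newer IPluginV2Ext.
The IPluginCreator functions are implemented in almost the same way across custom ops; you only need getPluginName and getPluginVersion to return the values for your plugin.

Recommendation: consider having onnx2trt::PluginV2 above inherit from IPluginV2Ext. In other words, the approach in the official onnx-tensorrt code is already somewhat dated.

InstanceNormalizationPlugin.cpp

This file implements the header; the logic is much like that of a custom Caffe layer.
It likewise has two constructors, one for the build phase and one for the runtime phase. The remaining functions each do their own job and need no further comment.
The core function worth understanding is enqueue():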

int InstanceNormalizationPlugin::enqueue(int batchSize,
                                         const void *const *inputs, void **outputs,
                                         void *workspace, cudaStream_t stream) {
  assert(_initialized);
  nvinfer1::Dims input_dims = this->getInputDims(0);
  int n = batchSize;
  int c = input_dims.d[0];
  int h = input_dims.d[1];
  int w = input_dims.d[2];
  CHECK_CUDNN(cudnnSetTensor4dDescriptor(_b_desc, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT, 1, n*c, 1, 1));
  cudnnDataType_t cudnn_dtype;
  CHECK_CUDNN(convert_trt2cudnn_dtype(this->getDataType(), &cudnn_dtype));
  CHECK_CUDNN(cudnnSetTensor4dDescriptor(_x_desc, CUDNN_TENSOR_NCHW, cudnn_dtype, 1, n*c, h, w));
  CHECK_CUDNN(cudnnSetTensor4dDescriptor(_y_desc, CUDNN_TENSOR_NCHW, cudnn_dtype, 1, n*c, h, w));
  float alpha = 1;
  float beta = 0;
  void const* x_ptr = inputs[0];
  void* y_ptr = outputs[0];
  CHECK_CUDNN(cudnnSetStream(_cudnn_handle, stream));
  // Note: Use of CUDNN_BATCHNORM_SPATIAL_PERSISTENT can cause numerical
  //       overflows (NaNs) for fp32 data in some circumstances. The lower-
  //       performance CUDNN_BATCHNORM_SPATIAL should be used if this is not
  //       acceptable.
  CHECK_CUDNN(
      cudnnBatchNormalizationForwardTraining(
          _cudnn_handle, CUDNN_BATCHNORM_SPATIAL_PERSISTENT, &alpha, &beta,
          _x_desc, x_ptr, _y_desc, y_ptr, _b_desc, _d_scale, _d_bias,
          1., nullptr, nullptr, _epsilon, nullptr, nullptr));
  return 0;
}

Why CHECK_CUDNN is needed
Every cuDNN function returns an error code of type cudnnStatus_t. On success it returns CUDNN_STATUS_SUCCESS; for any other status, cudnnGetErrorString(status) gives the specific error message.
CHECK_CUDNN is the macro that checks this error code.

cudnnSetTensor4dDescriptor()
Builds the input or output descriptor that cuDNN works with.
A convolution, for example, has three main arguments: input, output, and weights. An input or output descriptor is built in three steps:

cudnnTensorDescriptor_t input_descriptor;
cudnnCreateTensorDescriptor(&input_descriptor);
cudnnSetTensor4dDescriptor(input_descriptor,
                           /*format=*/CUDNN_TENSOR_NHWC,
                           /*dataType=*/CUDNN_DATA_FLOAT,
                           /*batch_size=*/1,
                           /*channels=*/3,
                           /*image_height=*/image.rows,
                           /*image_width=*/image.cols);

That is, create a tensor descriptor and then set its properties.
In the InstanceNormalization custom op:
(1) the class declares _x_desc, _y_desc, and _b_desc of type cudnnTensorDescriptor_t as members;
(2) initialize() in the .cpp calls cudnnCreateTensorDescriptor;
(3) enqueue() in the .cpp calls cudnnSetTensor4dDescriptor.
After that, the tensors needed for the computation are ready.

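convert_trt2cudnn_dtype()
Selects the precision: half or float.

cudnnSetStream()

cudnnStatus_t cudnnSetStream(cudnnHandle_t handle, cudaStream_t streamId)

This function sets the user's CUDA stream in the cuDNN handle. The new stream is used to launch cuDNN GPU kernels, or to synchronize against when cuDNN kernels are launched in internal streams. If no stream is set for the cuDNN library, all kernels use the default (NULL) stream. Setting the user stream in the cuDNN handle guarantees that cuDNN calls and other GPU kernels launched in the same stream execute in issue order.
handle: pointer to the cuDNN handle
streamId: the new CUDA stream

cudnnBatchNormalizationForwardTraining()
This function performs the forward BatchNormalization layer computation for the training phase.
This step is a little disappointing: it is a call into the cuDNN library, so once again there is no source code to read.
For what the function does exactly, see the cuDNN Developer Guide.

That sums up the whole custom logic of InstanceNormalization.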
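One detail of enqueue() is worth spelling out: the (N, C, H, W) input is described to cuDNN as (1, N*C, H, W) before the training-mode batch normalization call. With a batch of 1, the per-channel statistics that batch normalization computes are exactly the per-(n, c) plane statistics that instance normalization needs. The pure-Python sketch below illustrates that equivalence on nested lists. It is an illustration under simplifying assumptions (scale fixed to 1 and bias to 0, whereas the plugin passes _d_scale and _d_bias), and all function names are hypothetical.

```python
import math


def mean_and_inv_std(values, eps):
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, 1.0 / math.sqrt(var + eps)


def instance_norm(x, eps=1e-5):
    """x is a nested list [N][C][H][W]; every (n, c) plane gets its own statistics."""
    out = []
    for sample in x:
        normed = []
        for plane in sample:
            mean, inv_std = mean_and_inv_std([v for row in plane for v in row], eps)
            normed.append([[(v - mean) * inv_std for v in row] for row in plane])
        out.append(normed)
    return out


def batch_norm_training(x, eps=1e-5):
    """Spatial batch norm in training mode: one set of statistics per channel,
    computed over the batch and spatial dimensions."""
    n, c = len(x), len(x[0])
    out = [[None] * c for _ in range(n)]
    for ch in range(c):
        values = [v for s in range(n) for row in x[s][ch] for v in row]
        mean, inv_std = mean_and_inv_std(values, eps)
        for s in range(n):
            out[s][ch] = [[(v - mean) * inv_std for v in row] for row in x[s][ch]]
    return out


def fold_batch_into_channels(x):
    """View (N, C, H, W) as (1, N*C, H, W), like the descriptors set in enqueue()."""
    return [[plane for sample in x for plane in sample]]


# Example input with N=2, C=2, H=2, W=2.
x = [
    [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 1.0], [0.0, 1.0]]],
    [[[5.0, 5.5], [6.0, 9.0]], [[-1.0, 2.0], [4.0, 8.0]]],
]

direct = [plane for sample in instance_norm(x) for plane in sample]
via_bn = batch_norm_training(fold_batch_into_channels(x))[0]

max_diff = max(
    abs(a - b)
    for p, q in zip(direct, via_bn)
    for r, s in zip(p, q)
    for a, b in zip(r, s)
)
print(max_diff)
```

Without the fold, batch_norm_training(x) would mix statistics across the two samples and give a different result, which is why the descriptor shape matters.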

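Understanding builtin_op_importers.cpp

The article "Onnx-tensorrt详解之nvonnxparser库" (onnx-tensorrt explained: the nvonnxparser library) covers this very clearly and is worth reading.

Workflow

1. Convert the ONNX input data into the format TensorRT requires.
2. Create the TensorRT layer.
3. Compute the TensorRT output.

Opening this .cpp shows the import logic for every op. Take DEFINE_BUILTIN_OP_IMPORTER(Conv) as an example: when an ONNX Conv operation is detected, it is handled as follows: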

// ******** Convert the ONNX input data into the format TensorRT requires ********
nvinfer1::ITensor* tensor_ptr = &inputs.at(0).tensor();
auto kernel_weights = inputs.at(1).weights();
// The ONNX model's inputs are inputs = ['x', 'W']; convert them into the data format the TensorRT model expects.
int noutput = kernel_weights.shape.d[0];
// ******** Create the TensorRT layer ********
nvinfer1::IConvolutionLayer* layer = ctx->network()->addConvolution(*tensor_ptr, noutput, kernel_size, kernel_weights, bias_weights);
// At this point the ONNX layer has been converted into a TensorRT layer.
// ctx (short for context) holds the TensorRT model's network.
// Example from the official TensorRT documentation for adding a convolution layer:
// IConvolutionLayer* conv1 = network->addConvolution(*scale_1->getOutput(0), 20, DimsHW{5, 5}, mWeightMap["conv1filter"], mWeightMap["conv1bias"]);
// ******** Compute the TensorRT output ********
tensor_ptr = layer->getOutput(0); // take the layer's output tensor as the result
return {{tensor_ptr}}; // return the output tensor

So when creating a custom op, adding its import logic here is required.
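Conceptually, the importer file acts as a lookup table from ONNX op type to a function that builds the matching TensorRT layer, and an op without an entry cannot be imported. The sketch below is a Python analogy of that idea only. It is not the onnx-tensorrt API: IMPORTERS, define_importer, import_node, and the returned dictionaries are all hypothetical stand-ins.

```python
# Maps an ONNX op type to the function that knows how to import it.
IMPORTERS = {}


def define_importer(op_type):
    """Decorator that registers an importer function under an ONNX op type."""
    def register(fn):
        IMPORTERS[op_type] = fn
        return fn
    return register


@define_importer("Relu")
def import_relu(node, inputs):
    # A built-in op maps to a built-in layer.
    return {"layer": "activation(relu)", "inputs": inputs}


@define_importer("nonentity")
def import_nonentity(node, inputs):
    # A custom op maps to a plugin layer. Attributes come from the node,
    # tensors and weights come from the inputs.
    return {"layer": "plugin(nonentity)",
            "alpha": node["attrs"]["alpha"],
            "beta": node["attrs"]["beta"],
            "inputs": inputs}


def import_node(node, inputs):
    """Dispatch on the op type; unregistered ops cannot be imported."""
    try:
        importer = IMPORTERS[node["op"]]
    except KeyError:
        raise NotImplementedError("No importer registered for op: " + node["op"]) from None
    return importer(node, inputs)


custom_node = {"op": "nonentity", "attrs": {"alpha": 1.3, "beta": 1.2}}
result = import_node(custom_node, ["%20", "%dense.0.weight", "%dense.0.bias"])
print(result["layer"])

try:
    import_node({"op": "unknown_op", "attrs": {}}, [])
    missing_error = None
except NotImplementedError as exc:
    missing_error = str(exc)
print(missing_error)
```

In this analogy, forgetting the registration for nonentity would make import_node fail for every model that contains the custom op.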

Reading network structure parameters, weights, and bias

An ONNX file can be visualized. For example, an MNIST network with a custom layer looks like this:


%11 : Float(64, 32, 28, 28) = onnx::Conv[dilations=[1, 1], group=1, kernel_shape=[3, 3], pads=[1, 1, 1, 1], strides=[1, 1]](%0, %conv1.0.weight, %conv1.0.bias), scope: Net_LinearFunction/Sequential[conv1]/Conv2d[0]
%12 : Float(64, 32, 28, 28) = onnx::Relu(%11), scope: Net_LinearFunction/Sequential[conv1]/ReLU[1]
%13 : Float(64, 32, 14, 14) = onnx::MaxPool[kernel_shape=[2, 2], pads=[0, 0, 0, 0], strides=[2, 2]](%12), scope: Net_LinearFunction/Sequential[conv1]/MaxPool2d[2]
%14 : Float(64, 64, 14, 14) = onnx::Conv[dilations=[1, 1], group=1, kernel_shape=[3, 3], pads=[1, 1, 1, 1], strides=[1, 1]](%13, %conv2.0.weight, %conv2.0.bias), scope: Net_LinearFunction/Sequential[conv2]/Conv2d[0]
%15 : Float(64, 64, 14, 14) = onnx::Relu(%14), scope: Net_LinearFunction/Sequential[conv2]/ReLU[1]
%16 : Float(64, 64, 7, 7) = onnx::MaxPool[kernel_shape=[2, 2], pads=[0, 0, 0, 0], strides=[2, 2]](%15), scope: Net_LinearFunction/Sequential[conv2]/MaxPool2d[2]
%17 : Float(64, 64, 7, 7) = onnx::Conv[dilations=[1, 1], group=1, kernel_shape=[3, 3], pads=[1, 1, 1, 1], strides=[1, 1]](%16, %conv3.0.weight, %conv3.0.bias), scope: Net_LinearFunction/Sequential[conv3]/Conv2d[0]
%18 : Float(64, 64, 7, 7) = onnx::Relu(%17), scope: Net_LinearFunction/Sequential[conv3]/ReLU[1]
%19 : Float(64, 64, 3, 3) = onnx::MaxPool[kernel_shape=[2, 2], pads=[0, 0, 0, 0], strides=[2, 2]](%18), scope: Net_LinearFunction/Sequential[conv3]/MaxPool2d[2]
%20 : Float(64, 576) = onnx::Flatten[axis=1](%19), scope: Net_LinearFunction
%21 : Float(64, 128) = onnx::nonentity(%20, %dense.0.weight, %dense.0.bias), scope: Net_LinearFunction/Sequential[dense]/Linear[0]
%22 : Float(64, 128) = onnx::Relu(%21), scope: Net_LinearFunction/Sequential[dense]/ReLU[1]
%23 : Float(64, 10) = onnx::nonentity(%22, %dense.2.weight, %dense.2.bias), scope: Net_LinearFunction/Sequential[dense]/Linear[2]
return (%23)

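As shown above, the "[]" after each op name holds the network structure parameters, and the "()" holds the "%n" that refers to the previous layer's output, followed by the weights and bias.
TensorRT therefore reads the data from ONNX according to the following logic.

Network structure parameters
Comparing the code shows that the structure parameters of the Conv operation are read with get_kernel_params:

get_kernel_params(node, get_DimsHW_from_CHW(dims), &kernel_size,
                  &strides, &beg_padding, &end_padding, paddingMode, &dilations);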
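The kernel parameters read here (kernel size, strides, begin and end padding, dilations) are what determine each layer's output size. As a worked check, the sketch below applies the standard output-size formula with floor rounding to the attributes in the graph dump (3x3 Conv with padding 1 and stride 1, 2x2 MaxPool with stride 2) and reproduces the spatial sizes reported in the parser log. The function name out_size is made up for this example.

```python
def out_size(in_size, kernel, stride=1, pad_begin=0, pad_end=0, dilation=1):
    """Output length along one spatial axis for a conv or pooling window."""
    effective_kernel = dilation * (kernel - 1) + 1
    return (in_size + pad_begin + pad_end - effective_kernel) // stride + 1


size = 28  # MNIST input is 28 x 28
sizes = []
for _ in range(3):
    conv = out_size(size, kernel=3, stride=1, pad_begin=1, pad_end=1)  # Conv 3x3, pads 1
    size = out_size(conv, kernel=2, stride=2)                          # MaxPool 2x2, stride 2
    sizes.append((conv, size))

flattened = 64 * size * size  # 64 channels after the last Conv

print(sizes)      # [(28, 14), (14, 7), (7, 3)]
print(flattened)  # 576
```

The result matches the log: 28 to 14 to 7 to 3 across the three MaxPool layers, and Flatten yields 64 * 3 * 3 = 576.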

Weights and bias
The weights and bias of the Conv operation are read through inputs.at(). (inputs.at(0) is a data structure representing the previous layer's output.)

ASSERT(inputs.at(0).is_tensor(), ErrorCode::kUNSUPPORTED_NODE);
ASSERT(inputs.at(1).is_weights(), ErrorCode::kUNSUPPORTED_NODE);

Final test: do the two pipelines fit together?

Testing directly with the onnx2trt command gives the following result:

boyun@boyun-MS-7B90:~/workspace/onnx-tensorrt-master$ onnx2trt ./onnx/customer_op_FC.onnx -v
----------------------------------------------------------------
Input filename:   ./onnx/customer_op_FC.onnx
ONNX IR version:  0.0.4
Opset version:    9
Producer name:    pytorch
Producer version: 1.1
Domain:
Model version:    0
Doc string:
----------------------------------------------------------------
WARNING: ONNX model has a newer ir_version (0.0.4) than this parser was built against (0.0.3).
Parsing model
[2019-08-14 07:30:02 INFO] 11:Conv -> (32, 28, 28)
[2019-08-14 07:30:02 INFO] 12:Relu -> (32, 28, 28)
[2019-08-14 07:30:02 INFO] 13:MaxPool -> (32, 14, 14)
[2019-08-14 07:30:02 INFO] 14:Conv -> (64, 14, 14)
[2019-08-14 07:30:02 INFO] 15:Relu -> (64, 14, 14)
[2019-08-14 07:30:02 INFO] 16:MaxPool -> (64, 7, 7)
[2019-08-14 07:30:02 INFO] 17:Conv -> (64, 7, 7)
[2019-08-14 07:30:02 INFO] 18:Relu -> (64, 7, 7)
[2019-08-14 07:30:02 INFO] 19:MaxPool -> (64, 3, 3)
[2019-08-14 07:30:02 INFO] 20:Flatten -> (576)
[2019-08-14 07:30:02 INFO] 21:nonentity -> (576)
[2019-08-14 07:30:02 INFO] 22:Relu -> (576)
[2019-08-14 07:30:02 INFO] 23:nonentity -> (576)
All done
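Reading the log by eye works for a small network, but the check can also be scripted. The sketch below is one possible approach, not something from the original workflow: it parses the layer lines of the verbose log shown in this article and compares the custom op's reported size with the feature sizes declared in the exported ONNX graph (128 for %21 and 10 for %23). The regular expression, the helper parse_layers, and the expected dictionary are assumptions for illustration.

```python
import re

# A few layer lines copied from the log above.
LOG = """\
[2019-08-14 07:30:02 INFO] 19:MaxPool -> (64, 3, 3)
[2019-08-14 07:30:02 INFO] 20:Flatten -> (576)
[2019-08-14 07:30:02 INFO] 21:nonentity -> (576)
[2019-08-14 07:30:02 INFO] 22:Relu -> (576)
[2019-08-14 07:30:02 INFO] 23:nonentity -> (576)
"""

LAYER_RE = re.compile(r"INFO\]\s*(\d+):(\w+) -> \(([^)]*)\)")


def parse_layers(log_text):
    """Return {node_id: (op_type, shape_tuple)} for every layer line in the log."""
    layers = {}
    for node_id, op, dims in LAYER_RE.findall(log_text):
        shape = tuple(int(d) for d in dims.split(",") if d.strip())
        layers[int(node_id)] = (op, shape)
    return layers


layers = parse_layers(LOG)
custom = {k: v for k, v in layers.items() if v[0] == "nonentity"}

# Feature sizes declared in the exported ONNX graph (batch dimension dropped).
expected = {21: 128, 23: 10}
mismatches = {
    node_id: (want, layers[node_id][1][-1])
    for node_id, want in expected.items()
    if layers[node_id][1][-1] != want
}

print(custom)
print(mismatches)
```

Here both nonentity nodes are found, so the custom op is recognized by the parser. The size mismatch (576 instead of 128 and 10) is the consequence, noted earlier, of implementing nonentity as Normalize rather than as a real fully connected layer.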


Category: TensorRT
Published: 2023-09-14 02:20:56
Link: https://www.kaopuke.com/article/k-p-k_14_uzo_6_f3_13__23_g4.html
