Finding perfect matches in a SAM file


Posted by 机智手套 on 靠谱客. This article explains how to pick out perfect matches from a SAM file and is shared here for reference.

The focus is on SAM files produced by bwa and on how to find the perfectly matched alignments in them.

A perfect match is a read that aligns to the reference genome with an edit distance of zero, that is, with no base mismatches and no gap opens or gap extensions.

BWA writes its alignments in SAM (Sequence Alignment/Map) format, which has the following columns:

Col	Field	Description
1	QNAME	Query (pair) NAME
2	FLAG	bitwise FLAG
3	RNAME	Reference sequence NAME
4	POS	1-based leftmost POSition/coordinate of clipped sequence
5	MAPQ	MAPping Quality (Phred-scaled)
6	CIGAR	extended CIGAR string
7	MRNM	Mate Reference sequence NaMe (‘=’ if same as RNAME)
8	MPOS	1-based Mate POSition
9	ISIZE	Inferred insert SIZE
10	SEQ	query SEQuence on the same strand as the reference
11	QUAL	query QUALity (ASCII-33 gives the Phred base quality)
12	OPT	variable OPTional fields in the format TAG:VTYPE:VALUE
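As a rough illustration of this layout, the sketch below splits one alignment line into its eleven mandatory columns and keeps everything after column 11 as raw optional fields. The `SamRecord` type, the `parse_sam_line` helper and the sample line are invented for this example; only the field names come from the table above.

```python
from typing import List, NamedTuple


class SamRecord(NamedTuple):
    qname: str
    flag: int
    rname: str
    pos: int
    mapq: int
    cigar: str
    mrnm: str
    mpos: int
    isize: int
    seq: str
    qual: str
    opt: List[str]  # raw TAG:VTYPE:VALUE strings, left unparsed here


def parse_sam_line(line: str) -> SamRecord:
    """Split one alignment line (not an @ header line) into its columns."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 11:
        raise ValueError("a SAM alignment line needs at least 11 columns")
    return SamRecord(
        qname=cols[0], flag=int(cols[1]), rname=cols[2], pos=int(cols[3]),
        mapq=int(cols[4]), cigar=cols[5], mrnm=cols[6], mpos=int(cols[7]),
        isize=int(cols[8]), seq=cols[9], qual=cols[10], opt=cols[11:],
    )


# Hypothetical record, made up for illustration only.
example = "read1\t0\tchr1\t100\t37\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII\tNM:i:0\tX0:i:1\n"
record = parse_sam_line(example)
print(record.cigar, record.opt)
```

Column 12 is really "zero or more" columns, which is why the sketch stores it as a list rather than a single string.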

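The CIGAR string describes part of the alignment, but it is not enough on its own to tell how good the alignment is. For example, a 36 bp read aligned to the genome with a CIGAR of "36M" may or may not be a perfect match, because the M operation covers both matching and mismatching bases.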
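To make this concrete, the sketch below decodes a CIGAR string into (length, operation) pairs and checks whether the read is covered by a single M run. Passing that check only rules out indels and clipping; it says nothing about mismatches. `parse_cigar` and `is_ungapped` are names chosen for this example and are not part of bwa.

```python
import re
from typing import List, Tuple

# One CIGAR element: a length followed by an operation letter.
CIGAR_ELEMENT = re.compile(r"(\d+)([MIDNSHP=X])")
CIGAR_WHOLE = re.compile(r"(?:\d+[MIDNSHP=X])+")


def parse_cigar(cigar: str) -> List[Tuple[int, str]]:
    """Return the CIGAR string as a list of (length, operation) pairs."""
    if not CIGAR_WHOLE.fullmatch(cigar):
        raise ValueError(f"not a valid CIGAR string: {cigar!r}")
    return [(int(n), op) for n, op in CIGAR_ELEMENT.findall(cigar)]


def is_ungapped(cigar: str, read_length: int) -> bool:
    """True if the whole read is one M run (no indels, no clipping).

    This is necessary for a perfect match but not sufficient,
    because M also covers mismatched bases.
    """
    return parse_cigar(cigar) == [(read_length, "M")]


print(parse_cigar("30M2I4M"))
print(is_ungapped("36M", 36), is_ungapped("30M2I4M", 36))
```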

To decide, we need to look at the OPT column of the bwa output, that is, the tags. bwa provides the following tags:

Tag	Meaning
NM	Edit distance
MD	Mismatching positions/bases
AS	Alignment score
BC	Barcode sequence
X0	Number of best hits
X1	Number of suboptimal hits found by BWA
XN	Number of ambiguous bases in the reference
XM	Number of mismatches in the alignment
XO	Number of gap opens
XG	Number of gap extensions
XT	Type: Unique/Repeat/N/Mate-sw
XA	Alternative hits; format: (chr,pos,CIGAR,NM;)*
XS	Suboptimal alignment score
XF	Support from forward/reverse alignment
XE	Number of supporting seeds
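Each tag is written as TAG:VTYPE:VALUE, so the optional columns are easy to turn into a lookup table. The sketch below is one possible way to do that, assuming that `i` values are integers and `f` values are floats and leaving every other type as a string; `parse_tags` and the sample tag list are made up for illustration.

```python
from typing import Dict, List, Union

TagValue = Union[int, float, str]


def parse_tags(fields: List[str]) -> Dict[str, TagValue]:
    """Turn ['NM:i:0', 'XT:A:U', ...] into {'NM': 0, 'XT': 'U', ...}."""
    tags: Dict[str, TagValue] = {}
    for field in fields:
        # Split at most twice so values that contain ':' stay intact.
        tag, vtype, value = field.split(":", 2)
        if vtype == "i":
            tags[tag] = int(value)
        elif vtype == "f":
            tags[tag] = float(value)
        else:
            tags[tag] = value
    return tags


# Hypothetical optional fields for a single read.
tags = parse_tags(
    ["XT:A:U", "NM:i:0", "X0:i:1", "X1:i:0", "XM:i:0", "XO:i:0", "XG:i:0", "MD:Z:36"]
)
print(tags["NM"], tags["XT"], tags["MD"])
```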

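To select perfect matches from the tags, require NM (edit distance) to be 0, XM (number of mismatches) to be 0, and X0 (number of best hits) to be 1. The X0 condition is optional. It is there because a read can have more than one perfect match, and such reads are not used in the downstream analysis.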

The filter conditions, written in Perl, are:

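# Skip every line that is not a unique, mismatch-free, zero-edit-distance hit.
next if $line !~ /\tNM:i:0\b/;
next if $line !~ /\tXM:i:0\b/;
next if $line !~ /\tX0:i:1\b/;   # \b keeps 1 from also matching 10, 11, ...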
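The same three conditions can also be checked on parsed tag values instead of on the raw line, which avoids accidental substring matches. The following is a minimal Python sketch under that assumption, not the author's script; `is_perfect_match`, `filter_perfect`, the `require_unique` switch and the sample records are all hypothetical.

```python
from typing import Dict, Iterable, Iterator


def is_perfect_match(line: str, require_unique: bool = True) -> bool:
    """True if NM and XM are 0 and, optionally, X0 is 1."""
    if line.startswith("@"):  # header lines carry no alignment
        return False
    cols = line.rstrip("\n").split("\t")
    tags: Dict[str, str] = {}
    for field in cols[11:]:  # optional fields start at column 12
        parts = field.split(":", 2)
        if len(parts) == 3:
            tags[parts[0]] = parts[2]
    # A missing tag counts as a failed condition, as in the regex filter.
    if tags.get("NM") != "0" or tags.get("XM") != "0":
        return False
    if require_unique and tags.get("X0") != "1":
        return False
    return True


def filter_perfect(lines: Iterable[str], require_unique: bool = True) -> Iterator[str]:
    """Yield only the alignment lines that pass is_perfect_match."""
    return (line for line in lines if is_perfect_match(line, require_unique))


# Invented sample records: a perfect unique hit, a mismatch, and a repeat.
base = "\t0\tchr1\t100\t37\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII\t"
good = "r1" + base + "NM:i:0\tX0:i:1\tXM:i:0"
mismatch = "r2" + base + "NM:i:1\tX0:i:1\tXM:i:1"
repeat = "r3" + base + "NM:i:0\tX0:i:10\tXM:i:0"

kept = [l.split("\t")[0] for l in filter_perfect(["@HD\tVN:1.0", good, mismatch, repeat])]
print(kept)
```

Passing `require_unique=False` drops the X0 condition, which matches the note above that this condition is optional.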
