我是靠谱客的博主 伶俐大米,最近开发中收集的这篇文章主要介绍踩坑sunbeam rbt 去除host reads,觉得挺不错的,现在分享给大家,希望可以做个参考。

概述

最近需要跑qc流程,用sunbeam的qc流程。然后再preprocess.tsv文件里,deconta文件的fastq数目和文件里记录的non host不一致。貌似生成的fastq文件没有去除host。找问题。溯源到decontamination rule

cat /programs/sunbeam-stable/rules/qc/decontaminate.rules
rule filter_reads:
    input:
        hostreads = str(QC_FP/'decontam'/'intermediates'/'{sample}_hostreads.ids'),
        reads = str(QC_FP/'cleaned'/'{sample}_{rp}.fastq.gz'),
        hostids = expand(str(QC_FP/'decontam'/'intermediates'/'{host}'/'{{sample}}.ids'), host=HostGenomes.keys())
    output:
        reads = str(QC_FP/'decontam'/'{sample}_{rp}.fastq.gz'),
        log = str(QC_FP/'log'/'decontam'/'{sample}_{rp}.txt')
    run:
        original = int(str(subprocess.getoutput(
            "zcat {} | wc -l".format(input.reads))).strip())//4
        host = int(subprocess.getoutput(
            "cat {} | wc -l".format(input.hostreads)).strip())
        nonhost = int(original-host)
        shell("""
        gzip -dc {input.reads} | 
        rbt fastq-filter {input.hostreads} | 
        gzip > {output.reads}
        """)
        
        hostdict = OrderedDict()
        for hostid in input.hostids:
            hostname = os.path.basename(os.path.dirname(hostid))
            hostcts = int(subprocess.getoutput("cat {} | wc -l".format(hostid)).strip())
            hostdict[hostname] = hostcts
        
        with open(output.log, 'w') as log:
            log.write("{}n".format("t".join(list(hostdict.keys()) + ["host","nonhost"] )))
            log.write("{}n".format("t".join( map(str, list(hostdict.values()) + [host, nonhost]) )))

可以发现这个filter_reads的rule,通过生成的id文件(这个id文件是所有map到host 基因组的reads ID号码。)

如下:

A00709:489:HCK2KDSX5:2:1101:49781:1000
A00709:489:HCK2KDSX5:2:1101:81782:1000
A00709:489:HCK2KDSX5:2:1101:89383:1000
A00709:489:HCK2KDSX5:2:1101:122644:1000
 

然后命令行是用https://github.com/rust-bio/rust-bio-tools/blob/master/README.md

rbt来去掉给定的id文件。 

我进入到它对应的docker环境,截取输入文件的前100行,执行不报错,

cat D0head100.fastq | rbt fastq-filter D0head100.ids |wc -l 
rbt: Relink `/opt/conda/envs/sunbeam/bin/../lib/./libgfortran.so.5' with `/lib/x86_64-linux-gnu/librt.so.1' for IFUNC symbol `clock_gettime'
100

所以没有成功去除还是一百行。

查看fastq的header,我发现

@A00709:489:HCK2KDSX5:2:1101:13801:1031/1
AAGTGTGTATTTCTCATTTTCCGTGATTTTCAGTTTTCTCGCCATATTTCAGGTCCTACAGTGTGCATTTCTCATTTTTCACATTTTTCATTGATTTCGTCATTTTTCAAGGGTTCAAGTGGTTGTTTCTCATTTTTAATGATTTTCATT
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFF,FFFFFFFFFFFF:FFFFFFFFFFFF:FFFFF:FFFFFFFFFFF,F,FFFFFFF:FFFFFFFFF:,FFFF,FF,FFF,FFF,:
@A00709:489:HCK2KDSX5:2:1101:14109:1031/1
GACTCATACCTTCAATCTCAGCCTTTAGGAGGCAGAGGCAGATGCATGTC
+
FFFFFFFF,FFFFFFFFFFF:F:F:FFF:FFFFFFFFF::FF,FFF,F,F

header文件是带有/1 &/2的。猜测是这个原因导致rbt不识别。更改id文件,手动加上/1。

新的id文件 

A00709:489:HCK2KDSX5:2:1101:49781:1000/1
A00709:489:HCK2KDSX5:2:1101:81782:1000/1
A00709:489:HCK2KDSX5:2:1101:89383:1000/1
A00709:489:HCK2KDSX5:2:1101:122644:1000/1

再次重复上述命令。

(sunbeam) user@84ed9ba9a5a3:/analysis$ cat D0head100.fastq | rbt fastq-filter D0head100.ids |wc -l 
rbt: Relink `/opt/conda/envs/sunbeam/bin/../lib/./libgfortran.so.5' with `/lib/x86_64-linux-gnu/librt.so.1' for IFUNC symbol `clock_gettime'
84

去除成功。4条reads去除成功。

所以bug就是这个header不能直接连接着/1和/2, 如果是space中间空着是可以的。

解决这个bug的灵感来源于这里。

https://github.com/sunbeam-labs/sunbeam/issues/221

最后

以上就是伶俐大米为你收集整理的踩坑sunbeam rbt 去除host reads的全部内容,希望文章能够帮你解决踩坑sunbeam rbt 去除host reads所遇到的程序开发问题。

如果觉得靠谱客网站的内容还不错,欢迎将靠谱客网站推荐给程序员好友。

本图文内容来源于网友提供,作为学习参考使用,或来自网络收集整理,版权属于原作者所有。
点赞(49)

评论列表共有 0 条评论

立即
投稿
返回
顶部