Overview
twoBitToFa
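When downloading the mouse mm10 genome from UCSC, I could not find a .fa file. What I found instead was mm10.2bit, which I guessed was the genome sequence stored in a binary format. The file description says: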
mm10.2bit - contains the complete mouse/mm10 genome sequence in the 2bit file format. Repeats from RepeatMasker and Tandem Repeats Finder (with period of 12 or less) are shown in lower case; non-repeating sequence is shown in upper case. The utility program, twoBitToFa (available from the kent src tree), can be used to extract .fa file(s) from this file.
A pre-compiled version of the command line tool can be found at:
http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/ (this is the key part: UCSC itself provides the solution)
See also:
http://genome.ucsc.edu/admin/git.html
http://genome.ucsc.edu/admin/jk-install.html
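The notes above say that repeats appear in lower case and non-repeating sequence in upper case. Once a FASTA file has been extracted, that convention can be checked with a small awk-based helper. This is only a sketch: the function name count_softmasked is hypothetical, and it assumes a plain-text FASTA file whose header lines start with ">".

```shell
#!/usr/bin/env bash
# Sketch: count soft-masked (lower-case) bases in a FASTA file.
# count_softmasked is a hypothetical helper name, not part of the UCSC tools.

# Print "<masked> <total>" for the FASTA file given as $1.
# Header lines (starting with ">") are skipped.
count_softmasked() {
    awk '!/^>/ {
            total  += length($0)           # all bases on this line
            masked += gsub(/[acgtn]/, "")  # lower-case bases on this line
         }
         END { printf "%d %d\n", masked, total }' "$1"
}

# Usage:
#   count_softmasked mm10.fa
```

If the first number is zero, the file was probably written with -noMask, which converts the sequence to upper case.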
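Download twoBitToFa
Download the twoBitToFa binary from the directory listed above, make it executable, and add its directory to PATH. If the export line is placed in ~/.bashrc, reload that file so the change takes effect in the current shell: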
chmod +x twoBitToFa
export PATH=$PATH:/home/xxx/lustre1/software/twoBit2Fa
source ~/.bashrc
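The download and PATH steps can be sketched as a pair of shell functions. The names add_to_path and install_twobittofa are hypothetical, INSTALL_DIR is an arbitrary location, and TOOL_URL assumes the binary sits directly under the directory quoted in the UCSC notes; adjust all of them to your own setup.

```shell
#!/usr/bin/env bash
# Sketch: fetch the pre-compiled twoBitToFa and make it callable by name.
# TOOL_URL and INSTALL_DIR are assumptions; adjust them to your system.
TOOL_URL="http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/twoBitToFa"
INSTALL_DIR="${INSTALL_DIR:-$HOME/software/twoBitToFa}"

# Append a directory to PATH only if it is not already listed.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                  # already present, nothing to do
        *) PATH="$PATH:$1" ;;
    esac
    export PATH
}

# Download the binary, mark it executable, and expose it on PATH.
install_twobittofa() {
    mkdir -p "$INSTALL_DIR" || return 1
    wget -q -O "$INSTALL_DIR/twoBitToFa" "$TOOL_URL" || return 1
    chmod +x "$INSTALL_DIR/twoBitToFa"
    add_to_path "$INSTALL_DIR"
}

# Usage (needs network access):
#   install_twobittofa
```

Checking PATH before appending keeps it from growing each time the file is sourced again.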
Run twoBitToFa
============================= twoBitToFa==================================
twoBitToFa - Convert all or part of .2bit file to fasta
usage:
   twoBitToFa input.2bit output.fa
options:
   -seq=name       Restrict this to just one sequence.
   -start=X        Start at given position in sequence (zero-based).
   -end=X          End at given position in sequence (non-inclusive).
   -seqList=file   File containing list of the desired sequence names
                   in the format seqSpec[:start-end], e.g. chr1 or chr1:0-189
                   where coordinates are half-open zero-based, i.e. [start,end).
   -noMask         Convert sequence to all upper case.
   -bpt=index.bpt  Use bpt index instead of built-in one.
   -bed=input.bed  Grab sequences specified by input.bed. Will exclude introns.
   -bedPos         With -bed, use chrom:start-end as the fasta ID in output.fa.
   -udcDir=/dir/to/cache  Place to put cache for remote bigBed/bigWigs.

Sequence and range may also be specified as part of the input
file name using the syntax:
      /path/input.2bit:name
   or
      /path/input.2bit:name
   or
      /path/input.2bit:name:start-end
To convert the whole genome, the basic form is enough:
twoBitToFa input.2bit output.fa
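The usage text states that coordinates are zero-based and half-open, which is easy to get wrong when a region is written as 1-based and inclusive (as positions are usually displayed in a genome browser). The sketch below shows the conversion and the equivalent ways of asking for a region. The helper names to_2bit_spec and extract_region are hypothetical, and the file names are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: build a twoBitToFa range from 1-based, inclusive coordinates.
# to_2bit_spec and extract_region are hypothetical helper names.

# Print "name:start-end" in zero-based, half-open form.
# A 1-based inclusive range [s, e] becomes [s-1, e).
to_2bit_spec() {
    local name=$1 start=$2 end=$3
    printf '%s:%d-%d\n' "$name" "$((start - 1))" "$end"
}

# Extract one region using the file-name syntax from the usage text.
# Arguments: input.2bit  name  start(1-based)  end(1-based)  output.fa
extract_region() {
    twoBitToFa "$1:$(to_2bit_spec "$2" "$3" "$4")" "$5"
}

# Usage (requires twoBitToFa on PATH and mm10.2bit in the current directory):
#   twoBitToFa mm10.2bit mm10.fa                                # whole genome
#   twoBitToFa -seq=chr1 mm10.2bit chr1.fa                      # one sequence
#   twoBitToFa -seq=chr1 -start=0 -end=189 mm10.2bit region.fa  # first 189 bases
#   extract_region mm10.2bit chr1 1 189 region.fa               # same region
```

Here to_2bit_spec chr1 1 189 prints chr1:0-189, matching the example in the -seqList description.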