Overview
Python bioinformatics exercises
1 Write nN strings to a file
Create 10 strings in a file. In each string, the first half is
n and the second half is N. The length of each string increases
by 2 from one line to the next.
Example:
nN
nnNN
…
nnnnnNNNNN
…
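# open(): file_object = open(file_name, mode='r'); mode is the file access mode
# 'r' read-only, 'w' write (creates or truncates the file), 'a' append
with open('n.txt', 'w') as n:  # the with block closes the file when it ends
    for i in range(1, 11):  # i = 1..10, so 10 strings are written
        k = 1
        while k <= i:
            n.write('n')
            k = k + 1
        k = 1
        while k <= i:
            n.write('N')
            k = k + 1
        n.write('\n')  # '\n' is the newline character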
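The same file can also be produced with string repetition instead of nested while loops. The following is a minimal sketch, not the original solution; the helper names build_lines and write_lines are hypothetical, and n.txt is the file name used in the exercise.

```python
def build_lines(count=10):
    """Return the strings nN, nnNN, ... with `count` entries."""
    # 'n' * i repeats the character i times, so string i has length 2 * i
    return ['n' * i + 'N' * i for i in range(1, count + 1)]


def write_lines(path='n.txt', count=10):
    """Write one string per line to `path`."""
    with open(path, 'w') as handle:
        handle.write('\n'.join(build_lines(count)) + '\n')


if __name__ == '__main__':
    write_lines()
```

Keeping the string construction separate from the file writing makes it easy to check the strings without touching the disk.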
2 Count the frequency of each 3 bp combination
Calculate the frequency of every 3 bp combination in a given genome sequence: CTCCTGAACAACCCCACTGTACTTCCCACTGCTCCTGAACAACCCCACTGTACTTCCCACTGCTCCTGAACAACCCCACTGTACTTCCCACTGCTCCTGAACAACCAGTCGATATCCACAAACACAGAAACAACCCTTCGCAGCCTGGCCACACACATCATTCCACAACACATAGGACTCCCCCACAAACACAGAAACAACCCTTCGCAGCCTGGTCACACACATCATTCCACAACACATAGGACTCCCCCACAAACGTAATGGAGAGGTTGCAATAACCCATAAAATCACAATTAATAATAGTAGTGTTGCATATACCGACACAGACAGCACAAGTGGACGTATGACAAGACACACAAATAGTAGCACACAAAAGCAAAGCAAAAAGCATAGCACAAT
Output the frequencies as key => value pairs, sorted alphabetically by key.
AAA 10
AAC 2
…
import re  # regular expressions
from collections import Counter

seq = "CTCCTGAACAACCCCACTGTACTTCCCACTGCTCCTGAACAACCCCACTGTACTTCCCACTGCTCCTGAACAACCCCACTGTACTTCCCACTGCTCCTGAACAACCAGTCGATATCCACAAACACAGAAACAACCCTTCGCAGCCTGGCCACACACATCATTCCACAACACATAGGACTCCCCCACAAACACAGAAACAACCCTTCGCAGCCTGGTCACACACATCATTCCACAACACATAGGACTCCCCCACAAACGTAATGGAGAGGTTGCAATAACCCATAAAATCACAATTAATAATAGTAGTGTTGCATATACCGACACAGACAGCACAAGTGGACGTATGACAAGACACACAAATAGTAGCACACAAAAGCAAAGCAAAAAGCATAGCACAAT"
print(len(seq))  # the split below assumes the length is a multiple of 3
# re.findall(".{3}", seq) returns the consecutive, non-overlapping 3 bp chunks as a list,
# so each chunk is counted as one element instead of counting single A/T/C/G characters
triplets = re.findall(".{3}", seq)
result = Counter(triplets)  # Counter counts how often each list element occurs (a dict subclass)
sort_key_result = dict(sorted(result.items()))  # sort by key in ascending order
print(result)  # the count of each 3 bp chunk, unsorted
print(sort_key_result)  # the counts sorted by key
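The code above counts consecutive, non-overlapping chunks. The exercise could also be read as asking for every 3 bp window, shifted one base at a time. Which reading is intended is an assumption here. The sketch below supports both and prints one "KEY VALUE" line per 3-mer, as in the example output. The names count_kmers and format_counts and the short demo sequence are hypothetical.

```python
from collections import Counter


def count_kmers(seq, k=3, overlapping=True):
    """Count k-mers in seq; step 1 for overlapping windows, k otherwise."""
    step = 1 if overlapping else k
    # Stop at len(seq) - k so every window has exactly k characters
    windows = (seq[i:i + k] for i in range(0, len(seq) - k + 1, step))
    return Counter(windows)


def format_counts(counts):
    """Return 'KEY VALUE' lines sorted alphabetically by key."""
    return [f'{kmer} {counts[kmer]}' for kmer in sorted(counts)]


if __name__ == '__main__':
    demo = 'ATGATGA'  # short illustrative sequence, not the exercise genome
    print('\n'.join(format_counts(count_kmers(demo))))
```

With overlapping=False the function reproduces the non-overlapping split, dropping any trailing bases that do not fill a complete window.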
3 Compare two sequences and set matching positions in a 2D matrix to 1
We have two sequences:
S1: ATGATAGCAGTGAAATGGG
S2: GATAGCAGTGAAACGGGCA
Build a two-dimensional array whose side length equals the sequence length.
For array[i][j]: if S1[i] equals S2[j], array[i][j] = 1; otherwise array[i][j] = 0.
Write the array to a single file in a human-readable format.
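import numpy as np
import pandas as pd  # writing .xlsx also needs an Excel engine such as openpyxl
S1 = list('ATGATAGCAGTGAAATGGG')
print(S1)
S2 = list('GATAGCAGTGAAACGGGCA')
print(S2)
print(len(S1))  # sequence length, used to size the 2D array
compare = [[0] * len(S2) for _ in range(len(S1))]  # 2D array filled with zeros
for i in range(len(S1)):
    for j in range(len(S2)):
        if S1[i] == S2[j]:
            compare[i][j] = 1
print(compare)
a = np.array(compare)  # convert the nested list to a NumPy array
print(a)
data_df = pd.DataFrame(a)  # key step 1: convert the ndarray to a DataFrame
data_df.columns = S2  # column labels (the header row)
data_df.index = S1  # row labels (the first column)
# key step 2: create a workbook named compare.xlsx; leaving the with block saves it
# (writer.save() was removed in pandas 2.0)
with pd.ExcelWriter('compare.xlsx') as writer:
    # key step 3: write data_df to the sheet page_1; other DataFrames could go to further sheets such as page_2
    # float_format sets the precision for floating-point values; the matrix here holds integers
    data_df.to_excel(writer, sheet_name='page_1', float_format='%.5f')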
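If a plain-text file is enough for the "human-readable format" requirement, the matrix can also be written without NumPy or pandas. This is a minimal sketch using only the standard library; the helper names match_matrix and format_matrix and the output name compare.tsv are assumptions for illustration.

```python
def match_matrix(s1, s2):
    """Return rows where entry [i][j] is 1 if s1[i] == s2[j], else 0."""
    return [[1 if a == b else 0 for b in s2] for a in s1]


def format_matrix(s1, s2):
    """Return tab-separated lines: a header of s2, then one row per s1 base."""
    lines = ['\t'.join([''] + list(s2))]
    for base, row in zip(s1, match_matrix(s1, s2)):
        lines.append('\t'.join([base] + [str(v) for v in row]))
    return lines


if __name__ == '__main__':
    S1 = 'ATGATAGCAGTGAAATGGG'
    S2 = 'GATAGCAGTGAAACGGGCA'
    # 'compare.tsv' is an assumed output name for this sketch
    with open('compare.tsv', 'w') as handle:
        handle.write('\n'.join(format_matrix(S1, S2)) + '\n')
```

The tab-separated file opens in any text editor or spreadsheet program, with S2 across the top and S1 down the left side.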
To be continued; more exercises will be added…
Finally
These Python bioinformatics exercises were collected and organized by 犹豫雪糕.