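Much of the sequencing data available for download from NCBI comes in a "read sequence + count" format, which is what the authors are left with after trimming adapters and filtering out low-quality reads. The following converts this reads-count format to FASTA format.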
cat reads_count.txt
AAACCCGGGTTT 3
ACAAGATTAG 5
TAGACAGA 1
Python implementation
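# Convert "sequence count" lines into FASTA records named >ID<n>_<count>
with open('./reads_count.txt', 'r') as fr, open('./reads.fas', 'w') as fw:
    # s is the 1-based running number of the read
    for s, line in enumerate(fr, start=1):
        # split on any whitespace, so both tab- and space-separated input work
        fields = line.strip().split()
        seq = fields[0]
        name = '>ID' + str(s) + '_' + fields[1]
        fw.write(name + '\n' + seq + '\n')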
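The same conversion can also be wrapped in a small function, which makes it easy to try on a few lines before running it on a whole file. This is only a sketch: the function name reads_count_to_fasta and the prefix parameter are made up for illustration, and it assumes each non-blank line holds exactly a sequence and a count separated by whitespace.

```python
def reads_count_to_fasta(lines, prefix="ID"):
    """Yield FASTA records built from 'sequence count' lines.

    Hypothetical helper: the header layout >ID<n>_<count> follows the
    script above; blank lines are skipped (an assumption, the original
    script does not handle them).
    """
    n = 0
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2:
            raise ValueError("expected 'sequence count', got: %r" % line)
        seq, count = fields
        n += 1
        yield ">%s%d_%s\n%s\n" % (prefix, n, count, seq)


if __name__ == "__main__":
    # Demo on the three sample lines shown earlier
    sample = ["AAACCCGGGTTT\t3\n", "ACAAGATTAG\t5\n", "TAGACAGA\t1\n"]
    print("".join(reads_count_to_fasta(sample)), end="")
```

To convert a file, pass the open input handle as lines and hand the generator to writelines() on the output handle.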
Linux implementation
awk