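BioPython is a Python package for working with sequences and other biological data. It bundles many tools and can read FASTA files directly. There are two ways to install it:
1. Using pip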
pip3 install biopython
Open a Python terminal:
>>> import Bio
>>> from Bio import SeqIO
The second import fails:
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
<ipython-input-5-5d24a7c49c42> in <module>
----> 1 from Bio import SeqIO
ImportError: cannot import name 'SeqIO' from 'Bio' (unknown location)
Bio itself can be imported, but the package is reported as being at an unknown location. Check where the biopython package lives:
>>> import Bio
>>> help(Bio)
Output:
Help on package Bio:
NAME
Bio
PACKAGE CONTENTS
FILE
(built-in)
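No package contents are listed. I was working in Jupyter Notebook, so one thing to try is shutting down the kernel and restarting it, or closing the page and opening a new one, then testing again. If the problem persists, try the second method.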
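Before switching methods, it can help to see which interpreter the session is running and where Python would load Bio from. The sketch below uses only the standard library; describe_import is a helper name made up for this post. One possible explanation for the empty help output (a guess, not something confirmed here) is that pip3 installed into a different interpreter than the one the notebook kernel uses, or that Python picked up a bare directory named Bio.

```python
import importlib.util
import sys


def describe_import(name):
    """Report where the top-level package `name` would be imported from."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        # Python cannot find the package on sys.path at all
        return {"found": False, "origin": None, "locations": []}
    return {
        "found": True,
        # origin is typically None for a bare directory without __init__.py
        "origin": spec.origin,
        "locations": list(spec.submodule_search_locations or []),
    }


# The interpreter behind this terminal or notebook kernel
print("interpreter:", sys.executable)
print("Bio:", describe_import("Bio"))
```

If the interpreter printed here is not the one pip3 installs into, running pip through that interpreter (its path followed by -m pip install biopython) targets the same environment as the kernel.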
2. Using conda
Make sure Anaconda or Miniconda is installed on the computer.
Then run the following in a terminal:
conda install -c conda-forge biopython
When the prompt Proceed ([y]/n)? appears, press y and the installation completes.
Now start a Python terminal and test:
Python 3.7.6 | packaged by conda-forge | (default, Jan 7 2020, 22:33:48)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from Bio import SeqIO
>>>
The import works. If you are using Jupyter Notebook, restart the kernel before testing. Now check the package location:
>>> from Bio import SeqIO
>>> import Bio
>>> help(Bio)
Output:
Help on package Bio:
NAME
Bio - Collection of modules for dealing with biological data in Python.
DESCRIPTION
The Biopython Project is an international association of developers
of freely available Python tools for computational molecular biology.
http://biopython.org
PACKAGE CONTENTS
Affy (package)
Align (package)
AlignIO (package)
Alphabet (package)
Application (package)
Blast (package)
CAPS (package)
Cluster (package)
Compass (package)
Crystal (package)
Data (package)
Emboss (package)
Entrez (package)
ExPASy (package)
FSSP (package)
File
GenBank (package)
Geo (package)
Graphics (package)
HMM (package)
Index
KDTree (package)
KEGG (package)
LogisticRegression
MarkovModel
MaxEntropy
Medline (package)
NMR (package)
NaiveBayes
Nexus (package)
PDB (package)
Pathway (package)
Phylo (package)
PopGen (package)
Restriction (package)
SCOP (package)
.....
The installation succeeded.
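The same check can be done without reading through the help output. This is a small sketch, with module_version as a made-up helper name: it imports a module and returns its __version__ attribute (which Biopython sets on Bio), or None when the import fails or the attribute is missing.

```python
import importlib


def module_version(name):
    """Import `name` and return its __version__, or None if unavailable."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        # The package is not importable from this interpreter
        return None
    return getattr(module, "__version__", None)


print("Biopython version:", module_version("Bio"))
```

A result of None here means the interpreter running the code still cannot import a working Bio package.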
3. Reading FASTA files with biopython
from Bio import SeqIO
for seq_record in SeqIO.parse('../input/example.fa', "fasta"):
    print(seq_record.id)
    print(seq_record.seq)
Output:
ENST00000435737.5
ATGTTTCGCATCACCAACATTGAGTTTCTTCCCGAATACCGACAAAAGGAGTCCAGGGAATTTCTTTCAGTGTCACGGACTGTGCAGCAAGTGATAAACCTGGTTTATACAACATCTGCCTTCTCCAAATTTTATGAGCAGTCTGTTGTTGCAGATGTCAGCAACAACAAAGGCGGCCTCCTTGTCCACTTTTGGATTGTTTTTGTCATGCCACGTGCCAAAGGCCACATCTTCTGTGAAGACTGTGTTGCCGCCATCTTGAAGGACTCCATCCAGACAAGCATCATAAACCGGACCTCTGTGGGGAGCTTGCAGGGACTGGCTGTGGACATGGACTCTGTGGTACTAAATGAAGTCCTGGGGCTGACTCTCATTGTCTGGATTGACTGA
ENST00000419127.5
ATGTTTCGCATCACCAACATTGAGTTTCTTCCCGAATACCGACAAAAGGAGTCCAGGGAATTTCTTTCAGTGTCACGGACTGTGCAGCAAGTGATAAACCTGGTTTATACAACATCTGCCTTCTCCAAATTTTATGAGCAGTCTGTTGTTGCAGATGTCAGCAACAACAAAGGCGGCCTCCTTGTCCACTTTTGGATTGTTTTTGTCATGCCACGTGCCAAAGGCCACATCTTCTGTGAAGACTGTGTTGCCGCCATCTTGAAGGACTCCATCCAGACAAGCATCATAAACCGGACCTCTGTGGGGAGCTTGCAGGGACTGGCTGTGGACATGGACTCTGTGGTACTAAATGACAAAGGCTGCTCTCAGTACTTCTATGCAGAGCATCTGTCTCTCCACTACCCGCTGGAGATTTCTGCAGCCTCAGGGAGGCTGATGTGTCACTTCAAGCTGGTGGCCATAGTGGGCTACCTGATTCGTCTCTCAATCAAGTCCATCCAAATCGAAGCCGACAACTGTGTCACTGACTCCCTGACCATTTACGACTCCCTTTTGCCCATCCGGAGCAGCATCT
....
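To make it clearer what the loop prints for each record, here is a standard-library-only sketch of reading FASTA text. It is not Biopython's implementation, read_fasta is a made-up name, and the two demo records are invented rather than taken from example.fa. It follows the same convention as the output above: the id is the header text after ">" up to the first whitespace, and the sequence is the following lines joined together.

```python
import io


def read_fasta(handle):
    """Yield (record_id, sequence) pairs from an open FASTA text handle."""
    record_id, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one
            if record_id is not None:
                yield record_id, "".join(chunks)
            # The id is the header text up to the first whitespace
            fields = line[1:].split(None, 1)
            record_id = fields[0] if fields else ""
            chunks = []
        elif record_id is not None:
            chunks.append(line)
    # Emit the last record in the file
    if record_id is not None:
        yield record_id, "".join(chunks)


# Hypothetical two-record input, not the contents of example.fa
demo = io.StringIO(">seq1 first demo record\nATGC\nGGTA\n>seq2\nTTAA\n")
for record_id, sequence in read_fasta(demo):
    print(record_id, len(sequence))
```

For real work SeqIO.parse remains the better choice, since it also handles other formats and returns full record objects rather than plain strings.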