2017-08-02 67 views

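Biopython: resseq does not match the PDB file

I have a PDB file and need to extract its residue sequence numbers (resseqs). From manually inspecting the first few lines of the PDB file (pasted below), I think the resseqs should be [22, 23, ...]. However, Biopython's Bio.PDB module suggests otherwise (output shown below). I don't know whether this is a Biopython bug or whether I am misunderstanding the PDB format.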

ATOM      1  N   GLY A  22      78.171  89.858  59.231  1.00 21.24           N
ATOM      2  CA  GLY A  22      79.174  88.827  58.999  1.00 20.87           C
ATOM      3  C   GLY A  22      80.438  89.415  58.391  1.00 21.89           C
ATOM      4  O   GLY A  22      80.362  90.202  57.440  1.00 23.18           O
ATOM      5  N   LEU A  23      81.588  89.069  58.972  1.00 21.51           N
ATOM      6  CA  LEU A  23      82.895  89.555  58.527  1.00 20.80           C
ATOM      7  C   LEU A  23      83.288  89.020  57.162  1.00 22.41           C
ATOM      8  O   LEU A  23      82.889  87.923  56.788  1.00 22.93           O
ATOM      9  CB  LEU A  23      83.973  89.232  59.560  1.00 20.97           C
ATOM     10  CG  LEU A  23      84.225  87.818  60.062  1.00 13.32           C
ATOM     11  CD1 LEU A  23      85.448  87.888  60.939  1.00 15.24           C
ATOM     12  CD2 LEU A  23      83.035  87.258  60.829  1.00 12.21           C

The code I use to extract the resseqs:

... 
for i in chain:
    print(i.get_full_id())

OUT: ('pdb', 0, 'A', (' ', 2, ' '))
     ('pdb', 0, 'A', (' ', 3, ' '))
...

Could you provide the complete code you use to reproduce this output? How do you obtain `chain`? – fsimkovic

Answer


From the Bio.PDB.Entity.get_full_id documentation:

def get_full_id(self): 
    """Return the full id. 

    The full id is a tuple containing all id's starting from 
    the top object (Structure) down to the current object. A full id for 
    a Residue object e.g. is something like: 

    ("1abc", 0, "A", (" ", 10, "A")) 

    This corresponds to: 

    Structure with id "1abc" 
    Model with id 0 
    Chain with id "A" 
    Residue with id (" ", 10, "A") 

    The Residue id indicates that the residue is not a hetero-residue 
    (or a water) because it has a blank hetero field, that its sequence 
    identifier is 10 and its insertion code "A". 
    """ 
    # The function implementation below here ... 

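I assume you are iterating over the atoms of your chain rather than its residues, which would give you the full id of each Atom instead of each Residue.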
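
To make the tuple layout concrete, here is a minimal sketch in plain Python (no Biopython required) that unpacks a full id like the one in the docstring. The helper name `describe_full_id` is made up for illustration. The length check rests on my understanding that an Atom's full id carries one extra trailing element, (atom name, altloc), after the residue id; treat that as an assumption and verify it against your Biopython version.

```python
def describe_full_id(full_id):
    """Return (kind, resseq) for a Bio.PDB-style full id tuple.

    Hypothetical helper for illustration; not part of Biopython.
    """
    # Residue full id: (structure, model, chain, (hetfield, resseq, icode))
    # Atom full id (assumed): the same four items plus (atom_name, altloc)
    if len(full_id) == 4:
        kind = "residue"
    elif len(full_id) == 5:
        kind = "atom"
    else:
        raise ValueError("unexpected full id length: %d" % len(full_id))
    # The residue id is always the fourth element in both layouts.
    hetfield, resseq, icode = full_id[3]
    return kind, resseq


# The example id from the docstring, then a made-up atom id for comparison.
print(describe_full_id(("1abc", 0, "A", (" ", 10, "A"))))
print(describe_full_id(("1abc", 0, "A", (" ", 10, "A"), ("CA", " "))))
```

Checking the length of whatever get_full_id() returns is a quick way to tell which kind of object a loop is yielding.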

If you save the example residues in a file named struct.pdb and run the code below, you will get the correct ids.

>>> from Bio.PDB import PDBParser
>>> structure = PDBParser().get_structure('test', 'struct.pdb')
>>> for residue in structure.get_residues():
...     print(residue.get_full_id())
...
('test', 0, 'A', (' ', 22, ' '))
('test', 0, 'A', (' ', 23, ' '))
>>> resseqs = [residue.id[1] for residue in structure.get_residues()]
>>> print(resseqs)
[22, 23]
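
As a cross-check that does not depend on Biopython, the residue numbers can also be read straight from the fixed columns of the ATOM records: in the PDB format the chain identifier is column 22 and resSeq occupies columns 23-26. The sketch below assumes correctly column-aligned records and ignores insertion codes. `resseqs_from_pdb_lines` is a name made up here, and the sample lines are taken from the question.

```python
def resseqs_from_pdb_lines(lines, chain_id="A"):
    """Collect distinct residue sequence numbers, in file order, for one chain.

    Hypothetical helper; insertion codes (column 27) are ignored.
    """
    seen = []
    for line in lines:
        # Skip anything that is not a full-width ATOM/HETATM record.
        if len(line) < 26 or not line.startswith(("ATOM  ", "HETATM")):
            continue
        if line[21] != chain_id:       # column 22: chain identifier
            continue
        resseq = int(line[22:26])      # columns 23-26: residue sequence number
        if resseq not in seen:
            seen.append(resseq)
    return seen


sample = [
    "ATOM      1  N   GLY A  22      78.171  89.858  59.231  1.00 21.24           N",
    "ATOM      2  CA  GLY A  22      79.174  88.827  58.999  1.00 20.87           C",
    "ATOM      5  N   LEU A  23      81.588  89.069  58.972  1.00 21.51           N",
]
print(resseqs_from_pdb_lines(sample))  # [22, 23]
```

Running the same check on the exact file that is passed to the parser shows quickly whether its numbering differs from the file you inspected by hand.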

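Thank you! I'm very sorry: after two days of searching I found a mistake in my own code, and now everything works (I had forgotten to pass the preprocessed PDB to my program instead of the original one). Thanks for your time, and sorry that this was not a very useful question! Do you think I should delete it? –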


@AlexMayorov Please leave the question up in case others run into a similar problem. – fsimkovic