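Deleting items from a dictionary generated by SeqIO.index

I'm using Python 2.6.6 and I'm trying to remove the reads that overlap with (i.e., are identical to) reads in file1. Here is the code I'm trying to implement: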

from Bio import SeqIO

ref_reads = SeqIO.index("file1.fastq", "fastq")
spk_reads = SeqIO.index("file2.fastq", "fastq")

for spk in spk_reads:
    if spk in ref_reads:
        del ref_reads[spk]


This raises:

AttributeError: _IndexedSeqFileDict instance has no attribute '__delitem__'
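Why `del` fails: the statement `del obj[key]` is carried out by calling the object's `__delitem__` method, and a read-only mapping simply doesn't define one. The minimal sketch below reproduces this with a hypothetical `ReadOnlyIndex` class (an illustration only, not Biopython's actual `_IndexedSeqFileDict`). The exact exception depends on the class and Python version: old-style classes on Python 2 give the `AttributeError` above, while Python 3 raises `TypeError`. The code is written to run on both.

```python
try:
    from collections.abc import Mapping  # Python 3.3+
except ImportError:
    from collections import Mapping  # Python 2.6 - 3.2


class ReadOnlyIndex(Mapping):
    """Hypothetical stand-in for a read-only, dict-like index.

    It supports lookups, membership tests and iteration over keys,
    but defines no __delitem__, so `del index[key]` cannot work.
    """

    def __init__(self, data):
        self._data = dict(data)

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


index = ReadOnlyIndex({"read1": "ACGT", "read2": "GGCC"})

try:
    del index["read1"]
    deleted = True
except (TypeError, AttributeError):
    # Python 3 raises TypeError; old-style classes on Python 2 raise AttributeError
    deleted = False

# Workaround: copy the entries into a real, mutable dict and delete from that.
mutable = dict(index)
del mutable["read1"]
```

The copy costs memory proportional to the number of records, which is exactly what an on-disk index is meant to avoid, so it only makes sense for modestly sized files.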



from Bio import SeqIO

fname_out = "main_unique.fastq"  # hypothetical output filename

# import read data
ref_reads = SeqIO.index("main.fastq", "fastq")
spk_reads = SeqIO.index("over.fastq", "fastq")

# note that ref_reads.keys() doesn't return a list but a 'dictionary-keyiterator',
# so we turn it into a set to work with it
ref_keys = set(ref_reads.keys())
spk_keys = set(spk_reads.keys())

# loop to remove overlap reads (a set can't be subscripted, so use discard())
for spk in spk_keys:
    ref_keys.discard(spk)

# output data: SeqIO.write() needs an iterable of records, not ref_reads[<set>]
with open(fname_out, "w") as output_handle:
    SeqIO.write((ref_reads[key] for key in ref_keys), output_handle, "fastq")
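Two set and mapping details matter here. A set has no keys or positions to subscript, so members are removed with `set.discard()` (or `remove()`, which raises `KeyError` for missing members), or the whole difference is taken at once with `-`. Likewise, a mapping can't be indexed with a whole set of keys; you build an iterable of values instead. The sketch below shows both on plain Python objects, with made-up read IDs and a dict standing in for the index:

```python
# Plain-Python stand-ins with made-up read IDs: a dict plays the role of the
# SeqIO.index() object, and spk_keys holds the IDs found in the second file.
ref_reads = {"r1": "rec1", "r2": "rec2", "r3": "rec3", "r4": "rec4"}
spk_keys = set(["r2", "r4", "r9"])

# Option 1: remove matching IDs one at a time.
ref_keys = set(ref_reads.keys())
for spk in spk_keys:
    ref_keys.discard(spk)  # unlike remove(), no KeyError for "r9"

# Option 2: the same result as a single set difference.
unique_keys = set(ref_reads.keys()) - spk_keys

# A mapping can't be indexed with a set of keys, so select the values
# lazily with a generator expression (an iterable SeqIO.write() accepts).
records = (ref_reads[key] for key in unique_keys)
kept = sorted(records)
```

The set difference is usually the clearer choice, since it states the intent ("IDs in file1 but not in file2") in one expression.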



SeqIO.index() doesn't return a true dictionary, but a dictionary-like object, giving the SeqRecord objects as values.

Note that this pseudo dictionary will not support all the methods of a true Python dictionary, for example values() is not defined since this would require loading all of the records into memory at once.


Note that this dictionary is essentially read only. You cannot add or change values, pop values, nor clear the dictionary.


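from Bio import SeqIO

ref_reads = SeqIO.parse("file1.fastq", "fastq")
spk_reads = SeqIO.parse("file2.fastq", "fastq")

# load all reads from file1 into a regular (mutable) dict keyed by read ID
ref_reads_dict = SeqIO.to_dict(ref_reads)

# delete every read from file1 that also occurs in file2
for spk in spk_reads:
    if spk.id in ref_reads_dict:
        del ref_reads_dict[spk.id]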
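With a real dict such as `ref_reads_dict`, the membership test and the `del` can be folded into one call: `dict.pop(key, default)` removes the key if it is present and returns `default` otherwise, so IDs that only occur in file2 don't raise `KeyError`. A small sketch with a plain dict and made-up IDs standing in for the parsed records:

```python
# Stand-in for SeqIO.to_dict(...): read ID -> record (hypothetical values).
ref_reads_dict = {"r1": "rec1", "r2": "rec2", "r3": "rec3"}
spk_ids = ["r2", "r9"]  # "r9" does not occur in ref_reads_dict

for spk_id in spk_ids:
    # Removes the entry if present; otherwise returns None and changes nothing.
    ref_reads_dict.pop(spk_id, None)
```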


from Bio import SeqIO 

ref_reads = SeqIO.index("file1.fastq", "fastq") 
spk_reads = SeqIO.index("file2.fastq", "fastq") 

# note that ref_reads.keys() doesn't return a list but a 'dictionary-keyiterator', 
# so we turn it into a set to work with it 
ref_keys = set(ref_reads.keys()) 
spk_keys = set(spk_reads.keys()) 

unique_ref_keys = ref_keys - spk_keys 

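# this loads every remaining record into memory, so it can be slow for large files
# (dict comprehensions need Python 2.7+, so build the dict from a generator on 2.6)
unique_ref_reads = dict((key, ref_reads[key]) for key in unique_ref_keys)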


How can I again solve the original problem of deleting items from SeqIO.index("file1.fastq", "fastq")?

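As I said above, SeqIO.index("file1.fastq", "fastq") returns a read-only _IndexedSeqFileDict object, so by design you can't delete items from it. What you can do is write the non-overlapping reads to a new file and, if you need one, build a new index from that file: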



from Bio import SeqIO

fname_out = "file1_unique.fastq"  # hypothetical output filename

ref_reads = SeqIO.index("file1.fastq", "fastq") 
spk_reads = SeqIO.index("file2.fastq", "fastq") 

ref_keys = set(ref_reads.keys()) 
spk_keys = set(spk_reads.keys()) 

unique_ref_keys = ref_keys - spk_keys 

# conserve memory by using a generator expression 
unique_ref_records = (ref_reads[key] for key in unique_ref_keys) 

# output new file with overlapping reads removed 
with open(fname_out, "w") as output_handle: 
    SeqIO.write(unique_ref_records, output_handle, "fastq")

# optionally, create a new SeqIO.index() object 
unique_ref_reads = SeqIO.index(fname_out, "fastq") 
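One caveat on the code above: iterating over the set `unique_ref_keys` visits the keys in arbitrary order, so the output file will generally not keep the read order of file1.fastq. An alternative that preserves that order, and only ever holds one file1 record at a time, is to collect the IDs from file2 into a set and stream file1 once, writing every read whose ID isn't in that set. The stdlib-only sketch below illustrates the idea under stated assumptions: plain four-line FASTQ records (no wrapped sequence or quality lines), with the read ID taken as the first whitespace-delimited word of the `@` header line. `fastq_records()` and `remove_overlap()` are hypothetical helpers, not Biopython functions.

```python
def fastq_records(lines):
    """Yield (read_id, record_lines) for each FASTQ record.

    Hypothetical helper: assumes unwrapped records of exactly four lines
    (@header, sequence, +separator, quality).
    """
    lines = iter(lines)
    while True:
        header = next(lines, None)
        if header is None:
            return  # end of input
        if not header.strip():
            continue  # tolerate blank lines between records
        if not header.startswith("@"):
            raise ValueError("expected '@' header line, got: " + header.strip())
        record = [header]
        for _ in range(3):
            line = next(lines, None)
            if line is None:
                raise ValueError("truncated FASTQ record: " + header.strip())
            record.append(line)
        read_id = header[1:].split(None, 1)[0]  # first word after "@"
        yield read_id, record


def remove_overlap(ref_lines, spk_lines):
    """Yield the lines of every ref record whose ID is absent from spk_lines."""
    spk_ids = set(read_id for read_id, _ in fastq_records(spk_lines))
    for read_id, record in fastq_records(ref_lines):
        if read_id not in spk_ids:
            for line in record:
                yield line
```

Applied to the real files, the generator can be passed straight to `output_handle.writelines(...)` inside the `with open(fname_out, "w")` block, with file1.fastq and file2.fastq opened for reading (nest the `with` statements on Python 2.6, since several context managers in one `with` need 2.7). Only the set of file2 IDs is kept in memory.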

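Thanks for the helpful suggestions. The first solution works, but it is slow compared with the code I'm trying to improve. Could you give more details about your second code block? I'm trying to remove reads from file1.fastq that are also in file2.fastq. Building on your second solution, how can I then solve the original problem of deleting items from SeqIO.index("file1.fastq", "fastq")? I've updated the question to reflect my most recent attempt. – wa3j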


@wa3j: See my edit above. – BioGeek