2014-09-26 52 views
0

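So I have been trying to use conditions to print only part of a file, but for some reason, when I run the code in IPython it just keeps running and never stops. How can I use only part of a file in Python?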

The file I am running it on is:

Use the -noinfo option to turn off this help. 
Use the -help option to get a list of command line options. 

pilercr v1.06 
By Robert C. Edgar 

Temp1.None.fasta: 523 putative CRISPR arrays found. 



DETAIL REPORT 



Array 1 
>contig-856000000 902 nucleotides 

     Pos Repeat  %id Spacer Left flank Repeat          Spacer 
========== ====== ====== ====== ========== ======================================== ====== 
     28  40 95.0  26 TGCTTCCCCG -.....................................T. CTTGGTCTTGCTGGTTCTCACCGACT 
     94  40 95.0  25 CTCACCGACT .T....................................C. GTCAGCGTGTAGCGACTGTATCTGG 
     159  40 100.0   CTGTATCTGG ........................................ TTGCTCGAA 
========== ====== ====== ====== ========== ======================================== 
     3  40    25    TAGTTGTGAATAGCTGACAAAATCATATCATATACAACAG 


Array 2 
>contig-2277000000 590 nucleotides 

     Pos Repeat  %id Spacer Left flank Repeat         Spacer 
========== ====== ====== ====== ========== ===================================== ====== 
     19  37 100.0  37 GAGGGTGAGG ..................................... ACTTTAGGTTCAAATCCGTAGAGCTGATCTGTAATAG 
     93  37 100.0  37 TCTGTAATAG ..................................... ATTCCGTTGTTGAAATAAAGTATGAATAATATTTGGT 
     167  37 100.0  35 AATATTTGGT ..................................... TTCTCGAACGTTCCATGCTTCATAATATACCTCCT 
     239  37 100.0  39 TATACCTCCT ..................................... CTGATGAATCTTACCTCGTACAGTGATGTAGCCAGGTAA 
     315  37 100.0   AGCCAGGTAA ..................................... CGTCAGTCATG 
========== ====== ====== ====== ========== ===================================== 
     5  37    37    GTAGAAATGAGACGTCCGCTGTAAAGGACATTGATAC 


Array 3 
>contig-2766000000 540 nucleotides 

     Pos Repeat  %id Spacer Left flank Repeat         Spacer 
========== ====== ====== ====== ========== ===================================== ====== 
     172  37 100.0  29 GTTTTAGATG ..................................... TATCGTAGCATCCCACTCCCCTGGTGTAA 
     238  37 100.0  29 CCTGGTGTAA ..................................... GTTGGACGCGCTGCTGGACGATAGGCTGC 
     304  37 97.3  29 GATAGGCTGC T.................................... ACGCCTTACAAGCTGACCCGCGCCCAATT 
     370  37 100.0   GCGCCCAATT ..................................... GTACCTTGTTC 
========== ====== ====== ====== ========== ===================================== 
     4  37    29    GGCTGTAAAAAGCCACCAAAATGATGGTAATTACAAG 


SUMMARY BY SIMILARITY 



Array   Sequence Position  Length # Copies Repeat Spacer + Consensus 
===== ================ ========== ========== ======== ====== ====== = ========= 
    5 contig-504300000   18   364   6  33  33 + --------------------------GTCGCT-C---CCCGCATGGGGAGCG--T-GGATTGAAAT----- 
    8 contig-974700000   15   229   4  32  33 - --------------------------GTCGCC-C---CCCATGCG-GGGGCG--T-GGATTGAAAC----- 
    12 contig-759000001   464   503   8  33  34 + --------------------------GTCGCT-C---CCTTTACGGGGAGCG--T-GGATTGAAAT----- 
    16 contig-293000000   77   406   6  37  36 - -----------------------GTAGAAATGAG---TTCCCCGATGAGAAG--G-GGATTGACAC----- 
    17 contig-457600000   28   416   6  37  38 - -----------------------GTAGAAATGGG---TGTCCCGATAGATAG--G-GGATTGACAC----- 
    18 contig-527300000   1   351   6  33  32 + -----------------------ATCGCG----C---CCCCACGGGGGCGTG--T-GAATTGAAAC----- 
    27 contig-132220000   21   234   4  33  34 + --------------------------GTCGCT-C---CCTTCACGGGGAGCG--T-GGATTGAAAT----- 
    36 contig-602400000   35   304   5  33  34 - --------------------------GTCGCC-C---CCCACGTGGGGGGCG--T-GGATTGAAAC----- 
    38 contig-124860000   131   232   4  32  34 + --------------------------GTCGCA-C---CCCTCGC-GGGTGCG--T-GGATTGAAAC----- 
    54 contig-979400000   138   231   4  32  34 - --------------------------GTCGCC-C---CTCTTGCA-GGGGCG--T-GGATTGAAAC----- 
    61 contig-992000005   149   693  11  30  36 - --------------------GTTAAAATCA--GA---CC---ATTTTG--------GGATTGAAAT----- 
    68 contig-103110000   37   238   4  34  34 + -----------------------GTCGTC----C---CCCACACGGGGGACG--T-GGATTGAAATA---- 
    73 contig-372900000  1627  1013  16  30  35 + ----------------------------ATTAGAATCGTACTT--ATGTAGAATTGAAAT----------- 

And my code so far is:

fname = 'crispr_pilrcr_1.out' 
start=False 
end=False 
counter = 0 
for line in open(fname, 'r'): # Open up the file 
    s = line.split() # Split each line into words 
    if not s: continue # Remove empty lines which would otherwise cause errors 
    if '==' in s[0]: continue # Removes seperation lines which consist of long '=======' strings 
    try: 
     if s[0] == 'DETAIL': # Only start in the section which starts with 'DETAIL' 
      start=True 
      print 'Starting' 
     if s[0] == 'SUMMARY': # Only end once this section has ended 
      end=True 
      print 'Ending' 
     while start==True or end==False: # Whilst in the section of the PILER-CR output which provides spacer sequences 
      try: 
       int(s[0]) 
       print s[7] 
      except ValueError: 
       continue 
    except ValueError: 
     continue 

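I suspect something is wrong with the 'while' loop, but it keeps running just the same when I use 'and' instead of 'or'.

As I said, I want to select the part of the file between "DETAIL REPORT" and "SUMMARY BY SIMILARITY", which is why I set up conditions that flip once those headers are found.

Any help you can offer would be great.

Thanks, Tom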

+1

I'll take a shot in the dark here. Try replacing 'while' with 'if'. (And possibly 'or' with 'and'.) – Kevin 2014-09-26 14:19:01

Answers

3

Consider something like:

fname = 'crispr_pilrcr_1.out'
counter = 0
printing = False
for line in open(fname, 'r'):  # Open up the file
    s = line.split()  # Split each line into words
    if not s: continue  # Skip empty lines, which would otherwise cause errors
    if '==' in s[0]: continue  # Skip separator lines, which consist of long '=======' strings
    try:
        if s[0] == 'DETAIL':  # Only start in the section which starts with 'DETAIL'
            printing = True
            print('Starting')
        elif s[0] == 'SUMMARY':  # Only end once this section has ended
            printing = False
            print('Ending')
        elif printing:
            try:
                # Anything you put here will only be called for the lines
                # between DETAIL... and SUMMARY...
                pass  # Placeholder so that the empty block is valid Python
            except ValueError:
                continue
    except ValueError:
        continue

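Basically, you use a single variable, printing, which is initialized to False, set to True when the for loop encounters "DETAIL ...", and reset to False when the for loop encounters "SUMMARY ...".

For lines that match neither "DETAIL ..." nor "SUMMARY ...", and while printing is True (that is, for the lines between the two headers), the try block is executed.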
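
The same flag idea can be wrapped in a small generator so that the section logic stays separate from whatever is done with each line. This is only a sketch written for Python 3: the name lines_between and the shortened sample lines are made up for illustration and are not part of the original code or of PILER-CR.

```python
def lines_between(lines, start_marker, end_marker):
    """Yield the non-blank lines found after start_marker and before end_marker.

    A line counts as a marker when its first whitespace-separated word
    equals start_marker or end_marker.
    """
    inside = False
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if fields[0] == start_marker:
            inside = True   # the section begins after this header
        elif fields[0] == end_marker:
            inside = False  # the section is over
        elif inside:
            yield line.rstrip("\n")


# Shortened, made-up sample in the spirit of the report shown in the question.
sample = [
    "pilercr v1.06\n",
    "DETAIL REPORT\n",
    "\n",
    "Array 1\n",
    "28 40 95.0 26 TGCTTCCCCG CTTGGTCT\n",
    "SUMMARY BY SIMILARITY\n",
    "5 contig-504300000 18\n",
]
section = list(lines_between(sample, "DETAIL", "SUMMARY"))
print(section)
```

With a real report, an open file object can be passed in place of the sample list, since iterating over a file also yields one line at a time.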

+0

This works well. Thank you very much! – Tom 2014-09-26 14:24:42

+0

It looks like you don't need the 'try' block in the last 'elif'. – 2014-09-26 14:35:35

1

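The problem is that you never change the values of start and end inside your while loop. So whatever values they had that let you enter the loop, they are the same on every iteration.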
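
To make that concrete, here is a small Python 3 illustration rather than a fix. The function name iterations_before_exit and the cap parameter are invented for this sketch; the cap exists only so that the demonstration itself terminates.

```python
def iterations_before_exit(start, end, cap=5):
    """Count how often the loop body runs for fixed start/end values.

    cap is a safety limit added for this illustration only; it stops the
    loop in the cases where the condition would otherwise stay true forever.
    """
    iterations = 0
    while (start or not end) and iterations < cap:
        # As in the question's loop body, nothing here assigns to start or end,
        # so the first half of the condition can never change.
        iterations += 1
    return iterations


for start in (False, True):
    for end in (False, True):
        print(start, end, iterations_before_exit(start, end))
```

Only the combination start=False, end=True skips the loop. The other three run until the cap stops them, which is why the original loop, having no cap, never returns.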

Without overhauling your logic, I guess you probably want to do something like:

while start or not end: 
    try: 
     int(s[0]) 
     print s[7] 
    except ValueError: 
     end = True 
     start = False 

+0

If no 'ValueError' is raised, you still have an infinite loop. – 2014-09-26 14:20:29

+0

Thanks, this seems to fix the infinite loop, but it still doesn't print. – Tom 2014-09-26 14:24:16
