2017-07-24 101 views
1

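f.readline vs f.read print output

I am new to Python (using Python 3.6). I have a read.txt file containing information about a firm. The file starts with several report characteristics: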

CONFORMED PERIOD REPORT:    20120928 #this is 1 line 
DATE OF REPORT:      20121128 #this is another line 

and then starts all the text about the firm..... #lots of lines here 

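I am trying to extract both dates (['20120928', '20121128']) as well as flags for some strings in the text (i.e., if the string exists, I want a '1'). Ultimately, I want a vector that gives me the two dates plus the 1s and 0s for the different strings, i.e. ['20120928', '20121128', '1', '0']. My code is the following: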

import re

exemptions = [] #vector I want 

with open('read.txt', 'r') as f: 
    line2 = f.read() # read the txt file 
    for line in f: 
     if "CONFORMED PERIOD REPORT" in line: 
      exemptions.append(line.strip('\n').replace("CONFORMED PERIOD REPORT:\t", "")) # add line without stating CONFORMED PERIOD REPORT, just with the date) 
     elif "DATE OF REPORT" in line: 
      exemptions.append(line.rstrip('\n').replace("DATE OF REPORT:\t", "")) # idem above 

    var1 = re.findall("string1", line2, re.I) # find string1 in line2, case-insensitive 
    if len(var1) > 0: # if the string appears, it will have length>0 
     exemptions.append('1') 
    else: 
     exemptions.append('0') 
    var2 = re.findall("string2", line2, re.I) 
    if len(var2) > 0: 
     exemptions.append('1') 
    else: 
     exemptions.append('0') 

print(exemptions) 

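If I run this code, I get ['1', '0']: the dates are omitted, but the string checks are correct, since var1 exists (OK, '1') and var2 does not (OK, '0'). What I don't understand is why it doesn't report the dates. Importantly, when I change line2 to "line2 = f.readline()", I get ['20120928', '20121128', '0', '0']. The dates are fine now, but I know var1 exists, so it seems the rest of the file is not being read? If I omit "line2 = f.read()", it outputs a vector of 0s for every line, apart from my desired output. How can I omit these 0s?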

My desired output would be: ['20120928', '20121128', '1', '0']

Sorry to bother you. Thanks anyway!

Answers

0

This is how I finally went about it:

import re

exemptions = [] #vector I want 

with open('read.txt', 'r') as f: 
    line2 = "" # create an empty string variable out of the "for line" loop 
    for line in f: 
     line2 = line2 + line #append each line to the above created empty string 
     if "CONFORMED PERIOD REPORT" in line: 
      exemptions.append(line.strip('\n').replace("CONFORMED PERIOD REPORT:\t", "")) # add line without stating CONFORMED PERIOD REPORT, just with the date) 
     elif "DATE OF REPORT" in line: 
      exemptions.append(line.rstrip('\n').replace("DATE OF REPORT:\t", "")) # idem above 

    var1 = re.findall("string1", line2, re.I) # find string1 in line2, case-insensitive 
    if len(var1) > 0: # if the string appears, it will have length>0 
     exemptions.append('1') 
    else: 
     exemptions.append('0') 
    var2 = re.findall("string2", line2, re.I) 
    if len(var2) > 0: 
     exemptions.append('1') 
    else: 
     exemptions.append('0') 

print(exemptions) 

This is what I have so far. It works for me, although I guess working with BeautifulSoup would make the code more efficient. Next step :)
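For comparison, here is a more compact sketch of the same idea: read the text once, then pull out the dates and the flags with regular expressions. The helper name `build_exemptions` and the sample text are made up for illustration, and it assumes each label is followed by a colon, optional spaces or tabs, and the date. Unlike the loop above, it keeps only the first occurrence of each label and skips a label that is not found.

```python
import re

def build_exemptions(text, labels, patterns):
    """Return the value after each label, then '1'/'0' for each pattern."""
    result = []
    for label in labels:
        # Capture the first non-whitespace token after "LABEL:" and any spaces/tabs.
        match = re.search(re.escape(label) + r":[ \t]*(\S+)", text)
        if match:
            result.append(match.group(1))
    for pattern in patterns:
        # Case-insensitive search over the whole text.
        result.append('1' if re.search(pattern, text, re.I) else '0')
    return result

# Hypothetical stand-in for the contents of read.txt.
sample = (
    "CONFORMED PERIOD REPORT:\t20120928\n"
    "DATE OF REPORT:\t\t20121128\n"
    "Text about the firm mentioning STRING1 somewhere.\n"
)

exemptions = build_exemptions(
    sample,
    ["CONFORMED PERIOD REPORT", "DATE OF REPORT"],
    ["string1", "string2"],
)
print(exemptions)  # ['20120928', '20121128', '1', '0']
```

With a real file, `text` would come from a single `f.read()` inside the `with open(...)` block.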

0

line2 = f.read() reads the entire file into line2, so there is nothing left for your for line in f: loop to read.
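A small demonstration of this, using `io.StringIO` as a stand-in for the opened file (the sample text is made up). It also suggests why `f.readline()` produced the dates but '0' for both strings: `readline()` consumes only the first line, so `line2` never holds the rest of the text.

```python
import io

# io.StringIO stands in for open('read.txt'); the contents are illustrative.
sample = "CONFORMED PERIOD REPORT:\t20120928\nDATE OF REPORT:\t20121128\nbody text\n"

f = io.StringIO(sample)
whole = f.read()                 # consumes everything; the position is now at the end
lines_after_read = list(f)       # nothing is left to iterate over

f = io.StringIO(sample)
first = f.readline()             # consumes only the first line
lines_after_readline = list(f)   # the loop still sees the remaining lines

print(lines_after_read)           # []
print(len(lines_after_readline))  # 2
```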

3

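f.read() will read the entire file into the variable line2. If you want to read line by line, you can skip the f.read() altogether and just iterate, like so: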

with open('read.txt', 'r') as f: 
    for line in f: 
        pass  # process each line here

Otherwise, as written, after you .read() into line2 there is no more text to read from f, because it is all contained in the line2 variable.
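If you do want both the full text and a line-by-line pass over the same file object, one option is to rewind with `f.seek(0)` after the `read()`. The sketch below writes a small temporary file so it runs on its own; the file contents are made up.

```python
import os
import tempfile

# Create a throwaway file that stands in for read.txt.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("DATE OF REPORT:\t20121128\nsome string1 here\n")
    path = tmp.name

with open(path, 'r') as f:
    text = f.read()   # whole file as one string
    f.seek(0)         # move the position back to the start of the file
    lines = [line.rstrip('\n') for line in f]  # iteration works again

os.remove(path)       # clean up the temporary file

print(lines)
```

Accumulating the lines inside the loop, or reading once and calling `text.splitlines()`, avoids the second pass over the file.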

+0

It would be better to use f.readlines() and then iterate over the lines rather than splitting on \n, as that may not give you the expected results. – Ajurna

+0

I'm not sure the first code snippet is even worth suggesting; the second way is clearly the way to go –

+0

Good point. Dropped the first approach. – CoryKramer
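To illustrate the difference raised in the comments between `readlines()` and splitting on `\n`, here is a short sketch using `io.StringIO` as a stand-in for the file (the sample text is made up). Splitting on `'\n'` leaves a trailing empty string when the text ends with a newline, while `readlines()` keeps the newline on each line.

```python
import io

sample = "line one\nline two\n"  # illustrative text ending in a newline

via_readlines = io.StringIO(sample).readlines()  # newline kept on each line
via_split = sample.split('\n')                   # extra '' after the final newline
via_splitlines = sample.splitlines()             # no newlines, no trailing ''

print(via_readlines)   # ['line one\n', 'line two\n']
print(via_split)       # ['line one', 'line two', '']
print(via_splitlines)  # ['line one', 'line two']
```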