2014-11-21 32 views
0

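Doubt regarding reading the total number of words in a file

I want to read files from a directory and write the first sentence of each file to a new file, until 100 words (or slightly more than 100, because I want to write complete sentences) have been written to the new file.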

I am doing it the following way:

f = open(file1.txt, "w")
f.close()
for d_file in os.listdir(path):
    d_file_path = os.path.join(path, d_file)
    if os.path.isfile(d_file_path):
        with open(d_file_path, "r") as f:
            first = f.readline()
            f1 = open("file1.txt", "r")
            textInput = f1.read()
            f1.close()
            l = len(textInput.split(' '))
            print l
            if l >= 0 and l <= 100:
                f2 = open("file1.txt", "a")
                f2.write(first)
                print first

However, the print statements give me the wrong output, even though the new file is written correctly.

My question is: why do I get 0 as the value of "l" twice? Also, when I simply count the total number of words in the file after it has been written:

>>> f = open(file1.txt, 'r')
>>> text = f.read()
>>> l = len(text.split(' '))
>>> print l

I get: 111

However, the file is:

An influential lawmaker from the governing Labor Party on Saturday backed Spanish requests to question former Chilean dictator Gen. Augusto Pinochet, in London for back surgery, on allegations of genocide and terrorism. 
British police said Saturday they have arrested former Chilean dictator Gen. Augusto Pinochet on allegations of murdering Spanish citizens during his years in power. 
Eight years after his turbulent regime ended, former Chilean strongman Gen. Augusto Pinochet is being called to account by Spanish authorities for the deaths, detention and torture of political opponents. 
Former Chilean dictator Gen. Augusto Pinochet has been arrested by British police on a Spanish extradition warrant, despite protests from Chile that he is entitled to diplomatic immunity. 

Doesn't that contain 114 words?

Can someone answer my questions?

Edit:

Now I do l = len(textInput.strip().split()), which gives me 114 as the word count, but the output of the print statements is still the same. The output now looks like this:

0 
An influential lawmaker from the governing Labor Party on Saturday backed Spanish requests to question former Chilean dictator Gen. Augusto Pinochet, in London for back surgery, on allegations of genocide and terrorism. 

0 
British police said Saturday they have arrested former Chilean dictator Gen. Augusto Pinochet on allegations of murdering Spanish citizens during his years in power. 

32 
Eight years after his turbulent regime ended, former Chilean strongman Gen. Augusto Pinochet is being called to account by Spanish authorities for the deaths, detention and torture of political opponents. 

56 
Former Chilean dictator Gen. Augusto Pinochet has been arrested by British police on a Spanish extradition warrant, despite protests from Chile that he is entitled to diplomatic immunity. 

86 
President Fidel Castro said Sunday he disagreed with the arrest in London of former Chilean dictator Augusto Pinochet, calling it a case of international meddling. 

114 
114 
114 
114 
114 
+0

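This doesn't make sense. If your file1.txt has the content you posted, your first print statement should print 114. Why does it print 0? Also, you are not closing the file after writing to it. – user3885927 2014-11-21 22:04:32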

+0

Oh! Sorry, I have edited the code in my question. Have a look for a better understanding. file1.txt is a file created before this loop runs. – 2014-11-21 22:08:52

+0

Oh! That was because of not closing the file. Sorry about that. Thanks a lot! :D – 2014-11-21 22:17:07

Answers

0

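There are 114 words, as you said. The difference comes from how you split: you split on ' ', and a newline does not count as ' '. So the last word of the first line, "terrorism.", and the first word of the next line, "British", are counted as a single word of the form "terrorism.\nBritish". The same thing happens at two more line breaks. In total, three words are merged with the last word of the preceding sentence, which gives you three fewer words: 111 instead of 114.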

If you want to split on both spaces and newlines, just use split() without arguments, and you should get 114 words. Details from the documentation follow:

str.split([sep[, maxsplit]]) 

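If sep is not specified or is None, a different splitting algorithm is applied: runs of consecutive whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace. Consequently, splitting an empty string or a string consisting of just whitespace with a None separator returns [].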
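
To illustrate the difference, here is a minimal sketch using a shortened, made-up excerpt of the text rather than the full file. It is written in Python 3 syntax (the question uses Python 2 print statements), and the variable names are only for illustration.

```python
# Shortened sample: two lines separated by a newline, as in the output file.
text = (
    "on allegations of genocide and terrorism.\n"
    "British police said Saturday\n"
)

by_space = text.split(' ')  # separator is a single space only
by_any_ws = text.split()    # separator is any run of whitespace

print(len(by_space))    # 9
print(len(by_any_ws))   # 10

# Tokens that still contain a newline are the "merged" words.
print([w for w in by_space if '\n' in w])
```

With split(' '), "terrorism.\nBritish" stays a single token, so every line break inside the text costs one word in the count.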

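As for your other question about what gets printed, please post your code exactly as you run it. As far as I can see, you are missing the quotes around file1.txt, and you may be missing other things as well.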
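
As the comments under the question point out, the handle opened for appending is never closed, so the written text can sit in a buffer and the next read of file1.txt sees a stale count. One way to avoid re-reading the output file at all is to keep a running word count and let a with block close the file. The following is only a sketch in Python 3 syntax: the function name collect_first_lines and its parameters are hypothetical, and it assumes the output file is not located inside the source directory.

```python
import os
import tempfile


def collect_first_lines(src_dir, out_path, limit=100):
    """Write the first line of each file in src_dir to out_path.

    Stops once more than `limit` words have been written, so the last
    sentence is always written in full. Returns the final word count.
    """
    total = 0
    # The with block guarantees the output file is flushed and closed.
    with open(out_path, "w") as out:
        for name in sorted(os.listdir(src_dir)):
            path = os.path.join(src_dir, name)
            if not os.path.isfile(path):
                continue
            if total > limit:
                break
            with open(path, "r") as src:
                first = src.readline()
            out.write(first)
            # split() without arguments also splits on newlines.
            total += len(first.split())
    return total


# Small demonstration with throwaway files (made-up contents).
with tempfile.TemporaryDirectory() as tmp:
    src_dir = os.path.join(tmp, "docs")
    os.mkdir(src_dir)
    samples = ["one two three\nrest of file\n", "four five\nrest of file\n"]
    for i, content in enumerate(samples):
        with open(os.path.join(src_dir, "doc%d.txt" % i), "w") as fh:
            fh.write(content)
    out_path = os.path.join(tmp, "file1.txt")
    total = collect_first_lines(src_dir, out_path)
    with open(out_path) as fh:
        written = fh.read()

print(total)  # 5
```

Counting words as they are written keeps the count in step with the file contents, regardless of when the operating system or the file object flushes its buffer.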

+0

I have edited my question. – 2014-11-21 21:55:27