python访问文件结构中的数据

我想以最有效的方式访问存储在目录（〜20）中的.txt文件（〜1000）中的每个值（〜10000）。当抓取数据时，我想将它们放在HTML字符串中。我这样做是为了为每个文件显示一个包含表格的HTML页面。伪：python访问文件结构中的数据

fh=open('MyHtmlFile.html','w') 
    fh.write('''<head>Lots of tables</head><body>''') 
    for eachDirectory in rootFolder: 

     for eachFile in eachDirectory: 
      concat='' 

      for eachData in eachFile: 
       concat=concat+<tr><td>eachData</tr></td> 
      table=''' 
        <table>%s</table> 
        '''%(concat) 
     fh.write(table) 
    fh.write('''</body>''') 
    fh.close()

必须有一个更好的方法（我想这将需要永远）！我已经检查了set（）并读了一些关于hashtables的内容，而是在漏洞被挖掘之前询问专家。

谢谢你的时间！ /卡尔

来源

2011-08-18 ckarlbe

只是一个提示：连接字符串+ =绝对不鼓励大量的字符串。 –

@jellybean如何提供连接字符串的替代方法？ – Raz

追加他们所有的列表mylist'和'“”.join（mylist）'他们之后 –

import os, os.path 
# If you're on Python 2.5 or newer, use 'with' 
# needs 'from __future__ import with_statement' on 2.5 
fh=open('MyHtmlFile.html','w') 
fh.write('<html>\r\n<head><title>Lots of tables</title></head>\r\n<body>\r\n') 
# this will recursively descend the tree 
for dirpath, dirname, filenames in os.walk(rootFolder): 
    for filename in filenames: 
     # again, use 'with' on Python 2.5 or newer 
     infile = open(os.path.join(dirpath, filename)) 
     # this will format the lines and join them, then format them into the table 
     # If you're on Python 2.6 or newer you could use 'str.format' instead 
     fh.write('<table>\r\n%s\r\n</table>' % 
        '\r\n'.join('<tr><td>%s</tr></td>' % line for line in infile)) 
     infile.close() 
fh.write('\r\n</body></html>') 
fh.close()

来源

2011-08-18 11:37:28 agf

你为什么“想象它会永远”？您正在阅读该文件，然后将其打印出来 - 这几乎是您作为要求提供的唯一一件事 - 而这正是您所做的一切。您可以通过几种方式调整脚本（读取块不是行，调整缓冲区，打印出来而不是连接等），但是如果您不知道现在需要多少时间，您怎么知道什么更好/更糟？

配置文件首先，然后找到脚本是否太慢，然后找到一个缓慢的地方，然后才进行优化（或询问优化）。

来源

2011-08-18 11:26:38 viraptor

我不寻找优化的代码，我正在寻找一种不同的（更高效或优雅的）解决方案。我现在已经实现了伪代码，耗时215秒。 – ckarlbe

答案的要点是只有一个解决方案（直到开始分析）：您想要读取所有文件中的所有数据，唯一的方法是读取所有文件中的所有数据。没有做后者，没有办法做前者。 –

python访问文件结构中的数据

回答

相关问题