Concatenating text files creates a file with spaces between each letter

I am trying to concatenate txt files. Almost everything works, but the output file has a space between each letter, like l o r e m i p s u m. Here is my code:

import glob 

all = open("all.txt","a"); 

for f in glob.glob("*.txt"): 
    print f 
    t = open(f, "r") 
    all.write(t.read()) 
    t.close() 

all.close()

I am working on Windows 7 with Python 2.7.

EDIT
Maybe there is a better way to concatenate files?

EDIT2
Now I have a decoding problem:

Traceback (most recent call last): 
    File "P:\bwiki\BWiki\MobileNotes\export\999.py", line 9, in <module> 
    all.write(t.read()) 
    File "C:\Python27\lib\codecs.py", line 671, in read 
    return self.reader.read(size) 
    File "C:\Python27\lib\codecs.py", line 477, in read 
    newchars, decodedbytes = self.decode(data, self.errors) 
UnicodeDecodeError: 'utf8' codec can't decode byte 0xf3 in position 18: invalid 
continuation byte 


import codecs 
import glob 

all =codecs.open("all.txt", "a", encoding="utf-8") 

for f in glob.glob("*.txt"): 
    print f 
    t = codecs.open(f, "r", encoding="utf-8") 
    all.write(t.read())

Source

2015-01-15 Chris

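The best way to concatenate text files is with a simple batch command. You can simply add the files together as if they were numbers. – PlamZ 2015-01-15 16:48:46

I suspect the error may be related to opening 'all.txt' twice: once when you assign it to `all`, and again inside the loop, because 'all.txt' matches the glob "*.txt". – 2015-01-15 16:51:48

@AlexBliskovsky I don't think that would produce the symptoms described, but you're right, that is *also* a bug. – zwol 2015-01-15 16:52:42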

The "spaces" between letters may indicate that at least some of the files use the utf-16 encoding.
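To illustrate why UTF-16 would look like that, here is a small Python 3 sketch. It assumes the text is plain ASCII stored as little-endian UTF-16 and that the reader treats it as a single-byte encoding (Latin-1 here); the sample word and variable names are only for illustration.

```python
# Demonstration of the symptom: UTF-16 text read as a single-byte encoding.
text = "lorem"

# UTF-16 little-endian stores each ASCII letter as the letter byte
# followed by a zero byte.
raw = text.encode("utf-16-le")

# Decoding those bytes as Latin-1 keeps every zero byte as a NUL character,
# which many editors display as a blank between letters.
misread = raw.decode("latin-1")
spaced = misread.replace("\x00", " ")

print(repr(raw))
print(repr(spaced))
```

The NUL characters are not real spaces, which is why looking at the raw bytes is more reliable than trusting what an editor shows.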

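If all files use the same character encoding, you can use the cat(1) command, which copies files as bytes (the code example is in Python 3). Here is the PowerShell cat command that corresponds to your Python code: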

PS C:\> Get-Content *.txt | Add-Content all.txt

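Unlike cat *.txt >> all.txt, it should not corrupt the character encoding.

Your code should work as is if you use binary file mode:

from glob import glob
from shutil import copyfileobj

with open('all.txt', 'ab') as output_file:
    for filename in glob("*.txt"):
        if filename == 'all.txt':
            continue  # skip the output file, which also matches *.txt
        with open(filename, 'rb') as file:
            copyfileobj(file, output_file)

Again, all files should have the same character encoding; otherwise you may get garbage (mixed content) in the output.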
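If the files do not all share one encoding, one possible approach is to decode each file separately and write everything out as UTF-8. The following is a minimal Python 3 sketch, not a definitive solution: it assumes that any UTF-16 or UTF-32 file starts with a byte order mark (BOM) and that files without a BOM are UTF-8. The names `guess_encoding` and `concatenate` are hypothetical helpers introduced for this example.

```python
import codecs
import glob
import io
import os

# (BOM, codec) pairs. UTF-32 is checked before UTF-16 because the
# UTF-32-LE BOM begins with the same two bytes as the UTF-16-LE BOM.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(head, default="utf-8"):
    """Return a codec name based on a leading BOM, else the assumed default."""
    for bom, name in BOMS:
        if head.startswith(bom):
            return name
    return default


def concatenate(pattern, output_name):
    """Decode each matching file and append its text to output_name as UTF-8."""
    target = os.path.abspath(output_name)
    # newline="" writes line endings exactly as they were read.
    with io.open(output_name, "a", encoding="utf-8", newline="") as out:
        for name in sorted(glob.glob(pattern)):
            if os.path.abspath(name) == target:
                continue  # the output file may itself match the pattern
            with open(name, "rb") as fp:
                data = fp.read()
            out.write(data.decode(guess_encoding(data[:4])))
```

It could be called as `concatenate("*.txt", "all.txt")`. A file in a legacy single-byte encoding with no BOM would still raise a UnicodeDecodeError under the UTF-8 default, so the default would need to be adjusted for such files.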

Source

2015-01-15 23:15:04 jfs

The input files are probably UTF-encoded, but you are reading them as ASCII, which makes the spaces appear (they reflect null bytes). Try:

import codecs 

... 

for f in glob.glob("*.txt"): 
    print f 
    t = codecs.open(f, "r", encoding="utf-16")
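For reference, a complete version of that idea might look like the sketch below. It is written for Python 3 and assumes every input file really is UTF-16 with a BOM; `concat_utf16` is a hypothetical name chosen for this example, and the output is written as UTF-8.

```python
import glob
import io
import os


def concat_utf16(pattern, output_name):
    """Append each matching UTF-16 file to output_name, re-encoded as UTF-8."""
    target = os.path.abspath(output_name)
    # newline="" on both sides passes line endings through unchanged.
    with io.open(output_name, "a", encoding="utf-8", newline="") as out:
        for name in sorted(glob.glob(pattern)):
            if os.path.abspath(name) == target:
                continue  # do not read the output file back into itself
            with io.open(name, "r", encoding="utf-16", newline="") as src:
                out.write(src.read())
```

A call such as `concat_utf16("*.txt", "all.txt")` would fail with a decoding error on any file that is not UTF-16, which at least makes a wrong guess visible instead of silently producing garbage.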

Source

2015-01-15 16:52:58

I get a UnicodeDecodeError ... please see my edit – Chris 2015-01-15 17:27:08

I see you tried UTF-8. What about UTF-16? – 2015-01-15 17:37:31

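Please run this program and edit its output into your question (we probably only need to see the first five lines or so of the output). It prints the first 16 bytes of each file in hexadecimal. This will help us figure out what is going on.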

import glob 
import sys 

def hexdump(s): 
    return " ".join("{:02x}".format(ord(c)) for c in s) 

l = 0 
for f in glob.glob("*.txt"): 
    l = max(l, len(f)) 

for f in glob.glob("*.txt"): 
    with open(f, "rb") as fp: 
        sys.stdout.write("{0:<{1}} {2}\n".format(f, l, hexdump(fp.read(16))))
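The program above is written for Python 2, where reading a file in binary mode yields a str and `ord()` is needed. A rough Python 3 equivalent is sketched below; `dump_heads` is a hypothetical helper name, and it returns the lines instead of printing them so the result is easier to inspect.

```python
import glob


def hexdump(data):
    """Format a bytes object as space-separated two-digit hex values."""
    # Iterating over bytes yields integers in Python 3, so ord() is not needed.
    return " ".join("{:02x}".format(b) for b in data)


def dump_heads(pattern, count=16):
    """Return one line per matching file: padded name, then its first bytes in hex."""
    names = sorted(glob.glob(pattern))
    width = max((len(n) for n in names), default=0)
    lines = []
    for name in names:
        with open(name, "rb") as fp:
            lines.append("{0:<{1}} {2}".format(name, width, hexdump(fp.read(count))))
    return lines
```

Running `print("\n".join(dump_heads("*.txt")))` gives the same kind of listing. A file whose dump starts with `ff fe` or `fe ff` begins with a UTF-16 byte order mark, and one that starts with `ef bb bf` begins with a UTF-8 byte order mark.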

Source

2015-01-15 22:27:28 zwol
