parse.unquote_plus TypeError

I'm trying to format a file so that it can be inserted into a database. The file is originally compressed and about 1.3 MB in size. Each line looks like this:

398,%7EAnoniem+001%7E,543,480,7525010,1775,0

This is the code that parses the file:

Village = gzip.open(Root+'\\data'+'\\' +str(Newest_Date[0])+'\\' +str(Newest_Date[1])+'\\' +str(Newest_Date[2])\
       +'\\'+str(Newest_Date[3])+' village.gz');
Village_Parsed = str 
for line in Village: 
    Village_Parsed = Village_Parsed + urllib.parse.unquote_plus(line); 
print(Village.readline());

When I run the program I get this error:

    Village_Parsed = Village_Parsed + urllib.parse.unquote_plus(line);
  File "C:\Python31\lib\urllib\parse.py", line 404, in unquote_plus
    string = string.replace('+', ' ')
TypeError: expected an object with the buffer interface

Any ideas what's wrong here? Thanks in advance for your help :)


2009-11-04 user202459

import gzip, os, urllib.parse 

archive_relpath = os.sep.join(map(str, Newest_Date[:4])) + ' village.gz' 
archive_path = os.path.join(Root, 'data', archive_relpath) 

with gzip.open(archive_path) as Village: 
    Village_Parsed = ''.join(urllib.parse.unquote_plus(line.decode('ascii')) 
          for line in Village) 
    print(Village_Parsed)

Output:

 
398,~Anoniem 001~,543,480,7525010,1775,0

Note: RFC 3986 - Uniform Resource Identifier (URI): Generic Syntax says:

This specification does not mandate any particular character encoding for mapping between URI characters and the octets used to store or transmit those characters. When a URI appears in a protocol element, the character encoding is defined by that protocol; without such a definition, a URI is assumed to be in the same character encoding as the surrounding text.

So in the line.decode('ascii') fragment, 'ascii' should be replaced by whatever character encoding was used to encode your text.
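As a minimal sketch of the encoding point: percent-encoded data is itself plain ASCII, so decoding the raw bytes as 'ascii' is safe, while the encoding argument of unquote_plus decides how the %XX octets are turned into characters. The 'caf%C3%A9+au+lait' string is an invented sample, not taken from the file in the question.

```python
from urllib.parse import unquote_plus

# One raw line as gzip would hand it over: a bytes object.
raw = b'398,%7EAnoniem+001%7E,543,480,7525010,1775,0\n'

# Percent-encoded text only contains ASCII characters, so this decode is safe.
text = raw.decode('ascii')
plain = unquote_plus(text)

# Invented sample with a non-ASCII character hidden behind %XX escapes.
escaped = 'caf%C3%A9+au+lait'
as_utf8 = unquote_plus(escaped, encoding='utf-8')      # octets read as UTF-8
as_latin1 = unquote_plus(escaped, encoding='latin-1')  # same octets, wrong guess

print(repr(plain))
print(as_utf8)
print(as_latin1)
```

If the escaped octets were produced with an encoding other than UTF-8, pass that encoding to unquote_plus instead of relying on the default.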


2009-11-04 10:39:05 jfs

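@JFSebastian: Did you actually try that? I get the same error as the OP... Apart from his initialization problem, your code looks functionally equivalent to his; gzip returns bytes objects. – 2009-11-04 11:11:08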

@John Machin: I've tried it (now). I couldn't find an 'unquote_plus_from_bytes', so we have to resort to the explicit 'bytes.decode' method. – jfs 2009-11-04 11:19:27

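Thanks, your solution works great, and thank you both (Machin and Sebastian) for pointing out my other mistakes. I'm not sure whether ascii is the character encoding that was used, but as far as I can tell it causes no problems. – user202459 2009-11-08 05:40:11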

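Problem 1 is that urllib.parse.unquote_plus doesn't like the line you are feeding it. The message should have been "please supply a str object" :-) I suggest you fix Problem 2 below, and insert: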

print('line', type(line), repr(line))

immediately after your for statement, so that you can see what you are getting in line.

You will find that it returns bytes objects:

>>> [line for line in gzip.open('test.gz')] 
[b'nudge nudge\n', b'wink wink\n']

Using a mode of 'r' makes very little difference:

>>> [line for line in gzip.open('test.gz', 'r')] 
[b'nudge nudge\n', b'wink wink\n']

I suggest that instead of passing line to the parsing routine, you pass line.decode('UTF-8') ... or whatever encoding was used when the gz file was written.
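The decode step can be sketched end to end as follows. This is an illustration under assumptions, not the asker's actual setup: it writes a one-line gzip file into a temporary directory (the 'village.gz' name and the sample line are stand-ins), reads it back as bytes, and decodes each line before unquoting. It also shows the text mode 'rt', which gzip.open accepts in Python 3.3 and later and which was not available in the Python 3.1 used in the question.

```python
import gzip
import os
import tempfile
from urllib.parse import unquote_plus

sample = b'398,%7EAnoniem+001%7E,543,480,7525010,1775,0\n'

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'village.gz')  # stand-in file name

    # Create a small compressed file to read back.
    with gzip.open(path, 'wb') as f:
        f.write(sample)

    # Default (binary) mode: every line is a bytes object.
    with gzip.open(path) as f:
        raw_lines = list(f)

    # Decode each line to str before handing it to unquote_plus.
    parsed = ''.join(unquote_plus(line.decode('utf-8')) for line in raw_lines)

    # Python 3.3+: text mode does the decoding while reading.
    with gzip.open(path, 'rt', encoding='utf-8') as f:
        parsed_text_mode = ''.join(unquote_plus(line) for line in f)

print(type(raw_lines[0]), repr(parsed))
```

Both approaches give the same string; the explicit decode also works on older Python 3 versions.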

Problem 2 is in this line:

Village_Parsed = str

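str is a type. You need an empty str object. To get one, you could call the type, i.e. str(), which is formally correct but impractical/unusual/scoffable/weird compared with using the string constant '' ... so do this: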

Village_Parsed = ''
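A short sketch of why the initialization matters: adding a string to the str type itself raises a TypeError, while an empty string (or a list joined at the end) works. The variable names and sample strings here are illustrative only.

```python
# Binding a name to the type str, not to an empty string.
accumulator = str
try:
    accumulator + 'abc'
except TypeError as exc:
    error = str(exc)  # the type has no '+' with a str instance

# An empty string literal is the usual starting value.
result = ''
for piece in ['nudge ', 'nudge']:
    result = result + piece

# Collecting pieces and joining once avoids repeated concatenation.
joined = ''.join(['wink ', 'wink'])

print(error)
print(result, joined)
```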

You also have Problem 3: your last statement tries to read from the gz file after EOF.
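The end-of-file behaviour can be checked with a small in-memory sketch. It assumes Python 3.2 or later for gzip.compress and reuses the two sample lines from the interpreter session above: once the loop has consumed the file, readline() returns an empty bytes object, and seek(0) rewinds the stream if the data needs to be read again.

```python
import gzip
import io

# Build a compressed stream in memory instead of touching the disk.
payload = gzip.compress(b'nudge nudge\nwink wink\n')

with gzip.GzipFile(fileobj=io.BytesIO(payload)) as f:
    lines = [line for line in f]  # iterating consumes the whole file
    after_eof = f.readline()      # nothing left to read
    f.seek(0)                     # rewind to the start of the data
    first_again = f.readline()

print(lines)
print(repr(after_eof), repr(first_again))
```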


2009-11-04 10:05:50
