Correctly formatting output to a file

I am parsing URLs and saving them to a file. My code works fine on Windows, but on Ubuntu it adds a small "u" in front of every line.

import re 

reports = "C:\Users/_____/Desktop/Reports/" 
string = "Here is a string to test. http://www.blah.com & http://2nd.com" 
url_match = re.findall(r'(https?://[^\s]+)', string) 
print url_match 

if url_match != []: 
    with open(reports + "_URLs.txt", "a") as text_file: 
        text_file.write('{}'.format(url_match).replace(',', "\n").replace('[', '').replace(']', '').replace("'", '').replace(' ', '').__add__("\n"))

Does anyone have an idea how to fix this? Thanks.

Source

2015-11-19 BeMy Friend

How about `text_file.write('{}'.format(url_match).replace(',', "\n").replace('[', '').replace(']', '').replace("'", '').replace(' ', '').__add__("\n")[1:])` (note the `[1:]` at the end) – inspectorG4dget

`'{}'.format(url_match)` is just `url_match`. – TigerhawkT3

Also, you should use `+` rather than `.__add__()`. – TigerhawkT3

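'{}'.format(url_match) turns the url_match list into its human-readable string representation, and then you use a series of complicated string replacements to get back to the list of lines you want to write. Somewhere along the way you end up with a unicode string, hence the 'u'. I'm not going to guess why that happens, because the real solution is to just work with the list: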

import re 

# reports = "C:\Users/_____/Desktop/Reports/" 
reports = "" # for test 
string = "Here is a string to test. http://www.blah.com & http://2nd.com" 
url_match = re.findall(r'(https?://[^\s]+)', string) 
print url_match 
if url_match: 
    with open(reports + "_URLs.txt", "a") as text_file: 
        for url in url_match:
            text_file.write(url + '\n')
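
As a hedged aside on where the stray characters may come from: formatting a list with '{}'.format(...) embeds the repr() of each element, quotes included, and on Python 2 a unicode element is displayed as u'...'. That Python 2 detail comes from general knowledge rather than from this thread, and the names text, as_list_text and as_lines are made up for illustration. A minimal sketch that also runs on Python 3:

```python
import re

text = "Here is a string to test. http://www.blah.com & http://2nd.com"
url_match = re.findall(r'(https?://[^\s]+)', text)

# Formatting the list itself embeds the repr() of every element,
# so the quotes, commas and brackets end up in the output string.
as_list_text = '{}'.format(url_match)

# Joining the elements uses the strings themselves, one URL per line,
# so there is nothing to strip out afterwards.
as_lines = '\n'.join(url_match) + '\n'

print(as_list_text)
print(as_lines)
```

For a non-empty list, a single text_file.write(as_lines) should produce the same file content as the loop above.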

Source

2015-11-20 00:17:26 tdelaney

Yes, this works... thank you! >> "and then the OP tries to hack it back with a bunch of replaces." :-) I'll get there. Thanks again –

I didn't mean to be so harsh! Sometimes, if something looks like an outright hack, it's a good idea to go back to the original data and look for a cleaner way. – tdelaney

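Also, if you find that you still have unicode data coming in, perhaps because the input file contains it or because you pasted from the clipboard, you may want to insert `url_match = [item.encode('ascii', 'ignore') for item in url_match]` – jeedo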
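
A small sketch of what that last suggestion does, assuming a made-up input list in which one URL contains a non-ASCII character (the URLs and the name ascii_urls are hypothetical). Note that the 'ignore' error handler drops such characters silently, so the conversion is lossy:

```python
# Hypothetical input: the second URL contains a non-ASCII character (e-acute).
url_match = [u'http://www.blah.com', u'http://caf\xe9.example']

# encode('ascii', 'ignore') discards every character ASCII cannot represent.
# On Python 2 the result is a list of plain str objects;
# on Python 3 it is a list of bytes objects.
ascii_urls = [item.encode('ascii', 'ignore') for item in url_match]

print(ascii_urls)
```

On Python 3 the items would need a further .decode('ascii') before being written to a file opened in text mode.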
