Fixing an FTP web-scraping script on Python 3.5


I want to fetch text files from an FTP server. This is the code I have so far:

from ftplib import FTP 
import re 

def my_function(data): 
    print(data) 

ftp = FTP('ftp.nasdaqtrader.com') 
ftp.login() 
nasdaq=ftp.retrbinary('RETR /SymbolDirectory/nasdaqlisted.txt', my_function) 
#nasdaq contains the text file

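I'm running into a few problems with this approach. For example, every time I run the script the file contents are printed, which I really don't want; I only need the contents stored as a string in the variable "nasdaq". Also, even though "nasdaq" prints out this line: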

b'Symbol|Security Name|Market Category|Test Issue|Financial Status|Round Lot Size|ETF|NextShares\r\nAAAP|Advanced Accelerator Applications S.A. - American Depositary Shares

I can't confirm that it is actually in "nasdaq":

print ("\r\nAAAP|Advanced Accelerator Applications S.A." in nasdaq) 
Out: False

What would be a more Pythonic way to do this?


2017-01-16 Rafael Martínez

You can't do print("\r\nAAAP|Advanced Accelerator Applications S.A." in nasdaq), because it raises a TypeError: 'str' does not support the buffer interface. – Juggernaut
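
The False shown in the question and the TypeError mentioned in this comment correspond to two different kinds of value, which can be checked offline. The sketch below assumes that retrbinary returns the server's final status line as a str (the exact text "226 Transfer complete" is a stand-in), while the callback receives the file contents as bytes. The names response and payload are illustrative, not from the original code.

```python
# Stand-in values, so no network access is needed for this illustration.
response = "226 Transfer complete"  # assumed text of the status line returned by retrbinary
payload = b"Symbol|Security Name\r\nAAAP|Advanced Accelerator Applications S.A."

needle = "\r\nAAAP|Advanced Accelerator Applications S.A."

# str in str: a valid test, but the status line does not contain the file text.
in_response = needle in response

# str in bytes: Python 3 refuses to mix the two types.
try:
    needle in payload
    raised = False
except TypeError:
    raised = True

# bytes in bytes: works once the search text is encoded.
in_bytes = needle.encode() in payload

print(in_response, raised, in_bytes)
```

Under these assumptions, the variable assigned from retrbinary in the question would hold the status line rather than the file, which is consistent with the False result shown there.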

This is essentially a duplicate of "Is it possible to read FTP files without writing them using Python?", but I wanted to show how to apply it to your case.

from ftplib import FTP 
from io import BytesIO 

data = BytesIO() 
with FTP("ftp.nasdaqtrader.com") as ftp: # use context manager to avoid 
    ftp.login()       # leaving connection open by mistake 
    ftp.retrbinary("RETR /SymbolDirectory/nasdaqlisted.txt", data.write) 
data.seek(0) # need to go back to the beginning to get content 
nasdaq = data.read().decode() # convert bytes back to string

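nasdaq should now be a string containing the contents of the specified file, with \r\n Windows-style line endings. If you .split() on those two characters, you'll get a list with each line as an item.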
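
As a minimal sketch of that last step, the snippet below splits a small sample string instead of the real download, so it runs without a connection. The sample reuses only the two lines quoted in the question; the names lines, header, symbols and found are illustrative.

```python
# Sample text standing in for the decoded file; only the two lines quoted
# in the question are used here.
nasdaq = (
    "Symbol|Security Name|Market Category|Test Issue|Financial Status|"
    "Round Lot Size|ETF|NextShares\r\n"
    "AAAP|Advanced Accelerator Applications S.A. - American Depositary Shares\r\n"
)

# Split on the Windows-style line ending and drop the empty trailing item.
lines = [line for line in nasdaq.split("\r\n") if line]

# The first line holds the pipe-separated column names.
header = lines[0].split("|")

# The first field of every remaining line is the ticker symbol.
symbols = [line.split("|")[0] for line in lines[1:]]

# The membership test from the question now works, since both sides are str.
found = "\r\nAAAP|Advanced Accelerator Applications S.A." in nasdaq

print(header)
print(symbols)
print(found)
```

str.splitlines() is an alternative if you don't want to hard-code the line ending; it also avoids the empty trailing item.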


2017-01-16 20:29:31 MattDMo
