Need to open the latest PDF file every day with Python 2.7

I am writing a script that opens the latest file on a web page every day. This is my code so far:

from BeautifulSoup import BeautifulSoup 
import urllib2 
import re 


html_page = urllib2.urlopen("http://www.baytown.org/city-hall/departments/police/daily-media-report") 
soup = BeautifulSoup(html_page) 
for link in soup.findAll('a', attrs={'href': re.compile("^/home/showdocument")}):
    print link.get('href')

My output:

/home/showdocument?id=7455 
/home/showdocument?id=7379 
/home/showdocument?id=7381 
/home/showdocument?id=7385 
/home/showdocument?id=7385 
/home/showdocument?id=7401 
/home/showdocument?id=7451 
/home/showdocument?id=7453

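I need to read the latest file in that list (the one with the highest ID number) and I'm stuck. How do I find the file with the highest number and read it?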

2017-09-26 Rod

I add all the id numbers to a list, then sort the list to get the highest id number.

Code:

import urllib2
from bs4 import BeautifulSoup
import re

pdfs = []
html_page = urllib2.urlopen("http://www.baytown.org/city-hall/departments/police/daily-media-report")
soup = BeautifulSoup(html_page, 'html.parser')
for link in soup.findAll('a', attrs={'href': re.compile("^/home/showdocument")}):
    # Convert each id to int so the sort is numeric rather than character by character
    pdfs.append(int(str(link.get('href')).split('id=')[1]))
latest = sorted(pdfs)[-1]
print "Latest PDF id =", latest

Output:

Latest PDF id = 7455
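
One thing to watch when picking the highest id: the values pulled out of an href are strings, and strings compare character by character, so "10000" would sort before "7455". The following is a minimal sketch of a numeric comparison, using the hrefs from the question's output as sample data instead of a live request. The helper name `doc_id` is made up for this illustration.

```python
import re

# Sample hrefs copied from the question's output (no network access needed).
hrefs = [
    "/home/showdocument?id=7455",
    "/home/showdocument?id=7379",
    "/home/showdocument?id=7381",
    "/home/showdocument?id=7385",
    "/home/showdocument?id=7385",
    "/home/showdocument?id=7401",
    "/home/showdocument?id=7451",
    "/home/showdocument?id=7453",
]


def doc_id(href):
    """Return the numeric id from an href such as '/home/showdocument?id=7455'."""
    match = re.search(r"[?&]id=(\d+)", href)
    if match is None:
        raise ValueError("no id parameter in %r" % href)
    return int(match.group(1))


# max() with a key compares the integer ids but returns the original href.
latest_href = max(hrefs, key=doc_id)
print(latest_href)  # /home/showdocument?id=7455
```

Keeping the whole href, rather than only the id, means the result can be turned into a download URL without rebuilding the query string.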

2017-09-26 04:28:20 Ali

That's it. Thanks – Rod

Since the latest PDF is always the first one in the list:

latest = soup.findAll('a', attrs={'href': re.compile("^/home/showdocument")})[0]["href"].split('=')[1] 
print (latest)

This outputs 7455.
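
Finding the id still leaves the "read it" part of the question open. The sketch below shows one way to turn the relative href into a full URL and save the response to disk. It assumes the server returns the PDF bytes directly at that URL, which was not verified here. `build_url`, `save_pdf` and the file name `latest_report.pdf` are names invented for this example, and the try/except import only lets the same code run on Python 2.7 and Python 3.

```python
try:
    # Python 2.7, as used in the question
    from urllib2 import urlopen
    from urlparse import urljoin
except ImportError:
    # Python 3 equivalents
    from urllib.request import urlopen
    from urllib.parse import urljoin

PAGE_URL = "http://www.baytown.org/city-hall/departments/police/daily-media-report"


def build_url(href, page_url=PAGE_URL):
    """Resolve a relative href against the page it was found on."""
    return urljoin(page_url, href)


def save_pdf(url, path):
    """Download url and write the raw bytes to path (makes a network request)."""
    response = urlopen(url)
    try:
        data = response.read()
    finally:
        response.close()
    # Binary mode matters: a PDF is not text.
    with open(path, "wb") as handle:
        handle.write(data)
    return len(data)


url = build_url("/home/showdocument?id=7455")
print(url)  # http://www.baytown.org/home/showdocument?id=7455

# Uncomment to actually download the file (assumed output name):
# save_pdf(url, "latest_report.pdf")
```

Because the href starts with a slash, `urljoin` keeps only the scheme and host of the page URL and replaces the path.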

2017-09-26 08:02:34 Zroq
