Parsing HTTP returns a broken link

2017-10-18 127 views 1 likes

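I'm trying to parse an image from some Telegram channels, for example https://t.me/versusbattlerus. The image is in the block img class="tgme_page_photo_image" src="https://...", but every time the method returns a different link instead of a working one. Why does this happen? I'm using Python 3.6 with urllib and beautifulsoup4.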

Method

import urllib.request
from bs4 import BeautifulSoup


def get_html(url):
    # Download the page and return the raw response body as bytes.
    response = urllib.request.urlopen(url)
    return response.read()


def parse(html):
    # The 'lxml' parser requires the lxml package to be installed.
    soup = BeautifulSoup(html, 'lxml')
    # Find the channel photo <img> tag by its CSS class.
    image = soup.find('img', class_="tgme_page_photo_image")
    print(image)
    #return image


def main():
    parse(get_html('https://t.me/versusbattlerus'))


if __name__ == '__main__':
    main()

Source

2017-10-18 T3h_vermili0n

This works fine for me. – chad

Answers

This script works for me. Please provide the "broken" link so it can be tested.

If there is an error, try a simple Linux shell solution:

curl -s https://t.me/SeanChannel |grep -oP '"og:image" content="\K.+(?=")'
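For readers who want to stay in Python, here is a rough equivalent of the shell one-liner above. It is only a sketch: it assumes the page exposes the image in a `<meta property="og:image" content="...">` tag, as the grep pattern implies, and it swaps BeautifulSoup for the standard library's `html.parser` so that nothing extra needs to be installed. The names `OgImageParser`, `extract_og_image`, and `fetch_og_image` are made up for this illustration.

```python
import urllib.request
from html.parser import HTMLParser


class OgImageParser(HTMLParser):
    """Remembers the content of the first <meta property="og:image"> tag."""

    def __init__(self):
        super().__init__()
        self.og_image = None

    def handle_starttag(self, tag, attrs):
        # Only look at <meta> tags, and stop once a value has been found.
        if tag != "meta" or self.og_image is not None:
            return
        attributes = dict(attrs)
        if attributes.get("property") == "og:image":
            self.og_image = attributes.get("content")


def extract_og_image(html):
    """Return the og:image URL found in an HTML string, or None."""
    parser = OgImageParser()
    parser.feed(html)
    return parser.og_image


def fetch_og_image(url):
    """Download a page and return its og:image URL, or None."""
    with urllib.request.urlopen(url) as response:
        html = response.read().decode("utf-8", errors="replace")
    return extract_og_image(html)


# Example call (needs network access):
# print(fetch_og_image("https://t.me/SeanChannel"))
```

Keeping the parsing step in `extract_og_image`, separate from the download, makes it possible to check the extraction against a saved copy of the page without hitting the network.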

Source

2017-10-18 09:52:59 Sean

Sorry, do you mean my script works for you? So you can open the image? –

@T3h_vermili0n Yes, I can open it. – Sean
