How to get specific text from a website using Python 2.7


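If a user wants the lyrics of a song, I'd like to build a simple program that collects plain text from a website. How can I make the program collect it?

https://www.azlyrics.com/lyrics/runthejewels/closeyoureyesandcounttofuck.html How do I collect the lyrics section from this site?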

2017-09-04 Davy Jones

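Look into something like BeautifulSoup, lxml, requests, or Scrapy. – eLRuLL

You can use requests to fetch the HTML and then BeautifulSoup to parse it. The following looks for the HTML comment that appears just before the lyrics start, then finds the parent <div> that contains it. The text can be extracted from that element: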

import requests
from bs4 import BeautifulSoup, Comment

url = "https://www.azlyrics.com/lyrics/runthejewels/closeyoureyesandcounttofuck.html"
# Send a browser-like User-Agent header with the request.
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.75 Safari/537.36'}

r = requests.get(url, headers=headers)
soup = BeautifulSoup(r.content, "html.parser")

# Find the HTML comment that precedes the lyrics, then print the text of its parent <div>.
for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
    if "Usage of azlyrics.com content" in comment:
        print(comment.parent.text)  # the parenthesized form also works in Python 2.7

This would give you something starting:

[Zack De La Rocha:] 
Run them jewels fast, run them, run them jewels fast 
...

If needed, these libraries can be installed with:

pip install beautifulsoup4 
pip install requests
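If you want to reuse the comment-lookup step, it could be wrapped in a small function that takes HTML as a string. This is only a minimal sketch, assuming the page still carries the "Usage of azlyrics.com content" comment inside the lyrics <div>. The names `extract_lyrics`, `MARKER`, and `SAMPLE_HTML` are hypothetical and only for illustration, and the sample page is a made-up miniature so the function can be tried without a network request:

```python
from bs4 import BeautifulSoup, Comment

# Assumption: this text appears in an HTML comment inside the lyrics <div>.
MARKER = "Usage of azlyrics.com content"


def extract_lyrics(html, marker=MARKER):
    """Return the stripped text of the element whose comment contains
    `marker`, or None if no such comment is found."""
    soup = BeautifulSoup(html, "html.parser")
    for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
        if marker in comment:
            # The comment's parent is the element that holds the lyrics.
            return comment.parent.get_text().strip()
    return None


# Hypothetical miniature page standing in for the real one.
SAMPLE_HTML = """
<html><body>
<div class="banner">advert</div>
<div>
<!-- Usage of azlyrics.com content by a third party is restricted. -->
First line<br>
Second line
</div>
</body></html>
"""

if __name__ == "__main__":
    print(extract_lyrics(SAMPLE_HTML))
```

With the real page you would pass `r.content` from the `requests.get(...)` call above to `extract_lyrics`. Returning None when the marker is missing lets the caller notice if the site layout has changed.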


2017-09-04 16:27:28

Thank you very much.... :) –
