Extracting data from the title tag using BeautifulSoup?
I want to extract the title of a link from its HTML using Python's BeautifulSoup library. Basically, the whole title tag is:
<title>Imaan Z Hazir on Twitter: "Guantanamo and Abu Ghraib, financial and military support to dictators in Latin America during the cold war. REALLY, AMERICA? (3)"</title>
The data I want to extract is the part between the &quot; entities (shown as quotation marks above), which is only this: Guantanamo and Abu Ghraib, financial and military support to dictators in Latin America during the cold war. REALLY, AMERICA? (3)
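Since the wanted text is everything between the first and the last double quote of the title, plain string methods are enough once the title is available as a Python string. A minimal sketch, assuming the title has already been decoded (so the marks are literal " characters, not &quot;); quoted_part is a hypothetical helper name:

```python
def quoted_part(title: str) -> str:
    """Return the text between the first and the last double quote.

    If the title has fewer than two '"' characters, it is returned unchanged.
    """
    start = title.find('"')
    end = title.rfind('"')
    if start == -1 or end <= start:
        return title  # no quoted section to extract
    # Using the last quote (rfind) keeps any quotes inside the tweet itself.
    return title[start + 1:end]


title = ('Imaan Z Hazir on Twitter: "Guantanamo and Abu Ghraib, financial and '
         'military support to dictators in Latin America during the cold war. '
         'REALLY, AMERICA? (3)"')
print(quoted_part(title))
# prints: Guantanamo and Abu Ghraib, financial and military support to dictators in Latin America during the cold war. REALLY, AMERICA? (3)
```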
I tried this:
import urllib.error
import urllib.request
from bs4 import BeautifulSoup

link = "https://twitter.com/ImaanZHazir/status/778560899061780481"
try:
    titles = []
    # Send a browser-like User-Agent header with the request
    r = urllib.request.Request(link, headers={'User-Agent': 'Chrome/51.0.2704.103'})
    h = urllib.request.urlopen(r).read()
    data = BeautifulSoup(h, "html.parser")
    for i in data.find_all("title"):
        titles.append(i.text)
    print(titles[0])  # prints the whole title, not just the quoted part
except urllib.error.HTTPError as err:
    print("HTTP error:", err)
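The loop and the list are not needed to read the title: BeautifulSoup exposes the first <title> element as soup.title, and its .string is already decoded text. A minimal sketch, assuming the fetched page contains a title like the one above; tweet_text_from_html is a hypothetical helper, and the network fetch is left out so the parsing can be tried offline on a saved copy of the HTML:

```python
from bs4 import BeautifulSoup


def tweet_text_from_html(html) -> str:
    """Extract the quoted tweet text from a page's <title> element.

    `html` may be bytes (as returned by urlopen(...).read()) or str.
    Returns an empty string if the page has no usable <title>.
    """
    soup = BeautifulSoup(html, "html.parser")
    if soup.title is None or soup.title.string is None:
        return ""
    # The parser has already turned &quot; into a literal '"'.
    title = soup.title.string
    start, end = title.find('"'), title.rfind('"')
    if start == -1 or end <= start:
        return title  # no quoted part, return the title as-is
    return title[start + 1:end]


sample = (b'<html><head><title>Imaan Z Hazir on Twitter: &quot;Guantanamo and '
          b'Abu Ghraib, financial and military support to dictators in Latin '
          b'America during the cold war. REALLY, AMERICA? (3)&quot;</title>'
          b'</head></html>')
print(tweet_text_from_html(sample))
```

With the fetching code above, this would be called as print(tweet_text_from_html(h)) in place of the loop. Taking the first and the last quote keeps any quotation marks that appear inside the tweet itself.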
I also tried:
for i in data.find_all("title.&quot"):
for i in data.find_all("title>&quot"):
for i in data.find_all("&quot"):
and
for i in data.find_all("quot"):
But none of them work.
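These attempts cannot work because find_all() filters on tags (by tag name, among other things), not on a piece of text inside a tag; "&quot", "quot" and "title.&quot" are not tag names, so nothing matches. A small check on a hypothetical, shortened title:

```python
from bs4 import BeautifulSoup

html = '<title>Someone on Twitter: &quot;Hello, world&quot;</title>'
soup = BeautifulSoup(html, "html.parser")

# find_all() looks for tags with the given name, so only "title" matches.
print(len(soup.find_all("title")))        # 1
print(len(soup.find_all("quot")))         # 0: there is no <quot> tag
print(len(soup.find_all("&quot")))        # 0: not a tag name either
print(len(soup.find_all("title.&quot")))  # 0: not a tag name either
```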
I'd expect BeautifulSoup to convert '&quot;' into '"', so you only need to look for '"'. – zvone
@zvone What do you mean by '"'? Do you mean the quote in '<title>"'? – Amar
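zvone's point can be checked directly: after parsing, the title's text contains literal " characters and no &quot; at all, so the search is for the " character inside the string, not for a tag. A minimal sketch on a shortened, hypothetical title; html.unescape from the standard library is included only to show the same entity decoding on a bare string:

```python
import html

from bs4 import BeautifulSoup

raw = '<title>Someone on Twitter: &quot;Hello, world&quot;</title>'
text = BeautifulSoup(raw, "html.parser").title.string

print('&quot;' in text)  # False: the entity is gone after parsing
print('"' in text)       # True: it became a literal double quote

# The standard library performs the same decoding on a bare string.
print(html.unescape('&quot;Hello, world&quot;') == '"Hello, world"')  # True

# So the quoted part can be taken by splitting on '"'.
print(text.split('"')[1])  # Hello, world
```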