Get meta tag content with BeautifulSoup and Python

I'm trying to use Python and Beautiful Soup to extract the content attribute of the following meta tags:

<meta property="og:title" content="Super Fun Event 1" /> 
<meta property="og:url" content="http://superfunevents.com/events/super-fun-event-1/" />

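BeautifulSoup loads the page fine and finds other things (it also grabs the article ID from the id attribute hidden in the source), but I don't know the right way to search the HTML for these bits. I've tried variations of find and findAll, to no avail. At present the code iterates over a list of URLs...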

#!/usr/bin/env python
# -*- coding: utf-8 -*-

# importing the libraries (Python 3: urlopen lives in urllib.request)
from urllib.request import urlopen
from bs4 import BeautifulSoup

def get_data(page_no):
    # use the function's parameter, not the global loop variable i
    webpage = urlopen('http://superfunevents.com/?p=' + str(page_no)).read()
    soup = BeautifulSoup(webpage, "lxml")
    # this part works: print the id attribute of every <article>
    for tag in soup.find_all("article"):
        article_id = tag.get('id')
        print(article_id)
    # the hard part that doesn't work - I know this example is well off the mark!
    title = soup.find("og:title", "content")
    print(title.get_text())
    url = soup.find("og:url", "content")
    print(url.get_text())
    # end of problem

for i in range(1, 100):
    get_data(i)

If anyone can help me sort this out and pull out the og:title and og:url content, that would be fantastic!
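A minimal sketch (not part of the original question) that reproduces the problem offline, assuming the page contains the two meta tags shown above. It parses them from a string with the built-in "html.parser" instead of fetching superfunevents.com, and shows why both calls in the "hard part" fail: find() treats its first argument as a tag name and a plain string in the second position as a CSS class, and a <meta> tag has no text for get_text() to return.

```python
from bs4 import BeautifulSoup

# The two meta tags from the question, parsed from a string so no network is needed.
html = """<head>
<meta property="og:title" content="Super Fun Event 1" />
<meta property="og:url" content="http://superfunevents.com/events/super-fun-event-1/" />
</head>"""
soup = BeautifulSoup(html, "html.parser")

# find(name, attrs): "og:title" is taken as a tag name, and a plain string in
# the attrs position is treated as a CSS class, so this searches for
# <og:title class="content">. No such tag exists, so the result is None,
# and calling .get_text() on it would raise AttributeError.
broken = soup.find("og:title", "content")
print(broken)  # None

# Even the right <meta> tag has no text of its own: get_text() returns "".
# The value lives in the tag's content attribute.
meta = soup.find("meta", property="og:title")
print(repr(meta.get_text()))  # ''
print(meta["content"])        # Super Fun Event 1
```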


2016-04-21 the_t_test_1

Pass the meta tag name as the first argument to find(), then use keyword arguments to match specific attributes:

title = soup.find("meta", property="og:title")
url = soup.find("meta", property="og:url")

print(title["content"] if title else "No meta title given")
print(url["content"] if url else "No meta url given")

The if/else checks are optional if you know the title and URL meta properties will always be present.
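A sketch of how this answer might slot into the question's loop, assuming the page HTML has already been fetched. The helper name get_og_data is hypothetical; it takes the HTML as a string so it can be tried without a network connection, and in the question's code you would pass it urlopen(...).read().

```python
from bs4 import BeautifulSoup

def get_og_data(html):
    """Hypothetical helper: return (title, url) from the og:title / og:url
    meta tags, using None for any tag that is missing."""
    soup = BeautifulSoup(html, "html.parser")
    # Tag name first, then the attribute to match as a keyword argument.
    title = soup.find("meta", property="og:title")
    url = soup.find("meta", property="og:url")
    # The value is the tag's content attribute, not its text.
    return (title["content"] if title else None,
            url["content"] if url else None)

page = """<head>
<meta property="og:title" content="Super Fun Event 1" />
<meta property="og:url" content="http://superfunevents.com/events/super-fun-event-1/" />
</head>"""

print(get_og_data(page))
# ('Super Fun Event 1', 'http://superfunevents.com/events/super-fun-event-1/')
print(get_og_data("<head></head>"))  # (None, None)
```

Note that title["content"] raises KeyError if the tag exists but has no content attribute; the comment below addresses that case.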


2016-04-21 11:42:10 alecxe

Is there a built-in way to get the content, or else fall back to a default? –

@ChristopheRoussy Yes, that is exactly what the answer shows. Also, you can enforce the presence of the content attribute with soup.find("meta", property="og:title", content=True). Thanks. – alecxe
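A small sketch of the two ideas in this exchange, assuming the sample tags below (the og:url tag deliberately lacks a content attribute). content=True only matches tags that have the attribute, and Tag.get() accepts a default like dict.get(). bs4 does not appear to offer a single call that both finds the tag and supplies a default, so meta_content is a hypothetical helper combining the two.

```python
from bs4 import BeautifulSoup

html = """<head>
<meta property="og:title" content="Super Fun Event 1" />
<meta property="og:url" />
</head>"""
soup = BeautifulSoup(html, "html.parser")

# content=True matches only tags that actually have a content attribute,
# so the og:url tag above is skipped and find() returns None.
title = soup.find("meta", property="og:title", content=True)
url = soup.find("meta", property="og:url", content=True)
print(title["content"])  # Super Fun Event 1
print(url)               # None

def meta_content(soup, prop, default=None):
    """Hypothetical helper: content of <meta property=prop>, or default if
    the tag is missing or has no content attribute."""
    tag = soup.find("meta", property=prop)
    # Tag.get() works like dict.get(): default when the attribute is absent.
    return tag.get("content", default) if tag is not None else default

print(meta_content(soup, "og:url", "No meta url given"))  # No meta url given
print(meta_content(soup, "og:image", "No meta image"))    # No meta image
```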

Try this:

soup = BeautifulSoup(webpage, "lxml")
for tag in soup.find_all("meta"):
    # tag.get() returns None when the attribute is missing
    if tag.get("property") == "og:title":
        print(tag.get("content"))
    elif tag.get("property") == "og:url":
        print(tag.get("content"))


2016-04-21 11:37:18 Hackaholic

May I ask a follow-up question?

I want to use bs4 to get the content of <meta name='keywords' content=''>, but instead of a one-line result I get the whole meta tag. Do you happen to know why?

Site being parsed: https://www.bilibili.com/video/av6862467/#page=4

Target tag:

<meta name="keywords" content="【SNH48】20161028 原创公演 TeamX《梦想的旗帜》首演 全场 CUT,娱乐,明星,SNH48-TeamX应援会,,哔哩哔哩,Bilibili,B站,弹幕" />

Code:

metatags = soup.find_all('meta',attrs={'name':'keywords'})                
for tag in metatags: 
    print(tag)
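One likely explanation, offered as a sketch: print(tag) prints the whole Tag object, so the content attribute has to be read out by indexing the tag. The example parses the target tag above from a string rather than fetching the Bilibili page. It keeps attrs={'name': 'keywords'} because a name= keyword would be taken as the tag-name parameter of find_all(). Splitting the value into a list is an optional extra; the empty entry produced by the ",," in the original string is dropped.

```python
from bs4 import BeautifulSoup

# The target tag from the question, parsed from a string.
html = '<meta name="keywords" content="【SNH48】20161028 原创公演 TeamX《梦想的旗帜》首演 全场 CUT,娱乐,明星,SNH48-TeamX应援会,,哔哩哔哩,Bilibili,B站,弹幕" />'
soup = BeautifulSoup(html, "html.parser")

for tag in soup.find_all("meta", attrs={"name": "keywords"}):
    # print(tag) shows the whole tag; indexing returns only the attribute value.
    keywords = tag["content"]
    print(keywords)
    # The value is one comma-separated string; split it and drop empty entries.
    keyword_list = [k for k in keywords.split(",") if k]
    print(keyword_list)
```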


2017-12-20 05:19:30 CrazyFrog
