2014-09-28 177 views
0

How to extract data outside HTML tags using BeautifulSoup

I am new to Python and SO. Here is my question.

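I want to extract data from the following web page: NDBC - Station 46011. I have been going through tutorials on how to collect data from web pages with BeautifulSoup, and this is the code I have so far: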

import requests 
from bs4 import BeautifulSoup 
url = 'http://www.ndbc.noaa.gov/data/latest_obs/46011.rss' 
r = requests.get(url) 
soup = BeautifulSoup(r.content) 
data_types = soup.find_all('strong') 
for item in data_types: 
    print(item.text) 

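This gives me the different data types (wind direction, speed, gust, etc.). However, I cannot extract the numeric data from this page. When you view the page source, you can see that the numeric data sits after the 'strong' tag and before the 'br' tag. Because it is not explicitly enclosed between two tags, I cannot work out how to extract it.

Thanks in advance for all your help!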

+1

Have you looked at e.g. http://stackoverflow.com/q/8220732/3001761? – jonrsharpe 2014-09-28 08:02:24

Answers

0
import requests 
from bs4 import BeautifulSoup 
url = 'http://www.ndbc.noaa.gov/data/latest_obs/46011.rss' 
r = requests.get(url) 
soup = BeautifulSoup(r.content) 
data_types = soup.find_all("description")[1].text.split('\n') 
for item in data_types: 
    print(item) 

Out[1]: 
September 28, 2014 12:50 am PDT 
Location: 35N 120.992W 
Wind Direction: NW (320°) 
Wind Speed: 7.8 knots 
Wind Gust: 9.7 knots 
Significant Wave Height: 8.5 ft 
Dominant Wave Period: 9 sec 
Average Period: 6.7 sec 
Mean Wave Direction: NW (304°) 
Atmospheric Pressure: 29.90 in (1012.5 mb) 
Pressure Tendency: +0.00 in (+0.0 mb) 
Air Temperature: 62.1°F (16.7°C) 
Water Temperature: 59.9°F (15.5°C) 

Hope that helps :-)

Let me know if you need to take this any further.
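
As a possible next step, here is a minimal sketch that turns lines like the ones printed above into a dictionary and pulls the first number out of a value. The sample lines are copied from the output above rather than fetched live, and lines_to_dict and first_number are hypothetical helper names used only for illustration; they are not part of BeautifulSoup.

```python
import re

# Sample lines copied from the output above; in practice they would come
# from soup.find_all("description")[1].text.split('\n').
lines = [
    "September 28, 2014 12:50 am PDT",
    "Location: 35N 120.992W",
    "Wind Speed: 7.8 knots",
    "Air Temperature: 62.1°F (16.7°C)",
]

def lines_to_dict(lines):
    """Hypothetical helper: map 'Label: value' lines to {label: value}."""
    result = {}
    for line in lines:
        # Split on the first ": " only; lines without it (the timestamp) are skipped.
        label, sep, value = line.partition(": ")
        if sep and value.strip():
            result[label.strip()] = value.strip()
    return result

def first_number(value):
    """Hypothetical helper: return the first number in a string as a float, or None."""
    match = re.search(r"[-+]?\d+(?:\.\d+)?", value)
    return float(match.group()) if match else None

observations = lines_to_dict(lines)
print(observations["Wind Speed"])
print(first_number(observations["Wind Speed"]))
```

Splitting on ": " (colon plus space) rather than on ":" alone keeps the timestamp line, which contains "12:50", from being mistaken for a label and value pair.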

0

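If you only want the text (which is not inside a tag) next to each <strong> tag, and you are sure there is always some text after each <strong>, you can work with BeautifulSoup's contents list. The code below returns the label and the value of each data item as a list of tuples.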

import requests 
from bs4 import BeautifulSoup 
url = 'http://www.ndbc.noaa.gov/data/latest_obs/46011.rss' 
r = requests.get(url) 
soup = BeautifulSoup(r.content) 
contents = soup.find_all('description')[1].contents 
data=[] 
for i,content in enumerate(contents): 
    if content.name=="strong": 
        data.append((content.string,contents[i+1].string)) 
print(data)  # on Python 2 this prints the u'' prefixes shown below

Output:

[(u'Location:', u' 35N 120.992W'), (u'Wind Direction:', u' NW (320\xb0)'), (u'Wind Speed:', u' 7.8 knots'), (u'Wind Gust:', u' 9.7 knots'), (u'Significant Wave Height:', u' 8.5 ft'), (u'Dominant Wave Period:', u' 9 sec'), (u'Average Period:', u' 6.7 sec'), (u'Mean Wave Direction:', u' NW (304\xb0) '), (u'Atmospheric Pressure:', u' 29.90 in (1012.5 mb)'), (u'Pressure Tendency:', u' +0.00 in (+0.0 mb)'), (u'Air Temperature:', u' 62.1\xb0F (16.7\xb0C)'), (u'Water Temperature:', u' 59.9\xb0F (15.5\xb0C)')] 
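
A variation on the same idea, offered as a sketch rather than a tested replacement: instead of indexing contents with i+1 (which raises IndexError if a <strong> happens to be the last node), each <strong> tag can be asked for its next_sibling. The HTML fragment below is hand-written to resemble the feed's description body, and "html.parser" is chosen here only so the example is self-contained; the live feed may parse differently.

```python
from bs4 import BeautifulSoup, NavigableString

# Hand-written fragment modeled on the feed's description body
# (an assumption for illustration, not fetched from the live feed).
html = (
    "<strong>Location:</strong> 35N 120.992W<br />"
    "<strong>Wind Speed:</strong> 7.8 knots<br />"
    "<strong>Wind Gust:</strong> 9.7 knots<br />"
)
soup = BeautifulSoup(html, "html.parser")

data = []
for strong in soup.find_all("strong"):
    sibling = strong.next_sibling
    # Keep the pair only when the node right after <strong> is plain text.
    if isinstance(sibling, NavigableString):
        data.append((strong.get_text(strip=True), sibling.strip()))

print(data)
```

The isinstance check skips any <strong> that is followed directly by another tag, or by nothing at all, so the loop does not fail on an unexpected layout.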