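I want to extract data from two HTML pages. When I extract the data from one page and then move on to the other, some elements are different: the data comes back as a list, and the positions in that list shift from page to page. How can I extract the same fields from both HTML pages?
My code is below: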
details_containers = soup_page.findAll("div",{"id":"RESTAURANT_DETAILS"})
details_container = details_containers[0].findAll("div",{"class":"content"})
cuisine = details_container[0].text.strip()
print(cuisine)
meals = details_container[1].text.strip()
print(meals)
hotel_features = details_container[2].text.strip()
print(hotel_features)
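From the first HTML I want the content values for Cuisine, Meals, and Restaurant features. But this page also has extra content values for Open Hours and Average prices.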
<div id="RESTAURANT_DETAILS" class="content_block details_block scroll_tabs" data-tab="TABS_DETAILS">
<div class="header_with_improve wrap">
<a href="/UpdateListing-g297595-d6384395-Ocellus-Raipur_Raipur_District_Chhattisgarh.html" onclick="ta.setEvtCookie('UpdateListing', 'entry-detail-moreinfo', null, 0, '/UpdateListingRedesign')">
<div class="improve_listing_btn ui_button primary">Improve this listing</div>
</a>
<h3 class="tabs_header">Restaurant Details</h3> </div>
<div class="details_tab">
<div class="table_section">
<div class="row">
<div class="ratingSummary wrap">
<div class="histogramCommon bubbleHistogram wrap">
<div class="colTitle">
Rating summary
</div>
<ul class="barChart">
<li>
<div class="ratingRow wrap">
<div class="label part ">
<span class="text">Food</span>
</div>
<div class="wrap row part ">
<span class="ui_bubble_rating bubble_45" alt="4.5 of 5 bubbles"></span>
</div>
</div>
<div class="ratingRow wrap">
<div class="label part ">
<span class="text">Service</span>
</div>
<div class="wrap row part ">
<span class="ui_bubble_rating bubble_35" alt="3.5 of 5 bubbles"></span>
</div>
</div>
</li>
<li>
<div class="ratingRow wrap">
<div class="label part ">
<span class="text">Value</span>
</div>
<div class="wrap row part ">
<span class="ui_bubble_rating bubble_35" alt="3.5 of 5 bubbles"></span>
</div>
</div>
</li>
</ul>
</div>
</div>
</div>
<div class="row">
<div class="title">
Average prices
</div>
<div class="content">
<span>₹ 448 -
₹ 768</span>
</div>
</div>
<div class="row">
<div class="title">
Cuisine
</div>
<div class="content">
<a href="/Restaurants-g297595-c24-Raipur_Raipur_District_Chhattisgarh.html">Indian</a>, <a href="/Restaurants-g297595-c3-Raipur_Raipur_District_Chhattisgarh.html">Asian</a>, <a href="/Restaurants-g297595-c26-Raipur_Raipur_District_Chhattisgarh.html">Italian</a>, <a href="/Restaurants-g297595-c20-Raipur_Raipur_District_Chhattisgarh.html">French</a>, <a href="/Restaurants-g297595-c11-Raipur_Raipur_District_Chhattisgarh.html">Chinese</a>, <a href="/Restaurants-g297595-c22-Raipur_Raipur_District_Chhattisgarh.html">International</a>, <a href="/Restaurants-g297595-zfz10665-Raipur_Raipur_District_Chhattisgarh.html">Vegetarian Friendly</a>
</div>
</div>
<div class="row">
<div class="title">
Meals
</div>
<div class="content">
Breakfast, Lunch, Dinner, Brunch
</div>
</div>
<div class="row">
<div class="title">
Restaurant features
</div>
<div class="content">
Reservations, Seating, Takeout, Private Dining, Waitstaff
</div>
</div>
<div class="row">
<div class="title">
Good for
</div>
<div class="content">
Groups, Business meetings, Child-friendly
</div>
</div>
<div class="row">
<div class="hours title">
Open Hours
</div>
<div class="hours content">
<div class="detail">
<span class="day">Sunday</span>
<span class="hours"><div class="hoursRange">07:00 - 23:00</div></span>
</div>
<div class="detail">
<span class="day">Monday</span>
<span class="hours"><div class="hoursRange">07:00 - 23:00</div></span>
</div>
<div class="detail">
<span class="day">Tuesday</span>
<span class="hours"><div class="hoursRange">07:00 - 23:00</div></span>
</div>
<div class="detail">
<span class="day">Wednesday</span>
<span class="hours"><div class="hoursRange">07:00 - 23:00</div></span>
</div>
<div class="detail">
<span class="day">Thursday</span>
<span class="hours"><div class="hoursRange">07:00 - 23:00</div></span>
</div>
<div class="detail">
<span class="day">Friday</span>
<span class="hours"><div class="hoursRange">07:00 - 23:00</div></span>
</div>
<div class="detail">
<span class="day">Saturday</span>
<span class="hours"><div class="hoursRange">07:00 - 23:00</div></span>
</div>
</div>
</div>
</div>
<div class="additional_info">
<div class="title">
Location and Contact Information </div>
<div class="content">
<ul class="detailsContent">
<li>
<div class="detail">Address:
<span> <span class="format_address"><span class="street-address">G.E. Road</span> | <span class="extended-address">Mayura Hotel</span>, <span class="locality">Raipur 492001, </span><span class="country-name">India</span> </span>
</span>
</div>
</li>
<li>
<div class="detail">Location:
<span> Asia</span>
<span> > India</span>
<span> > Chhattisgarh</span>
<span> > Raipur District</span>
<span> > Raipur</span>
</div>
</li>
<li>
<div class="detail">Phone Number:
<span>+91 77142 00500</span>
</div>
</li>
<li>
<span class="ui_icon email"></span>
<a target="_blank"" href="mailto:[email protected]" onclick="ta.trackEventOnPage('Eatery_Listing','Email','6384395')">
E-mail </a>
</li>
<!--trkP:waypoint_for_poi_2-->
<!-- PLACEMENT waypoint_for_poi -->
<div id="taplc_waypoint_for_poi_1" class="ppr_rup ppr_priv_waypoint_for_poi" data-placement-name="waypoint_for_poi">
</div>
<!--etk-->
</ul>
</div>
</div>
<!--[if lte IE 9]>
<style>
.details_block .threeColumnList{
height: 350px;
overflow: auto;
}
</style>
<![endif]-->
</div>
</div>
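From the second HTML I want the content values for Cuisine, Meals, and Restaurant features, the same as from the HTML above. But in this page the extra Open Hours and Average prices content is not present.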
<div id="RESTAURANT_DETAILS" class="content_block details_block scroll_tabs" data-tab="TABS_DETAILS">
<div class="header_with_improve wrap">
<a href="/UpdateListing-g297595-d8595502-Barbeque_Nation-Raipur_Raipur_District_Chhattisgarh.html" onclick="ta.setEvtCookie('UpdateListing', 'entry-detail-moreinfo', null, 0, '/UpdateListingRedesign')">
<div class="improve_listing_btn ui_button primary">Improve this listing</div>
</a>
<h3 class="tabs_header">Restaurant Details</h3> </div>
<div class="details_tab">
<div class="table_section">
<div class="row">
<div class="ratingSummary wrap">
<div class="histogramCommon bubbleHistogram wrap">
<div class="colTitle">
Rating summary
</div>
<ul class="barChart">
<li>
<div class="ratingRow wrap">
<div class="label part ">
<span class="text">Food</span>
</div>
<div class="wrap row part ">
<span class="ui_bubble_rating bubble_45" alt="4.5 of 5 bubbles"></span>
</div>
</div>
<div class="ratingRow wrap">
<div class="label part ">
<span class="text">Service</span>
</div>
<div class="wrap row part ">
<span class="ui_bubble_rating bubble_45" alt="4.5 of 5 bubbles"></span>
</div>
</div>
</li>
<li>
<div class="ratingRow wrap">
<div class="label part ">
<span class="text">Value</span>
</div>
<div class="wrap row part ">
<span class="ui_bubble_rating bubble_40" alt="4.0 of 5 bubbles"></span>
</div>
</div>
</li>
</ul>
</div>
</div>
</div>
<div class="row">
<div class="title">
Cuisine
</div>
<div class="content">
<a href="/Restaurants-g297595-c24-Raipur_Raipur_District_Chhattisgarh.html">Indian</a>, <a href="/Restaurants-g297595-c6-Raipur_Raipur_District_Chhattisgarh.html">Barbecue</a>, <a href="/Restaurants-g297595-c3-Raipur_Raipur_District_Chhattisgarh.html">Asian</a>, <a href="/Restaurants-g297595-zfz10665-Raipur_Raipur_District_Chhattisgarh.html">Vegetarian Friendly</a>, <a href="/Restaurants-g297595-zfz10697-Raipur_Raipur_District_Chhattisgarh.html">Vegan Options</a>, <a href="/Restaurants-g297595-zfz10992-Raipur_Raipur_District_Chhattisgarh.html">Gluten Free Options</a>
</div>
</div>
<div class="row">
<div class="title">
Meals
</div>
<div class="content">
Lunch, Dinner
</div>
</div>
<div class="row">
<div class="title">
Restaurant features
</div>
<div class="content">
Reservations, Seating, Waitstaff, Wheelchair Accessible, Validated Parking
</div>
</div>
<div class="row">
<div class="title">
Good for
</div>
<div class="content">
Groups, Special Occasion Dining, Kids, Child-friendly
</div>
</div>
</div>
<div class="additional_info">
<div class="title">
Location and Contact Information </div>
<div class="content">
<ul class="detailsContent">
<li>
<div class="detail">Address:
<span> <span class="format_address"> | <span class="extended-address">Magneto The Mall, 2nd Floor</span>, <span class="locality">Raipur 429010, </span><span class="country-name">India</span> </span>
</span>
</div>
</li>
<li>
<div class="detail">Location:
<span> Asia</span>
<span> > India</span>
<span> > Chhattisgarh</span>
<span> > Raipur District</span>
<span> > Raipur</span>
</div>
</li>
<li>
<div class="detail">Phone Number:
<span>+91 77160 60008</span>
</div>
</li>
<li>
<span class="ui_icon email"></span>
<a target="_blank"" href="mailto:[email protected]" onclick="ta.trackEventOnPage('Eatery_Listing','Email','8595502')">
E-mail </a>
</li>
<!--trkP:waypoint_for_poi_2-->
<!-- PLACEMENT waypoint_for_poi -->
<div id="taplc_waypoint_for_poi_1" class="ppr_rup ppr_priv_waypoint_for_poi" data-placement-name="waypoint_for_poi">
</div>
<!--etk-->
</ul>
</div>
</div>
<!--[if lte IE 9]>
<style>
.details_block .threeColumnList{
height: 350px;
overflow: auto;
}
</style>
<![endif]-->
</div>
</div>
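One possible way to handle both pages is to look each value up by the text of its `title` div rather than by its position in the list of `content` divs, since the position changes when a row such as Average prices is missing. The following is only a sketch under that assumption: the function name `extract_details`, the `wanted` parameter, and the trimmed-down `sample` markup are hypothetical and used for illustration; the `row`, `title`, and `content` class names and the `RESTAURANT_DETAILS` id are taken from the two snippets above.

```python
from bs4 import BeautifulSoup


def extract_details(html, wanted=("Cuisine", "Meals", "Restaurant features")):
    """Return {title: content text} for the wanted rows; a missing row maps to None."""
    soup = BeautifulSoup(html, "html.parser")
    details = {title: None for title in wanted}
    block = soup.find("div", {"id": "RESTAURANT_DETAILS"})
    if block is None:
        return details
    for row in block.find_all("div", class_="row"):
        # Only consider direct children, so nested rows (e.g. the rating bars) are skipped.
        title = row.find("div", class_="title", recursive=False)
        content = row.find("div", class_="content", recursive=False)
        if title is None or content is None:
            continue
        key = title.get_text(strip=True)
        if key in details:
            # Collapse line breaks and repeated spaces into single spaces.
            details[key] = " ".join(content.get_text().split())
    return details


# Hypothetical, shortened markup in the same shape as the pages above.
sample = """
<div id="RESTAURANT_DETAILS">
  <div class="row"><div class="title">Average prices</div>
    <div class="content"><span>448 - 768</span></div></div>
  <div class="row"><div class="title">Cuisine</div>
    <div class="content"><a href="#">Indian</a>, <a href="#">Asian</a></div></div>
  <div class="row"><div class="title">Meals</div>
    <div class="content">Lunch, Dinner</div></div>
</div>
"""

result = extract_details(sample)
print(result)
```

Because the lookup is keyed on the title text, an extra row (Average prices, Open Hours) or a missing one does not shift the other values; a field that is absent from a page simply stays `None`.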
If you can, please indent your HTML. It just lets other people understand the structure of the document more quickly. – ContinuousLoad