2017-07-27

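Python: getting "id" or "data-value" from HTML?

I am trying to pull the 'id' or 'data-value' out of this block of HTML and put the values in a list. It seems I am not targeting the right element. Where am I going wrong? Ultimately I want to target the individual product IDs in the is_in_stock section.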

My code:

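import requests
from bs4 import BeautifulSoup as bs

session = requests.Session()  # added: the snippet used session without creating it
response = session.get(product_url)  # product_url is the product page (not shown)
soup = bs(response.text, 'lxml')
div = soup.find("div", {"class": "item"})  # None for the HTML below: no div has the class "item"
all_sizes = div.find_all("data")  # "data" is treated as a tag name, and no <data> tag exists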

HTML

             <div class="product-options" id="product-options-wrapper"> 
<script type="text/javascript"> 
        try { 
         var changeConfigurableStatus = true; 
         var stStatus = new StockStatus({"242":{"is_in_stock":true,"custom_status_icon":"","custom_status":"","product_id":"92964"},"246":{"is_in_stock":true,"custom_status_icon":"","custom_status":"","product_id":"92965"},"363":{"is_in_stock":true,"custom_status_icon":"","custom_status":"","product_id":"92966"},"248":{"is_in_stock":true,"custom_status_icon":"","custom_status":"","product_id":"92967"},"243":{"is_in_stock":true,"custom_status_icon":"","custom_status":"","product_id":"92968"},"368":{"is_in_stock":true,"custom_status_icon":"","custom_status":"","product_id":"92969"},"244":{"is_in_stock":true,"custom_status_icon":"","custom_status":"","product_id":"92970"},"247":{"is_in_stock":true,"custom_status_icon":"","custom_status":"","product_id":"92971"},"79":{"is_in_stock":true,"custom_status_icon":"","custom_status":"","product_id":"92972"},"249":{"is_in_stock":true,"custom_status_icon":"","custom_status":"","product_id":"92973"}}); 
        } 
         catch(ex){} 
       </script> 
     <div class="configurable-product-option no-display"> 
     <div class="configurable-product-option-wrapper"> 
      <h2>Please select your size</h2> 
      <div class="drop-select"> 
       <label for="attribute139"></label> 
       <select name="super_attribute[139]" 
         id="attribute139" 
         class="required-entry super-attribute-select"> 
        <option>Choose an Option...</option> 
       </select> 
      </div> 
     </div> 
    </div> 
    <script type="text/javascript"> 
    var spConfig = new Product.Config({"attributes":{"139":{"id":"139","code":"eu_size","label":"EU ","options":[{"id":"242","label":"EU 40 2\/3 \/ US 7.5","price":"0","oldPrice":"0","products":["92964"]},{"id":"246","label":"EU 41 1\/3 \/ US 8","price":"0","oldPrice":"0","products":["92965"]},{"id":"363","label":"EU 42 \/ US 8.5","price":"0","oldPrice":"0","products":["92966"]},{"id":"248","label":"EU 42 2\/3 \/ US 9","price":"0","oldPrice":"0","products":["92967"]},{"id":"243","label":"EU 43 1\/3 \/ US 9.5","price":"0","oldPrice":"0","products":["92968"]},{"id":"368","label":"EU 44 \/ US 10","price":"0","oldPrice":"0","products":["92969"]},{"id":"244","label":"EU 44 2\/3 US 10.5","price":"0","oldPrice":"0","products":["92970"]},{"id":"247","label":"EU 45 1\/3 \/ US 11","price":"0","oldPrice":"0","products":["92971"]},{"id":"79","label":"EU 46 \/ US 11.5","price":"0","oldPrice":"0","products":["92972"]},{"id":"249","label":"EU 46 2\/3 \/ US 12","price":"0","oldPrice":"0","products":["92973"]}]}},"template":"\u20ac#{price}","basePrice":"89","oldPrice":"89","productId":"90522","chooseText":"Choose an Option...","taxConfig":{"includeTax":true,"showIncludeTax":true,"showBothPrices":false,"defaultTax":19,"currentTax":19,"inclTaxTitle":"Incl. Tax"}}); 
</script> 

<h3>Choose size</h3> 
<div class="clearfix " data-attribute="attribute139" > 
       <div class="attribute-item " 
     data-value="242"> 
     EU 40 2/3/US 7.5  </div> 
       <div class="attribute-item " 
     data-value="246"> 
     EU 41 1/3/US 8  </div> 
       <div class="attribute-item " 
     data-value="363"> 
     EU 42/US 8.5  </div> 
       <div class="attribute-item " 
     data-value="248"> 
     EU 42 2/3/US 9  </div> 
       <div class="attribute-item " 
     data-value="243"> 
     EU 43 1/3/US 9.5  </div> 
       <div class="attribute-item " 
     data-value="368"> 
     EU 44/US 10  </div> 
       <div class="attribute-item " 
     data-value="244"> 
     EU 44 2/3 US 10.5  </div> 
       <div class="attribute-item " 
     data-value="247"> 
     EU 45 1/3/US 11  </div> 
       <div class="attribute-item " 
     data-value="79"> 
     EU 46/US 11.5  </div> 
       <div class="attribute-item " 
     data-value="249"> 
     EU 46 2/3/US 12  </div> 
    </div> 

Answers

1

You are on the right track, but you need tag.find_all rather than find.

ids = []
for div in soup.find_all("div", {"class": "attribute-item"}):
    ids.append(div['data-value'])  # read the data-value attribute of each size div

Thanks for the quick reply. How would I get the "products" ID in the "var spConfig = new Product.Config({"attributes":{"139":{"id":"139","code":"eu_size","label":"EU","options":[{"id":"242","label":"EU 40 2\/3 \/ US 7.5","price":"0","oldPrice":"0","products":["92964"]},}})" string?

@duchathaway soup.find('script').text
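
A minimal sketch of how the spConfig case could be handled once the script text is available as a string: cut the object literal out of the new Product.Config(...) call with a regular expression and parse it with json.loads. The name script_text and the shortened two-option sample are stand-ins for illustration only. In the block shown in the question, spConfig sits in the second script tag, so soup.find('script') on its own may return a different script.

```python
import json
import re

# Stand-in for the text of the <script> tag that defines spConfig
# (shortened to two options; a raw string keeps the \/ escapes intact).
script_text = r'''
var spConfig = new Product.Config({"attributes":{"139":{"id":"139","code":"eu_size","label":"EU ","options":[{"id":"242","label":"EU 40 2\/3 \/ US 7.5","price":"0","oldPrice":"0","products":["92964"]},{"id":"246","label":"EU 41 1\/3 \/ US 8","price":"0","oldPrice":"0","products":["92965"]}]}},"basePrice":"89","productId":"90522"});
'''

# Capture the object literal passed to Product.Config(...).
match = re.search(r"new Product\.Config\((\{.*?\})\);", script_text, re.DOTALL)
if match is None:
    raise ValueError("spConfig call not found in the script text")

# The argument is plain JSON, so json.loads can parse it directly.
config = json.loads(match.group(1))

# Map each option id (the data-value of a size div) to its product IDs.
products_by_option = {
    option["id"]: option["products"]
    for attribute in config["attributes"].values()
    for option in attribute["options"]
}
print(products_by_option)
```

The keys of products_by_option are the same values that appear as data-value on the attribute-item divs, so the two lookups can be joined on them.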

@duchathaway If this answer helped, you can _accept_ it. Click the grey check mark next to the helpful answer and it will turn green. It helps everyone :)

1

This should work for you.

import requests
from bs4 import BeautifulSoup as bs

session = requests.Session()  # added so that session is defined
response = session.get(product_url)
soup = bs(response.text, 'lxml')

divs = soup.find_all("div", {"class": "attribute-item"})  # Select the divs with the .attribute-item class
all_sizes = [x['data-value'] for x in divs]  # Extract the 'data-value' attribute from each of those divs

Thanks magoon. How would I go about getting the "products" ID in the "var spConfig = new Product.Config({"attributes":{"139":{"id":"139","code":"eu_size","label":"EU","options":[{"id":"242","label":"EU 40 2\/3 \/ US 7.5","price":"0","oldPrice":"0","products":["92964"]},}})" string?

@duchathaway You need to get it from that line of JavaScript, or you can get it from its '
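
For the goal stated in the question (the product IDs in the is_in_stock section), one possible approach is sketched here, under the assumption that the text of the script containing new StockStatus(...) is already held in a string. The name script_text is hypothetical, and the sample is cut down to two entries with the second flipped to false purely to show the filter at work.

```python
import json
import re

# Stand-in for the text of the <script> tag that defines stStatus.
# Two entries only; the second is_in_stock value was changed to false
# for illustration and does not reflect the page in the question.
script_text = '''
try {
    var changeConfigurableStatus = true;
    var stStatus = new StockStatus({"242":{"is_in_stock":true,"custom_status_icon":"","custom_status":"","product_id":"92964"},"246":{"is_in_stock":false,"custom_status_icon":"","custom_status":"","product_id":"92965"}});
}
catch(ex){}
'''

# Capture the object literal passed to StockStatus(...).
match = re.search(r"new StockStatus\((\{.*?\})\);", script_text, re.DOTALL)
if match is None:
    raise ValueError("StockStatus call not found in the script text")

# Keys are option ids (the data-value values); each entry holds the stock flag.
stock = json.loads(match.group(1))

# Keep only the product IDs whose is_in_stock flag is true.
in_stock_ids = [
    entry["product_id"] for entry in stock.values() if entry["is_in_stock"]
]
print(in_stock_ids)
```

This relies on the argument being valid JSON, which holds for the markup in the question; a script that embeds non-JSON JavaScript would need a different parser.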