2017-04-26

Getting data from this table (html, python)

I want to extract the currency exchange rate data shown in this table.

The page is https://www.iceplc.com/travel-money/exchange-rates

I have tried this approach, but it doesn't work:

from selenium.webdriver.common.by import By

table_id = driver.find_element(By.ID, 'data_configuration_feeds_ct_fields_body0')
rows = table_id.find_elements(By.TAG_NAME, "tr")  # get all of the rows in the table
for row in rows:
    col = row.find_elements(By.TAG_NAME, "td")[1]  # note: indexes start from 0, so 1 is column 2
    print(col.text)  # prints text from the element

Here is the HTML:

</td> 

        <td valign="top" class="OuterProdCell test"> 

           <table class="ProductCell"> 
            <tr> 
            <td class="rateCountryFlag"> 
             <ul id="prodImages"> 
              <li> 
               <a href="/travel-money/buy-chilean-peso" title="Buy Chilean Peso" class="flags chilean-peso" ></a> 
              </li> 
             </ul> 
            </td> 

            <td class="ratesName"> 
            <a href="/travel-money/buy-chilean-peso" title="Buy Chilean Peso"> 
            Chilean Peso</a> 
            </td> 

            <td class="ratesClass"> 
            <a class="orderText" href="/travel-money/buy-chilean-peso" title="Buy Chilean Peso">774.8540</a> 
            </td> 
            <td class="orderNow">           
             <ul id="prodImages"> 
              <li> 
               <a class="reserveNow" href="/travel-money/buy-chilean-peso" title="Buy Chilean Peso">Order<br/>now</a> 
              </li> 
              <li> 
               <a href="/travel-money/buy-chilean-peso" title="Buy Chilean Peso" class="flags arrowGreen" ></a> 
              </li> 
             </ul> 
            </td> 
            </tr> 
           </table> 
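If you would rather not drive a browser at all, a fragment like the one above can also be parsed with Python's standard-library html.parser. This is a swapped-in technique, not what the answers below use, and only a minimal sketch: it assumes each currency sits in one <tr> containing a td.ratesName cell and a td.ratesClass cell, as in the markup shown, and it does not fetch the live page (which may be rendered differently). RateRowParser, WANTED and SAMPLE_HTML are hypothetical names.

```python
from html.parser import HTMLParser

# Hypothetical: class names copied from the HTML fragment in the question.
WANTED = ("ratesName", "ratesClass")


class RateRowParser(HTMLParser):
    """Collect (currency name, rate) pairs, one per <tr> that has both cells."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._field = None  # which wanted cell we are currently inside, if any
        self._row = {}      # text collected for the current row

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            classes = (dict(attrs).get("class") or "").split()
            self._field = next((c for c in WANTED if c in classes), None)

    def handle_endtag(self, tag):
        if tag == "td":
            self._field = None
        elif tag == "tr":
            # Emit only rows that contained both a name and a rate
            if all(key in self._row for key in WANTED):
                self.pairs.append((self._row["ratesName"], self._row["ratesClass"]))
            self._row = {}

    def handle_data(self, data):
        text = data.strip()
        if self._field is not None and text:
            # Join text chunks (e.g. text inside the nested <a>) with single spaces
            previous = self._row.get(self._field, "")
            self._row[self._field] = (previous + " " + text).strip()


# Abbreviated copy of the ProductCell markup shown above (hypothetical sample)
SAMPLE_HTML = """
<table class="ProductCell"><tr>
  <td class="rateCountryFlag"><a class="flags chilean-peso"></a></td>
  <td class="ratesName"><a href="/travel-money/buy-chilean-peso">
  Chilean Peso</a></td>
  <td class="ratesClass"><a class="orderText">774.8540</a></td>
  <td class="orderNow"><a class="reserveNow">Order<br/>now</a></td>
</tr></table>
"""

parser = RateRowParser()
parser.feed(SAMPLE_HTML)
for name, rate in parser.pairs:
    print(name, rate)
```

Keying on the <tr> boundary keeps each name next to its own rate, and rows missing either cell (such as the outer layout rows) are simply skipped.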

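I also tried the approach below with Python and Selenium. With it I can get the exchange rate for every currency, but not the currency name: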

from selenium.webdriver.common.by import By

driver.get("https://www.iceplc.com/travel-money/exchange-rates")
rates = driver.find_elements(By.CLASS_NAME, "ratesClass")

for rate in rates:
    print(rate.text)
Which name? What is the expected output? –

Output: Euro 1.146 – xys234

I mean outputting the whole table in this format, in the same order – xys234

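Answers

If you just want the exchange rates, you are better off using an API; see this question. Web scraping leaves your code vulnerable to changes in the target page that break it.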

If scraping is your goal, you just need to reuse your Selenium approach, but also search for the "ratesName" class.

For example:

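from selenium.webdriver.common.by import By

driver.get("https://www.iceplc.com/travel-money/exchange-rates")
names = driver.find_elements(By.CLASS_NAME, "ratesName")
rates = driver.find_elements(By.CLASS_NAME, "ratesClass")

# pair each currency name with the rate at the same position on the page
for name, rate in zip(names, rates):
    print("Name: %s, Rate: %s" % (name.text, rate.text))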
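One caveat when pairing two separate find_elements results by position: zip() stops at the shorter list, so if the page ever has a name without a rate (or vice versa) the extra items are silently dropped and the pairs may be misaligned. A minimal sketch of a stricter pairing on the extracted text, assuming you first collect .text from each element; pair_names_and_rates is a hypothetical helper, not part of Selenium. On Python 3.10+, zip(names, rates, strict=True) performs the same length check.

```python
def pair_names_and_rates(names, rates):
    """Pair currency names with rates by position; refuse lists of different lengths."""
    if len(names) != len(rates):
        # plain zip() would silently truncate here and hide the mismatch
        raise ValueError(f"found {len(names)} names but {len(rates)} rates")
    return [(name.strip(), rate.strip()) for name, rate in zip(names, rates)]


# Assumed usage with Selenium elements:
#   pair_names_and_rates([e.text for e in names], [e.text for e in rates])
print(pair_names_and_rates(["US - Dollar", " Canada - Dollar "], ["1.2536", "1.7006 "]))
```

A count check only catches mismatched totals; if both lists have the same length but are in a different order, pairing by row (as in the next answer) is the safer design.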

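Looking at the structure of the page, it is clear that you have to process the table row by row and, in each row, select the column elements you are interested in.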

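Extract them from each row by locating elements by tag name and by class name (find_element_by_tag_name and find_element_by_class_name in older Selenium releases; current Selenium has removed those helpers in favour of find_element(By.TAG_NAME, ...) and find_element(By.CLASS_NAME, ...)).

(Documentation here: http://selenium-python.readthedocs.io/locating-elements.html)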

from selenium.webdriver.common.by import By

driver.get("https://www.iceplc.com/travel-money/exchange-rates")
rows = driver.find_elements(By.TAG_NAME, "tr")
for row in rows:
    # print the currency name and its rate from the same row
    print(row.find_element(By.CLASS_NAME, "ratesName").text,
          row.find_element(By.CLASS_NAME, "ratesClass").text)

This prints the two elements of interest for each row:

US - Dollar 1.2536 
Croatia - Kuna 8.3997 
Canada - Dollar 1.7006 
Australia - Dollar 1.6647 
Euro - 1.1469 
...
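Since the comments ask for output like "Euro 1.146" for the whole table, the printed lines can also be turned into a lookup table. A minimal sketch, assuming each line ends with the rate after a single space as in the output above; parse_rate_lines is a hypothetical helper, and it trims the trailing " -" left on names such as "Euro -".

```python
def parse_rate_lines(lines):
    """Turn printed 'name rate' lines into a {name: rate} dict."""
    rates = {}
    for line in lines:
        parts = line.strip().rsplit(maxsplit=1)
        if len(parts) != 2:
            continue  # blank line or a "..." placeholder
        name, rate = parts
        try:
            rates[name.rstrip(" -")] = float(rate)
        except ValueError:
            continue  # last token is not a number, so not a rate line
    return rates


rates = parse_rate_lines(["US - Dollar 1.2536", "Euro - 1.1469", "..."])
print("Euro", rates["Euro"])
```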