2016-02-27 64 views

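Looping through a table using Selenium WebDriver

I have a table, which can be found here: Ontario Gov Employee Directory. I am trying to loop through the table to pull out its data, but I am struggling to find an XPath that lets me do so.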

The table has no id. When I inspect the element I see:

<table title="results_list" border="0" width="100%" cellspacing="0" cellpadding="0"> 

    <tbody> 
    <tr> 
     <td class="content" valign="top" align="right" width="50">1. &nbsp;</td> 
     <td class="content">[<a class="results" href='javascript:showEmployeeDetail("32528")'>Aagaard, Lindsay</a>] [ Senior Policy Advisor ] [TREASURY BOARD SECRETARIAT] 
     <br>[DEPUTY PREMIER AND PRESIDENT OF THE TREASURY BOARD, Toronto] 

     <!-- [416-327-0948] --> 



     [416-327-0948] [ 



     <a href="mailto:[email protected]"> 
                      [email protected]</a>] 
     </td> 
    </tr> 
    <tr> 
     <td>&nbsp;</td> 
    </tr> 

    <tr> 
     <td class="content" valign="top" align="right" width="50">2. &nbsp;</td> 
     <td class="content">[<a class="results" href='javascript:showEmployeeDetail("34417")'>Aalto, Margaret</a>] [ Probation Officer ] [CHILDREN AND YOUTH SERVICES] 
     <br>[THUNDER BAY, Thunder Bay] 

     <!-- [807-475-1310] --> 



     [807-475-1310] [ 



     <a href="mailto:[email protected]"> 
                      [email protected]</a>] 
     </td> 
    </tr> 
    <tr> 
     <td>&nbsp;</td> 
    </tr> 

    <tr> 
     <td class="content" valign="top" align="right" width="50">3. &nbsp;</td> 
     <td class="content">[<a class="results" href='javascript:showEmployeeDetail("9187")'>Aarlaht, Andrew</a>] [ Business Analyst ] [COMMUNITY AND SOCIAL SERVICES] 
     <br>[HAMILTON, BUSINESS SERVICES UNIT, Hamilton] 

     <!-- [905-521-7335] --> 



     [905-521-7335] [ 



     <a href="mailto:[email protected]"> 
                      [email protected]</a>] 
     </td> 
    </tr> 
    <tr> 
     <td>&nbsp;</td> 
    </tr> 

    <tr> 
     <td class="content" valign="top" align="right" width="50">4. &nbsp;</td> 
     <td class="content">[<a class="results" href='javascript:showEmployeeDetail("9187")'>Aarlaht, Andrew</a>] [ Business Analyst ] [CHILDREN AND YOUTH SERVICES] 
     <br>[HAMILTON, BUSINESS SERVICES UNIT, Hamilton] 

     <!-- [905-521-7335] --> 



     [905-521-7335] [ 



     <a href="mailto:[email protected]"> 
                      [email protected]</a>] 
     </td> 
    </tr> 
    <tr> 
     <td>&nbsp;</td> 
    </tr> 

    <tr> 
     <td class="content" valign="top" align="right" width="50">5. &nbsp;</td> 
     <td class="content">[<a class="results" href='javascript:showEmployeeDetail("19146")'>Aarons, Drew</a>] [ Messenger ] [LEGISLATIVE OFFICES] 
     <br>[PARLIAMENTARY PROTOCOL, Toronto] 

     <!-- [416-325-7455] --> 



     [416-325-7455] [ 



     <a href="mailto:[email protected]"> 
                      [email protected]</a>] 
     </td> 
    </tr> 
    <tr> 
     <td>&nbsp;</td> 
    </tr> 

    <tr> 
     <td class="content" valign="top" align="right" width="50">6. &nbsp;</td> 
     <td class="content">[<a class="results" href='javascript:showEmployeeDetail("113729")'>Aaswaakshin, Neegann</a>] [ Articling Student ] [ABORIGINAL AFFAIRS] 
     <br>[LEGAL SERVICES, Toronto] 

     <!-- [416-212-2271] --> 



     [416-212-2271] [ 



     <a href="mailto:[email protected]"> 
                      [email protected]</a>] 
     </td> 
    </tr> 
    <tr> 
     <td>&nbsp;</td> 
    </tr> 

    <tr> 
     <td class="content" valign="top" align="right" width="50">7. &nbsp;</td> 
     <td class="content">[<a class="results" href='javascript:showEmployeeDetail("32196")'>Abad, Lilian</a>] [ Executive Assistant ] [TRANSPORTATION] 
     <br>[GO TRANSIT, Toronto] 

     <!-- [416-202-5506] --> 



     [416-202-5506] [ 



     <a href="mailto:[email protected]"> 
                      [email protected]</a>] 
     </td> 
    </tr> 
    <tr> 
     <td>&nbsp;</td> 
    </tr> 

    <tr> 
     <td class="content" valign="top" align="right" width="50">8. &nbsp;</td> 
     <td class="content">[<a class="results" href='javascript:showEmployeeDetail("114240")'>Abadesso, Jennifer</a>] [ Employment Program Consultant (Acting) ] [TRAINING, COLLEGES AND UNIVERSITIES] 
     <br>[FOUNDATION SKILLS, Toronto] 

     <!-- [416-327-2065] --> 



     [416-327-2065] [ 



     <a href="mailto:[email protected]"> 
                      [email protected]</a>] 
     </td> 
    </tr> 
    <tr> 
     <td>&nbsp;</td> 
    </tr> 

    <tr> 
     <td class="content" valign="top" align="right" width="50">9. &nbsp;</td> 
     <td class="content">[<a class="results" href='javascript:showEmployeeDetail("104293")'>Abakunzi, Louis</a>] [ Customer Service Representative (Bilingual) ] [GOVERNMENT AND CONSUMER SERVICES] 
     <br>[SERVICEONTARIO CONTACT CENTRE - NORTH YORK, Toronto] 

     <!-- [416-235-2999] --> 



     [416-235-2999] [ 



     <a href="mailto:[email protected]"> 
                      [email protected]</a>] 
     </td> 
    </tr> 
    <tr> 
     <td>&nbsp;</td> 
    </tr> 

    <tr> 
     <td class="content" valign="top" align="right" width="50">10. &nbsp;</td> 
     <td class="content">[<a class="results" href='javascript:showEmployeeDetail("19309")'>Aban, Edencio</a>] [ Audit Supervisor ] [ATTORNEY GENERAL] 
     <br>[AUDIT AND COMPLIANCE, Toronto] 

     <!-- [416-326-6295] --> 



     [416-326-6295] [ 



     <a href="mailto:[email protected]"> 
                      [email protected]</a>] 
     </td> 
    </tr> 
    <tr> 
     <td>&nbsp;</td> 
    </tr> 

    </tbody> 
</table> 

How can I iterate over the data in these rows?

Answers


This is a table nested inside another table, and apart from that the markup is fairly standard. What exactly is the difficulty?

"The table has no id. When I inspect the element I see:"

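It has other attributes you can use, such as title. Use the XPath //table[@title="results_list"]/tbody/tr/td to find every data cell in the innermost table. Alternatively, drop the trailing /td from the XPath to get each row, then find every td element under that row and read its text.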
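
The row-level variant described above might look like the following sketch. It assumes `driver` is a Selenium WebDriver that already has the results page open; the helper name `iter_result_rows` is hypothetical. The locator strategy is passed as the literal string "xpath", which is the value of Selenium's `By.XPATH` constant, so the sketch itself imports nothing. Rows that do not have exactly two cells (the single-cell spacer rows in the sample markup) are skipped.

```python
XPATH = "xpath"  # same value as selenium.webdriver.common.by.By.XPATH

# Row-level XPath from the answer: the cell-level one without the trailing /td.
ROWS_XPATH = '//table[@title="results_list"]/tbody/tr'


def iter_result_rows(driver):
    """Yield (serial, details) text pairs, one per employee row.

    `driver` is assumed to be a Selenium WebDriver on the results page.
    """
    for row in driver.find_elements(XPATH, ROWS_XPATH):
        # Relative XPath: only the td elements directly under this row.
        cells = row.find_elements(XPATH, "./td")
        if len(cells) != 2:
            continue  # spacer rows hold a single &nbsp; cell
        serial, details = cells
        yield serial.text.strip(), details.text.strip()
```

With a live session this could be consumed as `for serial, details in iter_result_rows(driver): ...`.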

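Note: in the innermost table, the first column holds the serial number and the second column holds the actual data. I suggest getting each td and then reading either its 'innerHTML' attribute or elem.text. After that, use a regular expression to extract the individual parts.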

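>>> from selenium.webdriver.common.by import By
>>> # find_elements_by_xpath was removed in Selenium 4; find_elements(By.XPATH, ...) works in old and new versions
>>> all_tdata = driver.find_elements(By.XPATH, '//table[@title="results_list"]/tbody/tr/td')
>>> for td in all_tdata:
...     print(td.get_attribute('innerHTML'))  # save this in a variable and regex it
...     # or
...     data = td.text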
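
The regular-expression step could be sketched roughly as follows. This is only an illustration under some assumptions: the input is the `td.text` of a second-column cell, every field is wrapped in square brackets as in the sample markup, and the field names in `FIELDS` as well as the helper `parse_details` are hypothetical labels, not anything the page defines. The email address in the sample string is a placeholder. If `innerHTML` were used instead of `td.text`, the commented-out phone number would also match and shift the fields.

```python
import re

# One bracketed field: "[", optional spaces, content without brackets, "]".
FIELD_RE = re.compile(r"\[\s*([^\[\]]+?)\s*\]")

# Hypothetical labels for the bracketed fields, in page order.
FIELDS = ("name", "title", "ministry", "unit", "phone", "email")


def parse_details(text):
    """Map the bracketed parts of one details cell to the labels in FIELDS."""
    return dict(zip(FIELDS, FIELD_RE.findall(text)))


# Sample text shaped like the first row; the email is a placeholder.
sample = (
    "[Aagaard, Lindsay] [ Senior Policy Advisor ] [TREASURY BOARD SECRETARIAT]\n"
    "[DEPUTY PREMIER AND PRESIDENT OF THE TREASURY BOARD, Toronto] "
    "[416-327-0948] [ someone@example.com]"
)

record = parse_details(sample)
print(record["name"])   # Aagaard, Lindsay
print(record["phone"])  # 416-327-0948
```

Using `zip` means a cell with fewer bracketed parts simply yields fewer keys rather than raising an error, which seemed a reasonable default for scraped data.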

Thanks! I did something similar: I used an XPath and the getText() function to retrieve the inner HTML, then parsed the string accordingly. @aneroid – kknaguib