Python, BeautifulSoup: parsing a table

Below are the content and the code.

txt = ''' 
<head><META http-equiv="Content-Type" content="text/html; charset=UTF-8"> </head><table><tr><th filter=all>Employee Name</th><th filter=all>Project  Name</th><th filter=all>Area</th><th filter=all>Date</th><th filter=all>Employee  Manager</th></tr> 
<tr><td style="vnd.ms-excel.numberformat:@">David</td><td style="vnd.ms- excel.numberformat:@">Review-2016</td><td style="vnd.ms- excel.numberformat:@">US</td><td align=right>17/03/2016</td><td style="vnd.ms- excel.numberformat:@">Andrew</td></tr> 
<tr><td style="vnd.ms-excel.numberformat:@">Kate</td><td style="vnd.ms-excel.numberformat:@">Review 2016</td><td style="vnd.ms-excel.numberformat:@">UK</td><td align=right>21/03/2016</td><td style="vnd.ms-excel.numberformat:@">Liz</td></tr>

'''

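from bs4 import BeautifulSoup

soup = BeautifulSoup(txt, "lxml")
soup.prettify()  # returns a formatted string; the result is not used here

# all <tr> rows of the first table, header row included
list_5 = soup.find_all('table')[0].find_all("tr")

for row in list_5:
    for nn in row.find_all("td"):
        print(nn.text)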

So far I get all the text, but it all comes out in one run, i.e.:

David 
Review-2016 
US 
17/03/2016 
Andrew 
Kate 
Review 2016 
UK 
21/03/2016 
Liz

What I need is the output by column, such as David, Kate or US, UK, and so on.

Could you help me do this the right way? Thanks.

Source

2017-04-17 Mark K

If you want to print David, Kate, the following code will work:

for row in list_5[1:]:  # skip the header row, which has no <td> cells
    print(row.find_all('td')[0].text)
# changing find_all('td')[0] to find_all('td')[2] prints US and UK
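
To get every column at once rather than one index at a time, the rows can be transposed. The following is a minimal sketch, not part of the original answer. It assumes a shortened copy of the question's table (style attributes dropped) and uses the built-in "html.parser" instead of "lxml" so it runs without extra dependencies. The names headers, data and columns are made up for illustration.

```python
from bs4 import BeautifulSoup

# Shortened copy of the table from the question (style attributes omitted).
txt = """
<table>
<tr><th>Employee Name</th><th>Project Name</th><th>Area</th><th>Date</th><th>Employee Manager</th></tr>
<tr><td>David</td><td>Review-2016</td><td>US</td><td>17/03/2016</td><td>Andrew</td></tr>
<tr><td>Kate</td><td>Review 2016</td><td>UK</td><td>21/03/2016</td><td>Liz</td></tr>
</table>
"""

# Assumption: "html.parser" (standard library) is used here; the question uses "lxml".
soup = BeautifulSoup(txt, "html.parser")
rows = soup.find("table").find_all("tr")

# The first row holds the <th> headers; the remaining rows hold <td> cells.
headers = [th.get_text(strip=True) for th in rows[0].find_all("th")]
data = [[td.get_text(strip=True) for td in row.find_all("td")] for row in rows[1:]]

# zip(*data) transposes the list of rows into a sequence of columns.
columns = dict(zip(headers, (list(col) for col in zip(*data))))

for name, values in columns.items():
    print(name + ": " + ", ".join(values))
```

Each key in columns is a header and each value is the list of cells under it, so columns["Area"] gives the US/UK column directly.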

Source

2017-04-17 03:37:32 nick

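Could you please help me with a similar problem of mine? Here: http://stackoverflow.com/questions/43033378/web-scraping-with-selenium-python-twitter-instagram –

Are you sure? I see that question has already been solved. – nick

The solution there is only partial: I could get one part working, but the other part, parsing the output into a data frame, is my big challenge. –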
