
Exporting data from BeautifulSoup to CSV

[Disclaimer] I have gone through many of the other answers in this area, but none of them seem to work for me.

I would like to be able to export the data I have scraped as a CSV file.

My question is: how do I write a code snippet that outputs this data to CSV?

Current code

import requests 
from bs4 import BeautifulSoup 

url = "http://implementconsultinggroup.com/career/#/6257" 
r = requests.get(url) 

soup = BeautifulSoup(r.content, "html.parser") 
links = soup.find_all("a") 

for link in links: 
    if "career" in link.get("href") and 'COPENHAGEN' in link.text: 
        print "<a href='%s'>%s</a>" % (link.get("href"), link.text) 

Output

View Position 

</a> 
<a href='/career/management-consultants-to-help-our-customers-succeed-with- 
it/'> 
Management consultants to help our customers succeed with IT 
COPENHAGEN • At Implement Consulting Group, we wish to make a difference in 
the consulting industry, because we believe that the ability to create Change 
with Impact is a precondition for success in an increasingly global and 
turbulent world. 




View Position 

</a> 
<a href='/career/management-consultants-within-process-improvement/'> 
Management consultants within process improvement 
COPENHAGEN • We are looking for consultants with profound 
experience in Six Sigma, Lean and operational 
management 

Code I have tried

import csv 

with open('ImplementTest1.csv',"w") as csv_file: 
    writer = csv.writer(csv_file) 
    # only writes the literal header strings once; the scraped values never reach the file 
    writer.writerow(["link.get", "link.text"]) 
    csv_file.close()  # redundant: the with block already closes the file 

Desired CSV output format

Column 1: URL link

Column 2: Job description

Column 1: /career/management-consultants-to-help-our-customers-succeed-with-it/

Column 2: Management consultants to help our customers succeed with IT COPENHAGEN • At Implement Consulting Group, we wish to make a difference in the consulting industry, because we believe that the ability to create Change with Impact is a precondition for success in an increasingly global and turbulent world.


You have to store the results in a list. –


Thanks Adam. I'm quite new to Python; could you quickly show how to create/store the results as a list? –
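A minimal sketch of what the commenter is suggesting, reusing the loop from the question: collect each matching link as one row in a list, then write the header and all rows in a single pass. The "job_link"/"job_desc" header names and the "html.parser" choice are illustrative assumptions, not taken from the original posts:

import csv
import requests
from bs4 import BeautifulSoup

url = "http://implementconsultinggroup.com/career/#/6257"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

# Collect one [link, text] row per matching anchor instead of printing it
rows = []
for link in soup.find_all("a"):
    href = link.get("href") or ""
    if "career" in href and 'COPENHAGEN' in link.text:
        rows.append([href, link.text.replace("View Position", "").strip()])

# Write the header row followed by every collected row
with open('ImplementTest1.csv', 'w') as csv_file:
    writer = csv.writer(csv_file)
    writer.writerow(["job_link", "job_desc"])
    writer.writerows(rows)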


Here is my answer to a similar question: [extract-data-from-html-to-csv-using-beautifulsoup](https://stackoverflow.com/questions/45675705/extract-data-from-html-to-csv-using-beautifulsoup/45676970#45676970) –

Answer


Try this script; it produces the CSV output:

import csv 
import requests 
from bs4 import BeautifulSoup 

# newline='' prevents blank rows between lines on Windows (Python 3) 
outfile = open('career.csv', 'w', newline='') 
writer = csv.writer(outfile) 
writer.writerow(["job_link", "job_desc"]) 

res = requests.get("http://implementconsultinggroup.com/career/#/6257").text 
soup = BeautifulSoup(res, "lxml") 
links = soup.find_all("a") 

for link in links: 
    if "career" in link.get("href") and 'COPENHAGEN' in link.text: 
        item_link = link.get("href").strip() 
        item_text = link.text.replace("View Position", "").strip() 
        writer.writerow([item_link, item_text]) 
        print(item_link, item_text) 

outfile.close() 

Thanks Shahin - this works just as I wanted. The only piece that doesn't work is the last bit, outfile.close(): File "", line 7, outfile.close() ^ SyntaxError: invalid syntax –


That's probably because you are using Python 2 while I'm on Python 3, I guess, but I'm not sure! However, it runs flawlessly for me. – SIM
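For anyone stuck on the Python 2 side of that difference, a rough adaptation of the accepted script might look like the sketch below. The binary file mode (instead of newline='') and the UTF-8 encoding of the values are assumptions based on how Python 2's csv module works, not something tested by the original posters:

import csv
import requests
from bs4 import BeautifulSoup

# Python 2: open in binary mode; the newline kwarg does not exist here
outfile = open('career.csv', 'wb')
writer = csv.writer(outfile)
writer.writerow(["job_link", "job_desc"])

res = requests.get("http://implementconsultinggroup.com/career/#/6257").text
soup = BeautifulSoup(res, "lxml")

for link in soup.find_all("a"):
    href = link.get("href") or ""
    if "career" in href and 'COPENHAGEN' in link.text:
        item_link = href.strip()
        item_text = link.text.replace("View Position", "").strip()
        # Python 2's csv module expects byte strings, so encode the unicode values
        writer.writerow([item_link.encode("utf-8"), item_text.encode("utf-8")])

outfile.close()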