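Using Scrapy to scrape data after a form submission
I am trying to scrape content from a listing where the details page for each entry can only be reached by clicking a "View" button, which triggers a form submission. I am new to both Python and Scrapy.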
Example markup:
<li><h3>Abc Widgets</h3>
<form action="/viewlisting?id=123" method="post">
<input type="image" src="/images/view.png" value="submit" >
</form>
</li>
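My Scrapy solution is to extract the form action and then issue a Request with a callback that parses the returned page for the content I want. However, I have run into a couple of problems.
First, I get the following error: "Request url must be str or unicode".
Second, when I hard-code the URL to get around the first problem, my parse function appears to return something that looks like a list.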
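A likely explanation for the first error is that a selector's extract() call returns a list of strings, even when the XPath matches a single attribute, while Request expects one string. The form action is also a relative path, so it needs to be resolved against the page URL. The sketch below illustrates both points using only the standard library; the helper name action_to_url and the shortened page URL are hypothetical, chosen just for this illustration.

```python
from urllib.parse import urljoin


def action_to_url(page_url, extracted):
    """Turn a list such as the one returned by .extract() into one absolute URL.

    Returns None when the list is empty (the XPath matched nothing).
    """
    if not extracted:
        return None
    # Take the first match and resolve the relative path against the page URL.
    return urljoin(page_url, extracted[0])


# Hypothetical page URL, shortened for the example.
page = "http://example.com/wps/findYourLocalOffice.jsp?state=WA"
print(action_to_url(page, ["/viewlisting?id=123"]))
# prints http://example.com/viewlisting?id=123
```

Inside a spider, response.urljoin(...) does the same resolution against the response URL.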
Here is my code, with the real URLs redacted:
from scrapy.spiders import Spider
from scrapy.selector import Selector
from scrapy.http import Request
from wfi2.items import Wfi2Item


class ProfileSpider(Spider):
    name = "profiles"
    allowed_domains = ["wfi.com.au"]
    start_urls = ["http://example.com/wps/wcm/connect/internet/wfi/Contact+Us/Find+Your+Local+Office/findYourLocalOffice.jsp?state=WA",
                  "http://example.com/wps/wcm/connect/internet/wfi/Contact+Us/Find+Your+Local+Office/findYourLocalOffice.jsp?state=VIC",
                  "http://example.com/wps/wcm/connect/internet/wfi/Contact+Us/Find+Your+Local+Office/findYourLocalOffice.jsp?state=QLD",
                  "http://example.com/wps/wcm/connect/internet/wfi/Contact+Us/Find+Your+Local+Office/findYourLocalOffice.jsp?state=NSW",
                  "http://example.com/wps/wcm/connect/internet/wfi/Contact+Us/Find+Your+Local+Office/findYourLocalOffice.jsp?state=TAS",
                  "http://example.com/wps/wcm/connect/internet/wfi/Contact+Us/Find+Your+Local+Office/findYourLocalOffice.jsp?state=NT"
                  ]

    def parse(self, response):
        hxs = Selector(response)
        forms = hxs.xpath('//*[@id="area-managers"]//*/form')
        for form in forms:
            action = form.xpath('@action').extract()
            print "ACTION: ", action
            #request = Request(url=action,callback=self.parse_profile)
            request = Request(url=action,callback=self.parse_profile)
            yield request

    def parse_profile(self, response):
        hxs = Selector(response)
        profile = hxs.xpath('//*[@class="contentContainer"]/*/text()')
        print "PROFILE", profile
Thanks for the clear explanation and for pointing out the relevant sections of the documentation. – htmlr