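Python Scrapy: log in to a website, then scrape
I want to try something new to practice web scraping. I am trying to log in to a website and then scrape specific items.
I have built the code below for this, but it does not work. I am using scrapy.FormRequest to log in.
Based on what I have read in the docs so far, this is my current setup: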
import scrapy


class HomelyspiderSpider(scrapy.Spider):
    name = "homelyspider"
    allowed_domains = ["homely.com.au"]
    start_urls = ['https://homely.com.au/']

    def parse(self, response):
        # Fill in and submit the login form found in the downloaded page.
        yield scrapy.FormRequest.from_response(
            response,
            formxpath='.//div[@class="Modal-body"]/form',
            formdata={
                'usernameOrEmail': 'myusername',
                'password': 'mypassword',
            },
            clickdata={"type": "Submit"},
            callback=self.after_login
        )

    def after_login(self, response):
        "DO SCRAPING NOW"
Login page HTML:
<div class="Auth Auth--modal">
<div class="signin ">
<div class="Modal-header">
<h1 class="Modal-title">Sign in</h1>
</div>
<div class="Modal-body">
<p class="subtitle">Instant sign in with Facebook or Google:</p><a class="Button Button--icon Button--facebook small-12" href="/authentication/redirect/Facebook"><span role="presentation" class="icon-wrapper"><svg class="icon icon-facebook"><use xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="#icon-facebook"></use></svg></span><span class="label">Continue with Facebook</span></a><a class="Button Button--icon Button--google small-12" href="/authentication/redirect/Google"><span role="presentation" class="icon-wrapper"><svg class="icon icon-google"><use xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="#icon-google"></use></svg></span><span class="label">Continue with Google</span></a>
<p>or using your email:</p>
<form>
<label class="">
<input type="text" aria-label="Email or Username" required="" pattern="^[^-\s].+" title="Please enter a valid value" name="usernameOrEmail" placeholder="Email or Username" class="FormControl" value="">
</label>
<label class="">
<input type="password" aria-label="Password" required="" pattern="^[^-\s].+" title="Please enter a valid value" name="password" placeholder="Password" class="FormControl">
</label>
<button class="Button Button--alt small-12" type="submit"><span class="Button-message">Sign In</span>
</button>
</form>
<p class="forgotten">
<button class="ButtonLink">Forgot Password?</button>
</p>
</div>
<div class="Modal-line"></div>
<div class="Modal-footer">
<p>
<!-- react-text: 71 -->Not yet a member?
<!-- /react-text -->
<button class="ButtonLink">Register with Homely</button>
</p>
</div>
</div>
</div>
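I know this may be irrelevant because the form is in the page, but I am still showing the steps and elements that lead to it.
This is the home page, where I have to click Sign In:
Then the sign-in popup appears, containing the form code I provided above:
What am I doing wrong here? From what I understand of the Scrapy docs, my FormRequest code should work, right?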
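I can see that too.. can you tell why? The form XPath looks fine –
No, because I also get an error when using the XPath, and I don't know why – minime
I see the problem now: the form is not displayed until I click the Sign In button –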
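Following up on that last comment, one way to confirm the diagnosis is to check whether the HTML that is actually downloaded (before any JavaScript runs) contains a form with the expected fields. The snippet below is only a rough sketch using the standard library. `FormFieldFinder`, `login_form_present`, and the `raw_page` markup are made-up names and data for illustration; `rendered_modal` is abridged from the HTML in the question.

```python
from html.parser import HTMLParser


class FormFieldFinder(HTMLParser):
    """Collect the names of <input> elements that sit inside a <form>."""

    def __init__(self):
        super().__init__()
        self.form_depth = 0    # > 0 while the parser is inside a <form>
        self.form_count = 0    # total number of <form> tags seen
        self.field_names = []  # names of inputs found inside forms

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            self.form_depth += 1
            self.form_count += 1
        elif tag == "input" and self.form_depth > 0:
            name = dict(attrs).get("name")
            if name:
                self.field_names.append(name)

    def handle_endtag(self, tag):
        if tag == "form" and self.form_depth > 0:
            self.form_depth -= 1


def login_form_present(html, required=("usernameOrEmail", "password")):
    """Return True if the HTML has a <form> containing all required inputs."""
    finder = FormFieldFinder()
    finder.feed(html)
    return finder.form_count > 0 and all(
        name in finder.field_names for name in required
    )


# Hypothetical raw page: the sign-in modal has not been rendered yet.
raw_page = (
    '<html><body><div id="app"></div>'
    '<script src="/bundle.js"></script></body></html>'
)

# Markup as it looks after clicking "Sign In" (abridged from the question).
rendered_modal = """
<div class="Modal-body">
<form>
<input type="text" name="usernameOrEmail" class="FormControl" value="">
<input type="password" name="password" class="FormControl">
<button class="Button Button--alt small-12" type="submit">Sign In</button>
</form>
</div>
"""

print(login_form_present(raw_page))        # False
print(login_form_present(rendered_modal))  # True
```

Inside the spider, a similar check would be to log `response.xpath('//form')` in `parse`. If that comes back empty, `FormRequest.from_response` has no form to work with, which would explain the error. In that case the usual options are to send the login request straight to whatever endpoint the browser uses (visible in the browser's network tab; I don't know what it is for this site) or to render the page with a JavaScript-capable tool first.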