2017-05-25 110 views

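Python Scrapy: log in to a website, then scrape

I want to practice web scraping with some new exercises. I am trying to log in to a website and then scrape specific items.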

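I wrote the code below for this, but it does not work. I log in with scrapy.FormRequest, and based on what I have read in the documentation so far, this is what I have set up: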

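import scrapy


class HomelyspiderSpider(scrapy.Spider):
    name = "homelyspider"
    allowed_domains = ["homely.com.au"]
    start_urls = ["https://homely.com.au/"]

    def parse(self, response):
        # Fill in and submit the sign-in form found in the response.
        yield scrapy.FormRequest.from_response(
            response,
            formxpath='.//div[@class="Modal-body"]/form',
            formdata={
                "usernameOrEmail": "myusername",
                "password": "mypassword",
            },
            # Attribute values are matched exactly, so use the lowercase
            # "submit" that appears in the button's type attribute.
            clickdata={"type": "submit"},
            callback=self.after_login,
        )

    def after_login(self, response):
        # Scraping logic goes here once the login has succeeded.
        pass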

Login page HTML:

<div class="Auth Auth--modal"> 
    <div class="signin "> 
     <div class="Modal-header"> 
      <h1 class="Modal-title">Sign in</h1> 
     </div> 
     <div class="Modal-body"> 
      <p class="subtitle">Instant sign in with Facebook or Google:</p><a class="Button Button--icon Button--facebook small-12" href="/authentication/redirect/Facebook"><span role="presentation" class="icon-wrapper"><svg class="icon icon-facebook"><use xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="#icon-facebook"></use></svg></span><span class="label">Continue with Facebook</span></a><a class="Button Button--icon Button--google small-12" href="/authentication/redirect/Google"><span role="presentation" class="icon-wrapper"><svg class="icon icon-google"><use xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="#icon-google"></use></svg></span><span class="label">Continue with Google</span></a> 
      <p>or using your email:</p> 
      <form> 
       <label class=""> 
        <input type="text" aria-label="Email or Username" required="" pattern="^[^-\s].+" title="Please enter a valid value" name="usernameOrEmail" placeholder="Email or Username" class="FormControl" value=""> 
       </label> 
       <label class=""> 
        <input type="password" aria-label="Password" required="" pattern="^[^-\s].+" title="Please enter a valid value" name="password" placeholder="Password" class="FormControl"> 
       </label> 
       <button class="Button Button--alt small-12" type="submit"><span class="Button-message">Sign In</span> 
       </button> 
      </form> 
      <p class="forgotten"> 
       <button class="ButtonLink">Forgot Password?</button> 
      </p> 
     </div> 
     <div class="Modal-line"></div> 
     <div class="Modal-footer"> 
      <p> 
       <!-- react-text: 71 -->Not yet a member? 
       <!-- /react-text --> 
       <button class="ButtonLink">Register with Homely</button> 
      </p> 
     </div> 
    </div> 
</div> 

I know this is probably irrelevant because the form is in the page, but I am still showing the steps and elements that lead to it.

This is the home page, where I have to click the sign-in button.


Then the sign-in popup appears, containing the form whose HTML I provided above.


What am I doing wrong here? From what I understand of the Scrapy docs, my form request code should work, right?

Answers


ValueError exception: no <form> element found in the response. It did not find the form...


I can see that too. Can you tell why? The form XPath looks fine.


No, because I also get the error when using the XPath, and I don't know why. – minime


I see the problem now: the form is not displayed until I click the sign-in button.
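
One way to confirm this is to check whether the HTML that Scrapy actually downloads contains a form at all. The sketch below uses only the standard library. The two HTML strings are hypothetical stand-ins, not the real responses from the site: one for the page as first served, one for the page after the sign-in popup has been rendered in a browser. The names `FormCounter` and `inspect_forms` are made up for illustration.

```python
from html.parser import HTMLParser


class FormCounter(HTMLParser):
    """Count <form> start tags and record the names of <input> fields."""

    def __init__(self):
        super().__init__()
        self.form_count = 0
        self.input_names = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        if tag == "form":
            self.form_count += 1
        elif tag == "input":
            name = dict(attrs).get("name")
            if name:
                self.input_names.append(name)


def inspect_forms(html_text):
    """Return (number of forms, input names) found in static HTML."""
    counter = FormCounter()
    counter.feed(html_text)
    counter.close()
    return counter.form_count, counter.input_names


# Hypothetical stand-in for the page as first served (no popup yet).
served_html = '<div id="app"><button class="ButtonLink">Sign in</button></div>'

# Hypothetical stand-in for the page after the popup has been rendered.
rendered_html = (
    '<div class="Modal-body"><form>'
    '<input type="text" name="usernameOrEmail">'
    '<input type="password" name="password">'
    "</form></div>"
)

print(inspect_forms(served_html))    # (0, [])
print(inspect_forms(rendered_html))  # (1, ['usernameOrEmail', 'password'])
```

Inside a spider, the equivalent check is whether `response.xpath('//form')` returns anything. If it is empty, `FormRequest.from_response` has no form to fill in, which matches the ValueError reported above. In that case the form is probably being added by JavaScript after the page loads, so the downloaded HTML would need to be rendered first, or the login request would have to be sent some other way.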