2016-06-07

For a few days now I have been trying to log in to the www.onlydomains.com website from a script in order to retrieve my list of domains. So far I have something like this (Python 2.7, requests, logging in to onlydomains.com):

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import requests, sys, re, whois
from bs4 import BeautifulSoup

def onlydomains():
    with requests.Session() as c:
        PASSWORD = 'my%password'
        USERNAME = 'my_username'
        URL = 'https://www.onlydomains.com/account/login'
        soup = BeautifulSoup(c.get(URL).text, "lxml")

        csrf = soup.find("input", value=True)["value"]

        login_data = {
            'csrfToken': csrf,
            'username': USERNAME,
            'password': PASSWORD,
            'submit': 'Submit',
        }

        r = c.post(URL, data=login_data, headers={'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36'})
        r = c.get('https://onlydomains.secure-admin.com/domain/index')
        print r.text

onlydomains()

But it does not work for me, because I always get:

> ./onlydomains.py 

    <!DOCTYPE html><html lang="en"><head><meta charset="utf-8" /><title>Login/Sign Up - OnlyDomains</title> 

Any idea what I am doing wrong?

Answers


If you look at what comes back from the POST, you can see a `window.location = some_url`:

<script type="text/javascript"> 
       $(document).ready(function(){ 

        setTimeout(function(){ 

          window.location = 'https://onlydomains.secure-admin.com/dashboard/index?_srs_=v42oadi4cAuxIM4PHc5IdgU%5CdXd3AjswsOraTLjynso%3D';; 


        },1000); 
       }); 
      </script> 
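To see how that redirect URL gets pulled out, here is a minimal, self-contained sketch run against a made-up copy of the script block above (the `_srs_` query value here is invented, not a real token):

```python
import re

# Hypothetical sample of the HTML fragment returned by the login POST.
html = """<script type="text/javascript">
  $(document).ready(function(){
    setTimeout(function(){
      window.location = 'https://onlydomains.secure-admin.com/dashboard/index?_srs_=example%3D';
    },1000);
  });
</script>"""

# Capture the URL assigned to window.location.
patt = re.compile(r"window\.location\s+=\s+'(http.*?)'")
url = patt.search(html).group(1)
print(url)
```

The non-greedy `(http.*?)` stops at the first closing quote, so the stray second semicolon after the URL does not end up in the match.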

You can use that to get the page:

patt = re.compile(r"window\.location\s+=\s+'(http.*)'")

with requests.Session() as s:
    PASSWORD = 'user'
    USERNAME = "pass"
    URL = 'https://www.onlydomains.com/account/login'
    soup = BeautifulSoup(s.get(URL).text, "lxml")
    csrf = soup.select_one("input[name=csrfToken]")["value"]

    login_data = {
        'csrfToken': csrf,
        'username': USERNAME,
        'password': PASSWORD}

    r = s.post(URL, data=login_data, headers={'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36'})

    url = patt.search(r.text).group(1)
    r = s.get(url).text
    print(r)
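Note also the difference in how the CSRF token is located: `soup.find("input", value=True)` in the question grabs the first input that has any `value` attribute at all, while `select_one("input[name=csrfToken]")` targets the token field by name. A minimal sketch against a made-up form fragment (the real login page has more fields than this):

```python
from bs4 import BeautifulSoup

# Hypothetical simplified login form; only the field names
# follow the real page, the token value is invented.
html = """<form>
  <input type="hidden" name="csrfToken" value="abc123" />
  <input type="text" name="username" value="" />
  <input type="password" name="password" />
</form>"""

soup = BeautifulSoup(html, "html.parser")
# CSS attribute selector: match the input whose name is csrfToken.
csrf = soup.select_one("input[name=csrfToken]")["value"]
print(csrf)  # prints abc123
```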

If we run the code and print the data-original-title attribute from the main content, you can see we are on the dashboard page:

In [5]: with requests.Session() as s: 
    ...:   PASSWORD = 'xxxxxx' 
    ...:   USERNAME = "xxxxxxxxxx" 
    ...:   URL = 'https://www.onlydomains.com/account/login' 
    ...:   soup = BeautifulSoup(s.get(URL).text, "lxml") 
    ...:   csrf = soup.select_one("input[name=csrfToken]")["value"] 
    ...:   login_data = { 
    ...:   'csrfToken' : csrf, 
    ...:   'username' : USERNAME, 
    ...:   'password' : PASSWORD} 
    ...:   r = s.post(URL, data=login_data, headers={'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36'}) 
    ...:   url = patt.search(r.text).group(1) 
    ...:   r = s.get(url).text 
    ...:   soup = BeautifulSoup(r,"lxml") 
    ...:   print(soup.select_one("h1.PageTitle.visible-xs i.fa.fa-info-circle")["data-original-title"]) 
    ...:  

Welcome to your Dashboard! Here you have a general overview of what's happening and how to manage your domain assets. 

Sorry, but I am not familiar with `patt.search(r.text).group(1)`. I get: `url = patt.search(r.text).group(1) NameError: global name 'patt' is not defined` –


Sorry, I forgot to add the regex; I will edit it in –


Thanks for the edit! I am on it. :) –


I think the best way to solve this would be with Selenium (I remember doing something like what you are trying to do with BS, but I cannot remember how right now):

from selenium import webdriver 

chromedriver = 'C:\\chromedriver.exe' 
browser = webdriver.Chrome(chromedriver) 
browser.get('http://www.example.com') 

username = browser.find_element_by_name('username') 
username.send_keys('user1') 

password = browser.find_element_by_name('password') 
password.send_keys('secret') 

form = browser.find_element_by_id('loginForm') 
form.submit() 

That will let you load the next page, which should contain the information you want :)


Selenium works, I already tried it successfully yesterday. But I do not want to open Firefox/Chrome every time; it will be a server-side script, and I am working on Linux. ;) –