2017-10-28

I am trying to scrape data from the following site, and it works for a single page. However, as soon as I click the checkboxes, the job no longer works. Earlier you can see that I only detected 24 elements, without clicking any checkbox, and that scraped correctly. The job only scrapes one page, and fails once all the checkboxes are clicked.

As soon as I click the checkboxes there are more elements, and the scrape stops working correctly, as shown below. Why does it do this? I believe Selenium should in general scrape whatever is on the page in this case, yet it does not...
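One common cause (not confirmed from the question alone) is that Selenium queries the DOM before the rows revealed by the checkbox have finished loading, so `find_elements` returns only the original 24. A minimal sketch of a custom wait condition that polls until at least `n` matches exist; the locator and count in the usage comment are illustrative, not taken from the site:

```python
class element_count_at_least:
    """Custom expected condition: truthy once the locator matches at least n elements."""

    def __init__(self, locator, n):
        self.locator = locator
        self.n = n

    def __call__(self, driver):
        elements = driver.find_elements(*self.locator)
        # Return the element list (truthy) once enough exist; False keeps the wait polling.
        return elements if len(elements) >= self.n else False

# Usage with a live driver (locator and count are hypothetical):
# rows = WebDriverWait(driver, 10).until(
#     element_count_at_least((By.XPATH, '//*[@id="mta_row"]/td[1]'), 25))
```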


import os 
import time 
import csv 
from random import shuffle 

from selenium import webdriver 
from selenium.webdriver.common.keys import Keys 
from selenium.webdriver.support.ui import WebDriverWait as wait 
from selenium.common.exceptions import TimeoutException 

driver = webdriver.Chrome() 
driver.set_window_size(1024, 600) 
driver.maximize_window() 
try: 
    os.remove('vtg121.csv') 
except OSError: 
    pass 

driver.get('https://www.palmerbet.com/sports/soccer') 

#SCROLL_PAUSE_TIME = 0.5 


from selenium.webdriver.common.by import By 
from selenium.webdriver.support.ui import WebDriverWait 
from selenium.webdriver.support import expected_conditions as EC 

#clickMe = wait(driver, 3).until(EC.element_to_be_clickable((By.XPATH, ('//*[@id="TopPromotionBetNow"]')))) 
#if driver.find_element_by_css_selector('#TopPromotionBetNow'): 
    #driver.find_element_by_css_selector('#TopPromotionBetNow').click() 

#last_height = driver.execute_script("return document.body.scrollHeight") 

#while True: 

    #driver.execute_script("window.scrollTo(0, document.body.scrollHeight);") 


    #time.sleep(SCROLL_PAUSE_TIME) 


    #new_height = driver.execute_script("return document.body.scrollHeight") 
    #if new_height == last_height: 
     #break 
    #last_height = new_height 

time.sleep(1) 

clickMe = wait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, ('//*[contains(@class,"filter_labe")]')))) 
clickMe.click() 
clickMe = wait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, '//*[contains(@class,"filter_labe")]'))) 
options = driver.find_elements_by_xpath('//*[contains(@class,"filter_labe")]') 

indexes = list(range(len(options))) 
shuffle(indexes) 
for index in indexes: 
    #driver.get('https://www.bet365.com.au/#/AS/B1/') 
    clickMe1 = wait(driver, 10).until(EC.element_to_be_clickable(
        (By.XPATH, '(//ul[@id="tournaments"]//li//input)[%s]' % str(index + 1)))) 
    driver.find_element_by_tag_name('body').send_keys(Keys.UP) 
    driver.find_element_by_tag_name('body').send_keys(Keys.UP) 
    driver.find_element_by_tag_name('body').send_keys(Keys.UP) 
    driver.execute_script("return arguments[0].scrollIntoView();", clickMe1) 
    clickMe1.click() 
    time.sleep(0) 
    ##tournaments > li > input 
    #//*[@id='tournaments']//li//input 

    # Team 

    #clickMe = wait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR,("#mta_row td:nth-child(1)")))) 
    time.sleep(5) 
    langs3 = driver.find_elements_by_xpath('//*[@id="mta_row"]/td[1]') 
    langs3_text = [] 

    for lang in langs3: 
        langs3_text.append(lang.text) 

    # Team ODDS 
    langs = driver.find_elements_by_css_selector("#mta_row .mpm_teams_cell_click:nth-child(2) .mpm_teams_bet_val") 
    langs_text = [] 

    for lang in langs: 
        langs_text.append(lang.text) 


    # HREF 
    #langs2 = driver.find_elements_by_xpath("//ul[@class='runners']//li[1]") 
    #a[href*="/sports/soccer/"] 
    url1 = driver.current_url 

    #clickMe = wait(driver, 15).until(
     #EC.element_to_be_clickable((By.CSS_SELECTOR, ('.match-pop-market a[href*="/sports/soccer/"]')))) 
    try: 
        clickMe = wait(driver, 15).until(EC.element_to_be_clickable(
            (By.XPATH, "//*[@class='match-pop-market']//a[contains(@href, '/sports/soccer/')]"))) 
        clickMe.click() 
    except TimeoutException: 
        print("No link was found") 
    elems = driver.find_elements_by_css_selector('.match-pop-market a[href*="/sports/soccer/"]') 
    elem_href = [] 
    for elem in elems: 
        elem_href.append(elem.get_attribute("href")) 


    print("NEW LINE BREAK") 


    with open('vtg121.csv', 'a', newline='', encoding="utf-8") as outfile: 
        writer = csv.writer(outfile) 
        for row in zip(langs3_text, langs_text, elem_href): 
            writer.writerow(row) 
            print(row) 
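Separately, note that `zip` stops at the shortest input, so if `langs3_text`, `langs_text`, and `elem_href` come back with different lengths (e.g. some matches lack a link), rows are silently dropped from the CSV. `itertools.zip_longest` makes the mismatch visible instead; the sample data below is made up for illustration:

```python
from itertools import zip_longest

teams = ['A', 'B', 'C']
odds = ['1.5', '2.0', '3.1']
hrefs = ['/sports/soccer/x']  # shorter: two links failed to load

# zip() would emit only one row here; zip_longest keeps all three,
# padding the missing links with an empty string.
rows = list(zip_longest(teams, odds, hrefs, fillvalue=''))
```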
Have you tried adding sleeps between the clicks? – Hunter

@Hunter Yep, it makes no difference for me. – Tetora

I have a terrible suggestion: you could run your initial selector, check one box, run it again looking for unchecked boxes, check one, and repeat. – Hunter

Answer

You can use something like the following to get the team names. I will add more code later.

from selenium import webdriver 
import json 
import time 

driver = webdriver.Chrome() 

driver.get("https://www.palmerbet.com/sports/soccer") 

values = [] 
time.sleep(5) 
for elem in driver.find_elements_by_css_selector("li.sport-grp-filter.filter_item input"): 
    val = elem.get_attribute("value") 
    values.append(val) 

for val in values: 
    driver.get("https://www.palmerbet.com/sports/getroundmatches/socc/" + val) 

    json_data = driver.find_element_by_tag_name("pre").text 
    data = json.loads(json_data) 

    for item in data["m"]: 
        print(item["mta"]["cn"], item["mtb"]["cn"]) 
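Based on the keys used above, the endpoint presumably returns JSON shaped like the sample below (structure inferred from the answer's code, not verified against the live API; the team names are made up):

```python
import json

# Hypothetical payload mirroring the keys the answer reads: "m" is the match
# list, "mta"/"mtb" the two teams, "cn" the team name.
sample = json.loads("""
{"m": [{"mta": {"cn": "Team A"}, "mtb": {"cn": "Team B"}},
       {"mta": {"cn": "Team C"}, "mtb": {"cn": "Team D"}}]}
""")

pairs = [(item["mta"]["cn"], item["mtb"]["cn"]) for item in sample["m"]]
```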
@Shahin, in that case you should comment first and suggest the change. You should not alter the code in someone else's answer to that extent. The reason I did not use requests is that sometimes these sites need cookies, so it is better to use the browser itself –
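If you do want to hit the JSON endpoint without keeping the browser in the loop, one hedged option is to copy the cookies Selenium already holds into a plain HTTP request: `driver.get_cookies()` returns a list of dicts with `name` and `value` keys, which can be folded into a `Cookie` header. A stdlib-only sketch (whether this endpoint actually accepts it is untested here):

```python
import urllib.request

def cookie_header(cookies):
    """Build a Cookie header value from Selenium-style cookie dicts."""
    return '; '.join('%s=%s' % (c['name'], c['value']) for c in cookies)

def fetch_with_cookies(url, cookies):
    req = urllib.request.Request(url, headers={'Cookie': cookie_header(cookies)})
    return urllib.request.urlopen(req).read()

# With a live driver, reusing the answer's loop variable `val`:
# body = fetch_with_cookies(
#     "https://www.palmerbet.com/sports/getroundmatches/socc/" + val,
#     driver.get_cookies())
```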

Thanks! This is a bit more obscure than the standard approach. Is there any guidance on this method? – Tetora
