2016-06-11 53 views

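BeautifulSoup findAll HTML class with multiple variable class inputs

I have the following code, which scrapes a website for divs with the class "odd" or "even". I would like to make "odd" and "even" arguments that my function accepts, which would also let me add other divs. Here is my code: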

# 
# Imports 
# 

import urllib2 
from bs4 import BeautifulSoup 
import re 
import os 
from pprint import pprint 

# 
# library 
# 

def get_soup(url): 
    page = urllib2.urlopen(url) 
    contents = page.read() 
    soup = BeautifulSoup(contents, "html.parser") 
    body = soup.findAll("tr", ["even", "odd"]) 
    string_list = str([i for i in body]) 
    return string_list 


def save_to_file(path, soup): 
    with open(path, 'w') as fhandle: 
        fhandle.write(soup) 


# 
# script 
# 

def main(): 
    url = r'URL GOES HERE' 
    path = os.path.join('PATH GOES HERE') 
    the_soup = get_soup(url) 
    save_to_file(path, the_soup) 



if __name__ == '__main__': 
    main() 

I would like to incorporate *args into the code so that the get_soup function looks like this:

def get_soup(url, *args): 
    page = urllib2.urlopen(url) 
    contents = page.read() 
    soup = BeautifulSoup(contents, "html.parser") 
    body = soup.findAll("tr", [args]) 
    string_list = str([i for i in body]) 
    return string_list 

def main(): 
    url = r'URL GOES HERE' 
    path = os.path.join('PATH GOES HERE') 
    the_soup = get_soup(url, "odd", "even") 
    save_to_file(path, the_soup) 

Unfortunately, this doesn't work. Any ideas?


Do you have the URL of a test site? –

Answer


Don't put args in a list. args is already a tuple, so just pass it directly:

body = soup.findAll("tr", args) 

If you write [args], you end up with a tuple nested inside a list, something like [("odd", "even")].
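
To see the nesting for yourself, here is a minimal sketch in plain Python (no BeautifulSoup needed). `show_args` is a hypothetical helper used only for illustration; it is not part of the question's code.

```python
def show_args(*args):
    """Return args as received, and args wrapped in a list."""
    return args, [args]


as_tuple, wrapped = show_args("odd", "even")

print(as_tuple)  # ('odd', 'even')
print(wrapped)   # [('odd', 'even')]
```

The first value is a flat sequence of class names. The second is a list whose only element is a tuple, which is not a flat list of class names.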

Also, str([i for i in body]) doesn't really make sense. It is the same as just calling str(body), and I don't see how that format would be useful.
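
If the goal is a file with one matching row per line, something along these lines may be more useful. This is only a sketch: it assumes `bs4` is installed and parses an inline HTML string instead of fetching a URL. `get_rows`, `SAMPLE_HTML` and `class_names` are illustrative names that do not appear in the question. Converting the tuple to a list before passing it as `class_` is a conservative choice on my part, not something the fix above requires.

```python
from bs4 import BeautifulSoup

# Made-up sample markup standing in for the downloaded page.
SAMPLE_HTML = """
<table>
  <tr class="odd"><td>1</td></tr>
  <tr class="even"><td>2</td></tr>
  <tr class="header"><td>skip</td></tr>
</table>
"""


def get_rows(html, *class_names):
    """Return the <tr> elements whose class matches any of class_names."""
    soup = BeautifulSoup(html, "html.parser")
    # A list of strings matches a tag that has any one of those classes.
    return soup.find_all("tr", class_=list(class_names))


rows = get_rows(SAMPLE_HTML, "odd", "even")
# One row per line instead of the repr of a Python list.
text = "\n".join(str(row) for row in rows)
print(text)
```

The resulting string can be passed to save_to_file unchanged, and adding another class is just one more argument to the call.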


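This is perfect! As for str([i for i in body]): it is a combination of two functions that I haven't cleaned up yet. I evidently copied the wrong function, although it does the same thing as my other one. Thanks @Padraic Cunningham! – Lefty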


No worries, you're welcome. –