2016-12-18

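Multiprocessing sub-function does not return any results

I am trying to use the concurrency feature provided by the deco module. The code works without multiple threads, as shown in the answer here: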

Extract specific columns from a given webpage

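However, the following code does not return any elements in finallist (it is empty). As the print statement shows, some results are produced inside the scope of the function slow. So why is the outer list empty?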

import urllib.request
from bs4 import BeautifulSoup
from deco import concurrent, synchronized

finallist=list()
urllist=list()

@concurrent
def slow(url):
    #print (url)
    try:
        page = urllib.request.urlopen(url).read()
        soup = BeautifulSoup(page)
        mylist=list()
        for anchor in soup.find_all('div', {'class':'col-xs-8'})[:9]:
            mylist.append(anchor.text)
            urllist.append(url)
        finallist.append(mylist)
        #print (mylist)
        print (finallist)
    except:
        pass


@synchronized
def run():
    finallist=list()
    urllist=list()
    for i in range(10):
        url='https://pythonexpress.in/workshop/'+str(i).zfill(3)
        print (url)
        slow(url)
    slow.wait()

Answer


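I refactored your code to work with the module. I fixed two of the common pitfalls outlined on the deco wiki:

  1. Do not use global variables.
  2. Do everything with square-bracket operations: obj[key] = value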
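
The first pitfall can be reproduced without deco. The sketch below is only an illustration, under the assumption that @concurrent functions run in separate worker processes, as they would with a multiprocessing pool. The names results, worker, and run_demo are hypothetical, and the "fork" start method is chosen purely to keep the example short (it is only available on Unix-like systems).

```python
import multiprocessing as mp

results = []  # module-level list, playing the role of finallist in the question


def worker(value):
    # Runs in a child process: this appends to the child's own copy of `results`.
    results.append(value)
    # Return what the child sees so the parent can inspect it.
    return list(results)


def run_demo():
    # Assumption: the "fork" start method is available (Unix-like systems only).
    ctx = mp.get_context("fork")
    with ctx.Pool(processes=2) as pool:
        seen_in_children = pool.map(worker, [1, 2, 3])
    # The parent's list was never touched by the workers.
    return seen_in_children, list(results)


if __name__ == "__main__":
    seen_in_children, seen_in_parent = run_demo()
    print(seen_in_children)  # each child saw a non-empty list
    print(seen_in_parent)    # the parent's list is still empty
```

Each worker fills its own copy of the list, which is why the print inside slow shows data while the list in the parent stays empty. In the question, run() also creates a local finallist that is never filled or returned.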

Here is the refactored code:

import urllib.request  # Python 3, matching the import used in the question
from bs4 import BeautifulSoup
from deco import concurrent, synchronized

N = 10

@concurrent
def slow(url):
    # Return the scraped data instead of appending to a global list.
    try:
        page = urllib.request.urlopen(url).read()
        soup = BeautifulSoup(page, "html.parser")
        mylist = list()
        for anchor in soup.find_all('div', {'class': 'col-xs-8'})[:9]:
            mylist.append(anchor.text)
        return mylist
    except Exception:
        # A failed request or parse leaves None in that slot of the result.
        return None

@synchronized
def run():
    finallist = [None] * N
    urllist = ['https://pythonexpress.in/workshop/' + str(i).zfill(3) for i in range(N)]
    for i, url in enumerate(urllist):
        print(url)
        # Index assignment (obj[key] = value), as the deco wiki recommends.
        finallist[i] = slow(url)
    return finallist

if __name__ == "__main__":
    finallist = run()
    print(finallist)
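
For comparison, the same "return the result and store it by index" pattern can be sketched with the standard library's concurrent.futures instead of deco. This is not part of the original answer. The fetch function is a hypothetical stand-in for slow() that does no network I/O, and the worker and max_workers parameters are assumptions added for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch(url):
    # Hypothetical stand-in for the scraping in slow(); no network access here.
    return ["row for " + url]


def run(urls, worker=fetch, max_workers=4):
    # Pre-size the result list so each task writes to its own slot.
    results = [None] * len(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Remember which index each submitted task belongs to.
        futures = {pool.submit(worker, url): i for i, url in enumerate(urls)}
        for future, i in futures.items():
            # result() blocks until the task finishes and re-raises its exception, if any.
            results[i] = future.result()
    return results


if __name__ == "__main__":
    urls = ['https://pythonexpress.in/workshop/' + str(i).zfill(3) for i in range(10)]
    print(run(urls))
```

Threads share memory with the caller, so the global-variable pitfall does not apply in the same way, but returning values and assigning them by index keeps the result order tied to the input order.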