2014-09-04 47 views

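How do I read and write files in a multithreaded scraping process in Python? Getting Windows Error 32

I want to scrape links using parameters read from a text file and write the results to a CSV file. But when I try to implement this with multithreading, I get the following error: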

WindowsError: [Error 32] The process cannot access the file because it is being used by another process:  
'c:\\users\\appdata\\local\\temp\\tmpqseulj.webdriver.xpi\\components\\wdIStatus.xpt' 

Please help me solve this problem. The code is below:

from selenium import webdriver 
from selenium.webdriver.common.by import By 
from selenium.webdriver.common.keys import Keys 
from selenium.webdriver.support.ui import Select 
from selenium.common.exceptions import NoSuchElementException 
import unittest, time, re 
from threading import Thread 
import urlparse 
import urllib2 
import sys
import csv
import operator
reload(sys)
sys.setdefaultencoding("utf8")


with open("C:\\Test2.csv", "w") as f: 
    fieldnames = ("SearchQuery", "Title") 
    output = csv.writer(f, delimiter=",") 
    output.writerow(fieldnames) 


def th(ur):  
    driver = webdriver.Firefox() 
    driver.get("https://www.google.com/search?q="+ur) 
    time.sleep(20)  # wait for the page to finish loading

    html_source = driver.page_source 

    regex = '<span class="label">(.*?)</span>' 
    pattern = re.compile(regex) 

    Cluster = re.findall(pattern, html_source) 
    Cluster = [H.replace("All Topics","") for H in Cluster] 
    Cluster = [H.replace("Other topics","") for H in Cluster] 
    Cluster = filter(operator.methodcaller('strip'), Cluster) 

    print ur, str(Cluster) 

    output.writerow([ur, HotelName]) 
    driver.close()


# Read one search query per line, skipping blank lines
with open("Result.txt") as Symbolfile:
    new = [line.strip() for line in Symbolfile if line.strip()]


threadlist = [] 

for u in new:        # thread implementation 
    t = Thread(target=th, args=(u,)) 
    t.start() 
    threadlist.append(t) 

for b in threadlist: 
    b.join() 

I removed the extra quote from 'driver.get("https://www.google.com/search?q="+ur")' – Nilesh 2014-09-04 03:54:15

Answers

You need to use a lock if multiple threads are going to write to the same file.

This looks like a reasonable example: http://www.laurentluce.com/posts/python-threads-synchronization-locks-rlocks-semaphores-conditions-events-and-queues/

"print" is not thread-safe either, so output from different threads can end up interleaved.
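
A minimal sketch of the lock approach, written for Python 3 (the question's code is Python 2, where the CSV file would be opened in "wb" mode instead). The names write_lock, fake_scrape, worker and run are made up for illustration, and fake_scrape is only a stand-in for the Selenium step in the question. Note that the file stays open until every thread has been joined, whereas the question's "with" block closes it before any thread starts.

```python
import csv
import threading

# One lock shared by every worker thread (hypothetical name).
write_lock = threading.Lock()


def fake_scrape(query):
    # Hypothetical stand-in for the Selenium scraping step in the question.
    return "title for " + query


def worker(query, writer):
    title = fake_scrape(query)
    # Only one thread at a time may use the shared csv writer.
    with write_lock:
        writer.writerow([query, title])


def run(queries, path):
    # Keep the file open until all threads have finished writing.
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["SearchQuery", "Title"])
        threads = [threading.Thread(target=worker, args=(q, writer))
                   for q in queries]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
```

Calling run(["a", "b"], "out.csv") would write the header row followed by one row per query, in whatever order the threads happen to finish.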

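As an afterthought, you could cheat and use the standard logging module to write your output? (Just format it so that the log file can be renamed and used as a CSV.) The logging module is thread-safe, so it will handle all the locking for you... – ColinMcC 2014-09-04 07:52:20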

I'm not sure how to add this, but it would be the ideal solution. – Tarun 2014-09-04 11:34:17

It's very simple. It's a standard Python module. https://docs.python.org/2/howto/logging.html – ColinMcC 2014-09-04 12:42:43
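
One way the logging suggestion might look, as a rough sketch in Python 3. The helper names make_csv_logger, format_row and worker, the logger name "csv_output" and the placeholder title are all assumptions for illustration, not anything from the question. Each row is formatted with the csv module so that commas and quotes are escaped, and the handler writes the bare message so the log file is itself a CSV.

```python
import csv
import io
import logging
import threading


def format_row(values):
    # Let the csv module handle quoting, then drop its line terminator.
    buf = io.StringIO()
    csv.writer(buf).writerow(values)
    return buf.getvalue().rstrip("\r\n")


def make_csv_logger(path):
    # Hypothetical helper: a logger whose only handler writes raw messages.
    logger = logging.getLogger("csv_output")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep rows out of the root logger's output
    handler = logging.FileHandler(path, mode="w")
    handler.setFormatter(logging.Formatter("%(message)s"))
    logger.addHandler(handler)
    logger.info(format_row(["SearchQuery", "Title"]))
    return logger, handler


def worker(logger, query):
    title = "title for " + query  # placeholder for the scraped value
    # Handlers take their own lock, so no explicit Lock is needed here.
    logger.info(format_row([query, title]))
```

Each thread would then call worker(logger, query), and the main thread would call handler.close() after joining them all.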