2016-08-11 226 views
12

Process request thread error with a Flask application? This may be a long shot, but here is the error I'm getting:

  File "/home/MY NAME/anaconda/lib/python2.7/SocketServer.py", line 596, in process_request_thread
    self.finish_request(request, client_address)
  File "/home/MY NAME/anaconda/lib/python2.7/SocketServer.py", line 331, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/home/MY NAME/anaconda/lib/python2.7/SocketServer.py", line 654, in __init__
    self.finish()
  File "/home/MY NAME/anaconda/lib/python2.7/SocketServer.py", line 713, in finish
    self.wfile.close()
  File "/home/MY NAME/anaconda/lib/python2.7/socket.py", line 283, in close
    self.flush()
  File "/home/MY NAME/anaconda/lib/python2.7/socket.py", line 307, in flush
    self._sock.sendall(view[write_offset:write_offset+buffer_size])
error: [Errno 32] Broken pipe

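I built a Flask application that takes addresses as input, performs some string formatting and manipulation, and then sends them to Bing Maps for geocoding (via the external geopy module).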

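I use this application to clean very large data sets. It works for inputs of up to roughly 1,500 addresses (one address per input line): it processes each address, sends it to Bing Maps for geocoding, and returns the result. After about 1,500 addresses, the application becomes unresponsive. If this happens while I'm at work, my proxy tells me there is a TCP error. If I'm on a non-work computer, the page simply doesn't load. If I restart the application, it works fine again. Because of this, I have to run my program in batches of about 1,000 addresses (to be safe, since I'm not yet sure of the exact number at which the program breaks).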
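The manual batching workaround described above could be automated along these lines. This is only a minimal sketch: the helper name `batches`, the default size of 1,000, and the generated sample addresses are assumptions for illustration, not part of the original application.

```python
def batches(lines, size=1000):
    """Yield consecutive slices of `lines`, each with at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(lines), size):
        yield lines[start:start + size]


# Hypothetical input: one address per line, as pasted into the form.
raw = "\n".join("%d w 60th st" % n for n in range(1, 2501))
addresses = [line for line in raw.split("\n") if line.strip()]

# Each chunk could then be submitted as its own request.
chunk_sizes = [len(chunk) for chunk in batches(addresses, 1000)]
print(chunk_sizes)  # [1000, 1000, 500]
```

Splitting the input this way only keeps each request short; it does not explain why the server stops responding.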

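Does anyone have any idea what might be causing this?

My first thought was that I was hitting my daily Bing API key limit (which is 30,000), but that can't be right, because I rarely use more than 15,000 requests per day.

My second thought was that this might be because I'm still using the standard Flask server to run my application. Would switching to Gunicorn or uWSGI solve this?

My third thought was that it was being overloaded by the number of requests. I tried making the program sleep for about 15 seconds after the first 1,000 addresses, but that didn't solve anything.

If anyone needs further clarification, please let me know.

Here is the code for the back end of my Flask application. I get the input from this function: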

@app.route("/clean", methods=['POST']) 
def dothing(): 
    addresses = request.form['addresses'] 
    return cleanAddress(addresses) 

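Here is the cleanAddress function. It's a bit messy right now, with all the if statements that check for specific typos, but I plan to move a lot of this code into other functions in a separate file and just pass the address through those functions to clean it.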

def cleanAddress(addresses): 

    counter = 0 

    # nested helper function to fix addresses such as '30 w 60th' 
    def check_st(address): 
     if 'broadway' in address: 
      return address 
     has_th_st_nd_rd = re.compile(r'(?P<number>[\d]{1,4}(th|st|nd|rd)\s)(?P<following>.*)') 
     has_number = has_th_st_nd_rd.search(address) 
     if has_number is not None: 
      if re.match(r'(street|st|floor)', has_number.group('following')): 
       return address 
      else: 
       new_address = re.sub('(?P<number>[\d]{1,4}(st|nd|rd|th)\s)', r'\g<number>street ', address, 1) 
       return new_address 
     else: 
      return address 

    addresses = addresses.split('\n') 
    cleaned = [] 
    success = 0 
    fail = 0 
    cleaned.append('<body bgcolor="#FACC2E"><center><img src="http://goglobal.dhl-usa.com/common/img/dhl-express-logo.png" alt="Smiley face" height="100" width="350"><br><p>') 

    cleaned.append('<br><h3>Note: Everything before the first comma is the Old Address. Everything after the first comma is the New Address</h3>')
    cleaned.append('<p><h3>To format the output in Excel, split the columns using "," as the delimiter. </p></h3>') 
    cleaned.append('<p><h2><font color="red">Old Address </font> <font color="black">New Address </font></p></h2>') 

    for address in addresses: 
     dirty = address.strip() 
     if ',' in address: 
      dirty = dirty.replace(',', '') 
     cleaned.append('<font color="red">' + dirty + ', ' + '</font>') 

     address = address.lower() 
     address = re.sub('[^A-Za-z0-9#]+', ' ', address).lstrip() 

     pattern = r"\d+.* +(\d+ .*(" + "|".join(patterns) + "))" 
     address = re.sub(pattern, "\\1", address) 

     address = check_st(address) 


     if 'one ' in address: 
      address = address.replace('one', '1') 
     if 'two' in address: 
      address = address.replace('two', '2') 
     if 'three' in address: 
      address = address.replace('three', '3') 
     if 'four' in address: 
      address = address.replace('four', '4') 
     if 'five' in address: 
      address = address.replace('five', '5') 
     if 'eight' in address: 
      address = address.replace('eight', '8') 
     if 'nine' in address: 
      address = address.replace('nine', '9') 
     if 'fith' in address: 
      address = address.replace('fith', 'fifth') 
     if 'aveneu' in address: 
      address = address.replace('aveneu', 'avenue') 
     if 'united states of america' in address: 
      address = address.replace('united states of america', '') 
     if 'ave americas' in address: 
      address = address.replace('ave americas', 'avenue of the americas') 
     if 'americas avenue' in address: 
      address = address.replace('americas avenue', 'avenue of the americas') 
     if 'avenue of americas' in address: 
      address = address.replace('avenue of americas', 'avenue of the americas') 
     if 'avenue of america ' in address: 
      address = address.replace('avenue of america ', 'avenue of the americas ') 
     if 'ave of the americ' in address: 
      address = address.replace('ave of the americ', 'avenue of the americas') 
     if 'avenue america' in address: 
      address = address.replace('avenue america', 'avenue of the americas') 
     if 'americaz' in address: 
      address = address.replace('americaz', 'americas') 
     if 'ave of america' in address: 
      address = address.replace('ave of america', 'avenue of the americas') 
     if 'amrica' in address: 
      address = address.replace('amrica', 'americas') 
     if 'americans' in address: 
      address = address.replace('americans', 'americas') 
     if 'walk street' in address: 
      address = address.replace('walk street', 'wall street') 
     if 'northend' in address: 
      address = address.replace('northend', 'north end') 
     if 'inth' in address: 
      address = address.replace('inth', 'ninth') 
     if 'aprk' in address: 
      address = address.replace('aprk', 'park') 
     if 'eleven' in address: 
      address = address.replace('eleven', '11') 
     if ' av ' in address:
      # keep the trailing space so the next word is not glued onto 'avenue'
      address = address.replace(' av ', ' avenue ')
     if 'avnue' in address: 
      address = address.replace('avnue', 'avenue') 
     if 'ofthe americas' in address: 
      address = address.replace('ofthe americas', 'of the americas') 
     if 'aj the' in address: 
      address = address.replace('aj the', 'of the') 
     if 'fifht' in address: 
      address = address.replace('fifht', 'fifth') 
     if 'w46' in address: 
      address = address.replace('w46', 'w 46') 
     if 'w42' in address: 
      address = address.replace('w42', 'w 42') 
     if '95st' in address: 
      address = address.replace('95st', '95th st') 
     if 'e61 st' in address: 
      address = address.replace('e61 st', 'e 61st') 
     if 'driver information' in address: 
      address = address.replace('driver information', '') 
     if 'e87' in address: 
      address = address.replace('e87', 'e 87') 
     if 'thrd avenus' in address: 
      address = address.replace('thrd avenus', 'third avenue') 
     if '3r ' in address: 
      address = address.replace('3r ', '3rd ') 
     if 'st ates' in address: 
      address = address.replace('st ates', '') 
     if 'east52nd' in address: 
      address = address.replace('east52nd', 'east 52nd') 
     if 'authority to leave' in address: 
      address = address.replace('authority to leave', '') 
     if 'sreet' in address: 
      address = address.replace('sreet', 'street') 
     if 'w47' in address: 
      address = address.replace('w47', 'w 47') 
     if 'signature required' in address: 
      address = address.replace('signature required', '') 
     if 'direct' in address: 
      address = address.replace('direct', '') 
     if 'streetapr' in address: 
      address = address.replace('streetapr', 'street') 
     if 'steet' in address: 
      address = address.replace('steet', 'street') 
     if 'w39' in address: 
      address = address.replace('w39', 'w 39') 
     if 'ave of new york' in address: 
      address = address.replace('ave of new york', 'avenue of the americas') 
     if 'avenue of new york' in address: 
      address = address.replace('avenue of new york', 'avenue of the americas') 
     if 'brodway' in address: 
      address = address.replace('brodway', 'broadway') 
     if 'w 31 ' in address:
      address = address.replace('w 31 ', 'w 31st ')
     if 'w 34 ' in address: 
      address = address.replace('w 34 ', 'w 34th ') 
     if 'w38' in address: 
      address = address.replace('w38', 'w 38') 
     if 'broadeay' in address: 
      address = address.replace('broadeay', 'broadway') 
     if 'w37' in address: 
      address = address.replace('w37', 'w 37') 
     if '35street' in address: 
      address = address.replace('35street', '35th street') 
     if 'eighth avenue' in address: 
      address = address.replace('eighth avenue', '8th avenue') 
     if 'west 33' in address: 
      address = address.replace('west 33', 'west 33rd') 
     if '34t ' in address: 
      address = address.replace('34t ', '34th ') 
     if 'street ave' in address: 
      address = address.replace('street ave', 'ave') 
     if 'avenue of york' in address: 
      address = address.replace('avenue of york', 'avenue of the americas') 
     if 'avenue aj new york' in address: 
      address = address.replace('avenue aj new york', 'avenue of the americas') 
     if 'avenue ofthe new york' in address: 
      address = address.replace('avenue ofthe new york', 'avenue of the americas') 
     if 'e4' in address: 
      address = address.replace('e4', 'e 4') 
     if 'avenue of nueva york' in address: 
      address = address.replace('avenue of nueva york', 'avenue of the americas') 
     if 'avenue of new york' in address: 
      address = address.replace('avenue of new york', 'avenue of the americas') 
     if 'west end new york' in address: 
      address = address.replace('west end new york', 'west end avenue') 

     #print address  
     address = address.split(' ') 
     for pattern in patterns: 
      try: 
       if address[0].isdigit(): 
        continue 
       else: 
        location = address.index(pattern) + 1 
        number_location = address[location] 
        #print address[location] 
        #if 'th' in address[location + 1] or 'floor' in address[location + 1] or '#' in address[location]: 
        # continue 
      except (ValueError, IndexError): 
       continue 
      if number_location.isdigit() and len(number_location) <= 4: 
       address = [number_location] + address[:location] + address[location+1:] 
       break 
     address = ' '.join(address) 

     if '#' in address: 
      address = address.replace('#', '') 


     #print (address) 


     i = 0 
     for char in address: 
      if char.isdigit(): 
       address = address[i:] 
       break 
      i += 1 


     #print (address) 

     if 'plz' in address: 
      address = address.replace('plz', 'plaza ', 1) 
     if 'hstreet' in address: 
      address = address.replace('hstreet', 'h street') 
     if 'dstreet' in address: 
      address = address.replace('dstreet', 'd street') 
     if 'hst' in address: 
      address = address.replace('hst', 'h st') 
     if 'dst' in address: 
      address = address.replace('dst', 'd st') 
     if 'have' in address: 
      address = address.replace('have', 'h ave') 
     if 'dave' in address: 
      address = address.replace('dave', 'd ave') 
     if 'havenue' in address: 
      address = address.replace('havenue', 'h avenue') 
     if 'davenue' in address: 
      address = address.replace('davenue', 'd avenue') 



     #print address 

     regex = r'(.*)(' + '|'.join(patterns) + r')(.*)' 
     address = re.sub(regex, r'\1\2', address).lstrip() + " nyc" 

     print (address) 

     if 'americasas st' in address: 
      address = address.replace('americasas st', 'americas') 

     try: 

      clean = geolocator.geocode(address) 
      x = clean.address 
      address, city, zipcode, country = x.split(",") 
      address = address.lower() 
      if 'first' in address: 
       address = address.replace('first', '1st') 
      if 'second' in address: 
       address = address.replace('second', '2nd') 
      if 'third' in address: 
       address = address.replace('third', '3rd') 
      if 'fourth' in address: 
       address = address.replace('fourth', '4th') 
      if 'fifth' in address: 
       address = address.replace('fifth', '5th') 
      if ' sixth a' in address: 
       address = address.replace('ave', '') 
       address = address.replace('avenue', '') 
       address = address.replace(' sixth', ' avenue of the americas') 
      if ' 6th a' in address: 
       address = address.replace('ave', '') 
       address = address.replace('avenue', '') 
       address = address.replace(' 6th', ' avenue of the americas') 
      if 'seventh' in address: 
       address = address.replace('seventh', '7th') 
      if 'fashion' in address: 
       address = address.replace('fashion', '7th') 
      if 'eighth' in address: 
       address = address.replace('eighth', '8th') 
      if 'ninth' in address: 
       address = address.replace('ninth', '9th') 
      if 'tenth' in address: 
       address = address.replace('tenth', '10th') 
      if 'eleventh' in address: 
       address = address.replace('eleventh', '11th') 


      zipcode = zipcode[3:] 
      to_write = str(address) + ", " + str(zipcode.lstrip()) + ", " + str(clean.latitude) + ", " + str(clean.longitude) 
      to_find = str(address) 

      #print to_write 

      # returns 'can not be cleaned' if street address has no numbers 
      if any(i.isdigit() for i in str(address)): 
       with open('/home/MY NAME/Address_Database.txt', 'a+') as database: 
        if to_find not in database.read(): 
         database.write(dirty + '|' + to_write + '\n') 
       if 'ncy rd' in address: 
        cleaned.append('<font color="red"> Can not be cleaned </font> <br>') 
        fail += 1 
       elif 'nye rd' in address: 
        cleaned.append('<font color="red"> Can not be cleaned </font> <br>') 
        fail += 1 
       elif 'nye c' in address: 
        cleaned.append('<font color="red"> Can not be cleaned </font> <br>') 
        fail += 1      
       else: 
        cleaned.append(to_write + '<br>') 
        success += 1 
      else: 
       cleaned.append('<font color="red"> Can not be cleaned </font> <br>') 
       fail += 1 
     except (AttributeError, ValueError, GeocoderTimedOut):
      # no geocoding result, unexpected address format, or geocoder timeout
      cleaned.append('<font color="red"> Can not be cleaned </font> <br>')
      fail += 1

    total = success + fail 
    percent = float(success)/float(total) * 100 
    percent = round(percent, 2) 
    print percent 
    cleaned.append('<br>Accuracy: ' + str(percent) + ' %') 
    cleaned.append('</p></center></body>') 

    return "\n".join(cleaned) 
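The long chain of `if ... in address` / `replace` statements in the function above could be expressed as an ordered table of substitutions, which is one way to move the typo handling into its own module as planned. This is only a sketch: the names `TYPO_FIXES` and `apply_fixes` are hypothetical, and the table holds just a few of the pairs from the code above.

```python
# Ordered (typo, replacement) pairs. Order matters because later rules see
# the output of earlier ones, just like the chained if-statements do.
TYPO_FIXES = [
    ('fith', 'fifth'),
    ('aveneu', 'avenue'),
    ('avnue', 'avenue'),
    ('brodway', 'broadway'),
    ('walk street', 'wall street'),
]


def apply_fixes(address, fixes=TYPO_FIXES):
    """Apply each substring replacement in order and return the result."""
    for wrong, right in fixes:
        # str.replace leaves the string unchanged when `wrong` is absent,
        # so a separate `in` check is not needed.
        address = address.replace(wrong, right)
    return address


print(apply_fixes('1250 brodway'))     # 1250 broadway
print(apply_fixes('500 fith aveneu'))  # 500 fifth avenue
```

Like the original chain, this matches plain substrings, so a rule can still fire inside a longer word. Word-boundary regular expressions would be stricter, at the cost of more complex rules.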

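Update: I've switched to running the application with gunicorn, which solves the problem when I access the application from my home network. However, I still receive the TCP error behind my work proxy. My console shows no error messages; the browser just displays the TCP error. I can tell the tool is still working in the background, because I have a print statement in the loop that tells me each address is still being geocoded. Could it be that my work network doesn't like a page staying in a loading state for a long time and so just shows the proxy error page?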
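If the proxy is indeed cutting off long-running requests, one common workaround is to start the work in the background, return a job id right away, and let the browser poll for the result. The sketch below uses only the standard library and assumes hypothetical names (`start_job`, `job_status`, `fake_clean`); in the Flask application, one route would call `start_job` and a second route would return `job_status`.

```python
import threading
import uuid

_jobs = {}  # job_id -> {"done": bool, "result": ..., "error": ...}
_jobs_lock = threading.Lock()


def start_job(work, payload):
    """Run work(payload) in a background thread; return (job_id, thread)."""
    job_id = uuid.uuid4().hex
    with _jobs_lock:
        _jobs[job_id] = {"done": False, "result": None, "error": None}

    def runner():
        try:
            record = {"done": True, "result": work(payload), "error": None}
        except Exception as exc:
            # Record the failure so a polling client does not wait forever.
            record = {"done": True, "result": None, "error": str(exc)}
        with _jobs_lock:
            _jobs[job_id] = record

    thread = threading.Thread(target=runner)
    thread.daemon = True
    thread.start()
    return job_id, thread


def job_status(job_id):
    """Return a copy of the job record, or None for an unknown id."""
    with _jobs_lock:
        record = _jobs.get(job_id)
        return dict(record) if record is not None else None


def fake_clean(text):
    # Stand-in for cleanAddress, used only for this illustration.
    return text.upper()


job_id, worker = start_job(fake_clean, "30 w 60th")
worker.join()  # a web client would poll job_status instead of joining
status = job_status(job_id)
print(status)
```

The in-memory dictionary is visible only inside one process, so with several gunicorn workers the job records would need shared storage such as a file or database.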

+0

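Could you post some of the code you have so we can take a look? – Seekheart

+0

@Seekheart Sure. I'll add the code for my cleaning function. – Harrison

+0

I had a similar problem, and using a proper web server solved it. I used nginx – slysid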

Answers

3

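It sounds like you are running out of file handles (the default limit is 1024 for a normal user). You can check the limit with grep 'open' /proc/<webapp pid>/limits and the number of currently open file handles with ls -1 /proc/<pid>/fd | wc -l.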
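The same check can be made from inside the Python process itself, for example from a temporary debugging route. This is a minimal sketch assuming a Linux system with /proc mounted; the function name `fd_usage` is hypothetical. It reports on the process that calls it, so it has to run inside the web application to be meaningful.

```python
import os
import resource


def fd_usage():
    """Return (open_fd_count, soft_limit) for the current process.

    The count is None when /proc/self/fd is not available (non-Linux systems).
    """
    soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
    try:
        # Listing the directory briefly opens one descriptor itself,
        # so the count is approximate.
        open_count = len(os.listdir("/proc/self/fd"))
    except OSError:
        open_count = None
    return open_count, soft_limit


open_count, soft_limit = fd_usage()
print("open fds: %s, soft limit: %s" % (open_count, soft_limit))
```

If the open count climbs steadily toward the soft limit while addresses are being processed, that would support the file-handle explanation.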

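I think your code isn't sending a proper response, which leaves the connections open until you eventually run out of open file handles (an open socket is a file on POSIX systems).

When you see the problem, you can confirm what state the connections are in with netstat -an | grep <webapp port>. It should show a list of 1k+ IPs and ports along with their states.

I would guess they are in the TIME_WAIT state, which indicates that the client is not closing the connection properly, leaving it to the kernel to clean up later.

Try: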

from flask import make_response 

@app.route("/clean", methods=['POST']) 
def dothing(): 
    addresses = request.form['addresses'] 
    resp = make_response(cleanAddress(addresses), 200) 
    return resp 
+0

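If that were the case, I would expect socket creation to fail, not sending data on an already-open socket. More likely, the other end has dropped the connection. –

+0

That can happen if the connection isn't closed properly on the server side. – danny

+0

This did not solve the problem. I think the issue is that my work proxy automatically times out pages that take too long to load. I'll award you the bounty for your help anyway... – Harrison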

1

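I had a similar problem, and using a proper web server solved it. I used uWSGI with nginx

+0

Thanks for walking me through the solution! – Harrison

+0

This works on my home network, but I still receive the same TCP error on my work network. – Harrison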
