
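Creating a PIL Image object from a Selenium canvas tag

I want to create a PIL Image object from a canvas tag that I extract with Selenium from this website. The goal is to run pytesseract on it and read the captcha's content. My code doesn't raise any errors, but the image it creates is entirely black.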

My code so far:

import base64
import io
import re

import pytesseract
from PIL import Image

# Run JS code to get the canvas contents as a data URI
png_url = driver.execute_script(
    'return document.getElementsByTagName("canvas")[0].toDataURL("image/png");')
# Parse the URI to get only the base64 part
str_base64 = re.search(r'base64,(.*)', png_url).group(1)
# Convert it to binary (bytes)
png_bytes = base64.b64decode(str_base64)
# Create and show the Image object from an in-memory buffer
image = Image.open(io.BytesIO(png_bytes))
image.show()
# Read the image with pytesseract
recaptcha = pytesseract.image_to_string(image)

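I don't know why the image is all black. My code is based on this tutorial, which saves the image to disk. I don't want to save the image; I want it to stay in memory only.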

Edit:

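I have saved the image to the file system, and the saved image is fine, but it has a transparent background, which shows up as black when the image is displayed this way. How can I make the background white?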
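A likely explanation (not confirmed in the question) is that the transparent canvas pixels are stored as RGBA (0, 0, 0, 0), so anything that discards the alpha channel, such as a plain RGB conversion, shows them as black. The sketch below uses a small synthetic RGBA image instead of the real captcha and contrasts dropping alpha with compositing onto white via Pillow's `Image.alpha_composite`:

```python
from PIL import Image

# A fully transparent "canvas": every pixel is RGBA (0, 0, 0, 0).
canvas = Image.new("RGBA", (4, 4), (0, 0, 0, 0))
# One opaque dark-grey pixel standing in for captcha text.
canvas.putpixel((1, 1), (40, 40, 40, 255))

# Dropping the alpha channel keeps the stored RGB values,
# so the transparent background becomes black.
naive = canvas.convert("RGB")
print(naive.getpixel((0, 0)))  # (0, 0, 0)

# Compositing onto an opaque white image gives a white background instead.
white = Image.new("RGBA", canvas.size, (255, 255, 255, 255))
flattened = Image.alpha_composite(white, canvas).convert("RGB")
print(flattened.getpixel((0, 0)))  # (255, 255, 255)
print(flattened.getpixel((1, 1)))  # (40, 40, 40)
```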

Answers


All I needed to do was remove the transparent background, using the function from this answer:

from PIL import Image


def remove_transparency(im, bg_colour=(255, 255, 255)):
    # Only process if image has transparency (https://stackoverflow.com/a/1963146)
    if im.mode in ('RGBA', 'LA') or (im.mode == 'P' and 'transparency' in im.info):

        # Need to convert to RGBA if LA format due to a bug in PIL (https://stackoverflow.com/a/1963146)
        alpha = im.convert('RGBA').split()[-1]

        # Create a new background image of our matte colour.
        # Must be RGBA because paste requires both images to have the same format
        # (https://stackoverflow.com/a/8720632 and https://stackoverflow.com/a/9459208)
        bg = Image.new("RGBA", im.size, bg_colour + (255,))
        bg.paste(im, mask=alpha)
        return bg

    else:
        return im

The full code is then:

import base64
import io
import re

import pytesseract
from PIL import Image

png_url = driver.execute_script(
    'return document.getElementsByTagName("canvas")[0].toDataURL("image/png");')
str_base64 = re.search(r'base64,(.*)', png_url).group(1)
# Convert it to binary (bytes)
png_bytes = base64.b64decode(str_base64)
image = Image.open(io.BytesIO(png_bytes))
# Flatten the transparent background onto white before OCR
image = remove_transparency(image)
recaptcha = pytesseract.image_to_string(image).replace(" ", "")
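The same in-memory pipeline can be tried without a browser. This is a minimal sketch, assuming the canvas returns a base64 PNG data URI; `data_uri_to_white_image` is a hypothetical helper, and a PNG built with Pillow stands in for the real `toDataURL` output. It uses `Image.alpha_composite` rather than the `paste` approach above:

```python
import base64
import io

from PIL import Image


def data_uri_to_white_image(data_uri):
    """Hypothetical helper: decode a base64 PNG data URI into an RGB
    PIL image flattened onto white, without touching the disk."""
    # Everything after the first comma is the base64 payload.
    header, payload = data_uri.split(",", 1)
    if not header.endswith(";base64"):
        raise ValueError("expected a base64-encoded data URI")
    image = Image.open(io.BytesIO(base64.b64decode(payload))).convert("RGBA")
    # Composite onto an opaque white background, then drop alpha.
    white = Image.new("RGBA", image.size, (255, 255, 255, 255))
    return Image.alpha_composite(white, image).convert("RGB")


# Simulate the browser's output: a transparent PNG with one dark pixel.
canvas = Image.new("RGBA", (3, 3), (0, 0, 0, 0))
canvas.putpixel((2, 2), (10, 10, 10, 255))
buf = io.BytesIO()
canvas.save(buf, format="PNG")
fake_data_uri = "data:image/png;base64," + base64.b64encode(buf.getvalue()).decode("ascii")

result = data_uri_to_white_image(fake_data_uri)
print(result.mode, result.getpixel((0, 0)), result.getpixel((2, 2)))
# RGB (255, 255, 255) (10, 10, 10)
```

With Selenium, `png_url` from `driver.execute_script(...)` would take the place of `fake_data_uri`, and the returned image could be passed to `pytesseract.image_to_string`.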

You should create a white RGB image and paste your RGBA image onto it. This could be one solution, but there are other ways too. I'd suggest numpy and opencv.
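A NumPy version of this idea is to alpha-blend each pixel over white, out = a·rgb + (1 − a)·255, with a the alpha scaled to [0, 1]. This is a sketch only: `flatten_on_white` is a hypothetical name, and OpenCV is left out so the example needs only Pillow and NumPy. Unlike simply replacing transparent pixels, the blend also handles the partially transparent, anti-aliased edges that captcha text often has:

```python
import numpy as np
from PIL import Image


def flatten_on_white(im):
    """Hypothetical helper: blend an image over white, out = a*rgb + (1 - a)*255."""
    rgba = np.asarray(im.convert("RGBA"), dtype=np.float32)
    rgb = rgba[..., :3]
    alpha = rgba[..., 3:] / 255.0  # keep a trailing axis so it broadcasts over RGB
    out = rgb * alpha + 255.0 * (1.0 - alpha)
    # A uint8 array of shape (H, W, 3) is read back as an RGB image.
    return Image.fromarray(np.round(out).astype(np.uint8))


canvas = Image.new("RGBA", (2, 1), (0, 0, 0, 0))
canvas.putpixel((1, 0), (0, 0, 0, 128))  # half-transparent black pixel
flat = flatten_on_white(canvas)
print(flat.mode, flat.getpixel((0, 0)), flat.getpixel((1, 0)))
# RGB (255, 255, 255) (127, 127, 127)
```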