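How do I parse every HTML file in a directory for images?

I have a directory full of HTML files, each of which contains a clinical image of a psoriasis patient. I want to open each file, find the image, and save it in the same directory.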

import os, os.path 
import Image 
from BeautifulSoup import BeautifulSoup as bs 

path = 'C:\Users\gokalraina\Desktop\derm images' 

for root, dirs, files in path:
    for f in files:
        soup = bs(f)
        for image in soup.findAll("img"):
            print "Image: %(src)s" % image
            im = Image.open(image)
            im.save(path+image["src"], "JPEG")

I get this error:

Traceback (most recent call last): 
    File "C:\Users\gokalraina\Desktop\modfile.py", line 7, in <module> 
    for root, dirs, files in path: 
ValueError: need more than 1 value to unpack

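Even after googling the error, I can't tell what is wrong, or whether I am going about this the right way. Please keep in mind that I am new to Python.

EDIT: After making the suggested changes to the program, I still get an error: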

Traceback (most recent call last): 
    File "C:\Users\gokalraina\Desktop\modfile.py", line 25, in <module> 
    im = Image.open(image) 
    File "C:\Python27\lib\site-packages\PIL\Image.py", line 1956, in open 
    prefix = fp.read(16) 
TypeError: 'NoneType' object is not callable

This is the modified code (thanks to nightcracker):

import os, os.path 
import Image 
from BeautifulSoup import BeautifulSoup as bs 

path = 'C:\Users\gokalraina\Desktop\derm images' 

for root, dirs, files in os.walk(path):
    for f in files:
        soup = bs(open(os.path.join(root, f)).read())
        for image in soup.findAll("img"):
            print "Image: %(src)s" % image
            im = Image.open(image)
            im.save(path+image["src"], "JPEG")

Source

2012-03-07 Wandering Sophist

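Is 'modfile.py' the source you posted? Line 7 appears to be a blank line, so I guess not. You need to include 'modfile.py' in your post. – 2012-03-07 20:42:53

Yes, modfile.py is the posted code. – 2012-03-07 20:56:30

You need to change this line: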

for root, dirs, files in path:

to

for root, dirs, files in os.walk(path):

Also note that files contains file names, not file objects, so this would be your fixed code:

import os, os.path 
import Image 
from BeautifulSoup import BeautifulSoup as bs 

path = 'C:\Users\gokalraina\Desktop\derm images' 

for root, dirs, files in os.walk(path):
    for f in files:
        soup = bs(open(os.path.join(root, f)).read())
        for image in soup.findAll("img"):
            print "Image: %(src)s" % image
            im = Image.open(image)
            im.save(path+image["src"], "JPEG")
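
The loop above still hands the parsed img tag itself to Image.open, which expects a file name or an open file object. That is probably what triggers the TypeError in the edited question. What needs to be passed is the path stored in the tag's src attribute, resolved against the folder of the HTML file. Below is a minimal sketch of that step. It makes several assumptions: it is written for Python 3, it swaps BeautifulSoup for the standard library's html.parser so that it runs without third-party packages, it treats every src as a relative file path rather than a URL, and the names ImgSrcCollector and find_local_images are made up for illustration.

```python
import os
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def find_local_images(top):
    """Yield (html_path, image_path) for each <img> in the HTML files under top.

    Assumption: each src is a path relative to the HTML file, not a URL.
    """
    for root, dirs, files in os.walk(top):
        for name in files:
            # Skip anything that is not an HTML file.
            if not name.lower().endswith((".html", ".htm")):
                continue
            html_path = os.path.join(root, name)
            collector = ImgSrcCollector()
            with open(html_path, encoding="utf-8", errors="replace") as fh:
                collector.feed(fh.read())
            for src in collector.sources:
                # Resolve the image relative to the folder holding the HTML file.
                yield html_path, os.path.normpath(os.path.join(root, src))
```

Each image_path produced this way is a plain string, which is the kind of argument Image.open accepts. With BeautifulSoup, image["src"] gives the same string that the collector gathers here.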

Source

2012-03-07 20:41:33 orlp

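Thanks for the quick reply. I changed that line in the program, but it still doesn't work as expected: there is no output on the console and no new files in the directory. Is something else wrong? – 2012-03-07 20:45:54

You need to use os.walk(path) to get something meaningful to iterate over. You are supplying a string, which is a single thing, where a sequence of things is expected.

The idiomatic way to walk a file system is to use os.walk():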

for root, dirs, files in os.walk(path):
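
As a rough illustration of what that loop receives, the sketch below builds a small throwaway tree and walks it. On each iteration os.walk yields a 3-tuple: the current directory, the names of its subdirectories, and the names of its files. The file names and the helper list_files are hypothetical, and the code assumes Python 3.

```python
import os
import tempfile


def list_files(top):
    """Return the full path of every file under top, sorted."""
    paths = []
    for root, dirs, files in os.walk(top):
        # files holds bare names, so join each one with root to get a usable path.
        for name in files:
            paths.append(os.path.join(root, name))
    return sorted(paths)


# Build a throwaway tree: a.html at the top level and b.html in a subfolder.
with tempfile.TemporaryDirectory() as top:
    os.mkdir(os.path.join(top, "sub"))
    for rel in ("a.html", os.path.join("sub", "b.html")):
        open(os.path.join(top, rel), "w").close()
    for path in list_files(top):
        print(os.path.relpath(path, top))
```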

Source

2012-03-07 20:42:06

for root, dirs, files in path:

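path here is a string. Iterating over it yields one character at a time, and a single character cannot be unpacked into three variables. Hence the error message: it needs more than 1 value to unpack.

You probably want: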

for root, dirs, files in os.walk(path):
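
The failure can be reproduced in isolation. The sketch below runs the original loop header over a short stand-in string and returns the error text. The helper try_unpack and the sample string are made up for illustration, and the exact wording of the message differs between Python versions.

```python
def try_unpack(text):
    """Run the original loop header over text and return the error message, if any."""
    try:
        # Each item of a string is a single character, which cannot fill three names.
        for root, dirs, files in text:
            pass
    except ValueError as exc:
        return str(exc)
    return None


path = "abc"  # hypothetical stand-in for the directory path
print(try_unpack(path))
```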

Source

2012-03-07 20:42:39 kindall
