Python 2.7 Beautiful Soup Img Src Extract

for imgsrc in Soup.findAll('img', {'class': 'sizedProdImage'}): 
    if imgsrc: 
     imgsrc = imgsrc 
    else: 
     imgsrc = "ERROR" 

patImgSrc = re.compile('src="(.*)".*/>') 
findPatImgSrc = re.findall(patImgSrc, imgsrc) 

print findPatImgSrc 

''' 
<img height="72" name="proimg" id="image" class="sizedProdImage" src="http://imagelocation" />

This is the tag I am trying to extract from, and this is what I get:

findimgsrcPat = re.findall(imgsrcPat, imgsrc) 
File "C:\Python27\lib\re.py", line 177, in findall 
    return _compile(pattern, flags).findall(string) 
TypeError: expected string or buffer

'''


2011-11-27 phales15

You are passing a BeautifulSoup node to re.findall. You have to convert it to a string first. Try:

findPatImgSrc = re.findall(patImgSrc, str(imgsrc))
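To see why the conversion matters, here is a minimal sketch that needs only the standard library. `FakeTag` is a hypothetical stand-in for the node BeautifulSoup returns (it is not part of any library); like a real node, it is not a string but renders as markup under `str()`. The pattern is a slightly tightened variant of the one in the question, and the code is written for Python 3.

```python
import re


class FakeTag(object):
    """Hypothetical stand-in for a parsed node: not a str, but str() yields markup."""

    def __init__(self, markup):
        self.markup = markup

    def __str__(self):
        return self.markup


def extract_src(node):
    """Return every src value found in the node's markup."""
    # [^"]* stops at the first closing quote, so later attributes are not swallowed.
    pattern = re.compile(r'src="([^"]*)"')
    return pattern.findall(str(node))


tag = FakeTag('<img class="sizedProdImage" src="http://imagelocation" />')

try:
    re.findall(r'src="([^"]*)"', tag)  # passing the object itself, as in the question
    raised = False
except TypeError:
    raised = True  # re only accepts str or bytes-like subjects

print(raised)
print(extract_src(tag))
```

Converting with `str()` at the call site keeps the regular expression approach working, although reading the attribute directly from the parsed tag avoids the regular expression altogether.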

Better still, use the tools BeautifulSoup provides:

[x['src'] for x in soup.findAll('img', {'class': 'sizedProdImage'})]

This gives you a list of the src attributes of all img tags with the class 'sizedProdImage'.
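For comparison, the same extraction can be sketched without BeautifulSoup, using Python 3's standard `html.parser` module instead. This is a swapped-in technique, not what the answer uses; the class name `ImgSrcCollector` and the sample markup are made up for illustration.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collect src values of <img> tags carrying a given class (illustrative name)."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Also invoked for self-closing tags such as <img ... />.
        if tag != "img":
            return
        attributes = dict(attrs)
        # The class attribute may hold several space-separated names.
        classes = (attributes.get("class") or "").split()
        if self.wanted_class in classes and attributes.get("src"):
            self.sources.append(attributes["src"])


markup = (
    '<img height="72" class="sizedProdImage" src="http://imagelocation" />'
    '<img class="other" src="http://elsewhere" />'
)
collector = ImgSrcCollector("sizedProdImage")
collector.feed(markup)
print(collector.sources)
```

The BeautifulSoup list comprehension is shorter and more forgiving of messy markup; the sketch only shows that the idea is the same: filter img tags by class, then read src.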


2011-11-27 23:48:58 soulcheck

You are creating an re object and then passing it to re.findall, which expects a string as its first argument:

patImgSrc = re.compile('src="(.*)".*/>') 
findPatImgSrc = re.findall(patImgSrc, imgsrc)

Instead, use the .findall method of the patImgSrc object you just created:

patImgSrc = re.compile('src="(.*)".*/>') 
findPatImgSrc = patImgSrc.findall(imgsrc)


2011-11-27 23:46:00

Still getting the error: Traceback (most recent call last): File "C:\Users\BuyzDirect\Desktop\OverStock_Listing_Format_Tool.py", line 50, in findPatImgSrc = patImgSrc.findall(imgsrc) TypeError: expected string or buffer – phales15
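The follow-up comment matches how the `re` module behaves: `re.findall` accepts a compiled pattern as well as a pattern string, so both spellings do the same thing, and both reject a subject that is not a string. A small sketch, written for Python 3, with `NotAString` as a hypothetical placeholder for the BeautifulSoup node:

```python
import re

pat_img_src = re.compile(r'src="(.*)".*/>')
markup = '<img class="sizedProdImage" src="http://imagelocation" />'

# Both call styles accept the compiled pattern and agree on a string subject.
via_module = re.findall(pat_img_src, markup)
via_method = pat_img_src.findall(markup)


class NotAString(object):
    """Hypothetical placeholder for a non-string object such as a parsed tag."""


def fails_with_type_error(call):
    """Return True if calling `call` raises TypeError."""
    try:
        call()
    except TypeError:
        return True
    return False


node = NotAString()
print(via_module == via_method)
print(fails_with_type_error(lambda: re.findall(pat_img_src, node)))
print(fails_with_type_error(lambda: pat_img_src.findall(node)))
```

So the choice between `re.findall(pattern, ...)` and `pattern.findall(...)` is a matter of style; the error goes away only once the subject is a string.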

There is a simpler solution:

soup.find('img')['src']


2013-07-09 21:29:37 StanleyD

In my example, htmlText contains the img tags, but the same approach can be used with a URL too. See my answer here

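from BeautifulSoup import BeautifulSoup as BSHTML  # BeautifulSoup 3 import (Python 2)
# Sample markup; each img tag is properly closed
htmlText = """<img src="https://src1.com/" /> <img src="https://src2.com/" /> """
soup = BSHTML(htmlText)
images = soup.findAll('img')
for image in images:
    # Index the tag like a dict to read an attribute
    print image['src']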


2017-11-07 20:18:23
