How do I extract a substring from a string in Python?


So I was just wondering how I would extract http://www.google.com from the following string:

<div class="asdf"><a href="http://www.google.com">

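Say I have one huge string with a bunch of links in it, and I want to extract every link that sits inside the quotes of an href. How would I do that?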

2015-11-07 Matt

You should use `regex` or `BeautifulSoup` to do this. –

I think he already wants that, judging by the `regex` tag. – TigerhawkT3

@TigerhawkT3 Good call, I hadn't looked at the tags. –
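For markup as regular as the snippet in the question, the regex route mentioned in the comments could look roughly like the sketch below. It assumes every href value is wrapped in double quotes; the sample string and its second link are made up for illustration. Regular expressions get brittle on real-world HTML, so treat this as a quick fix rather than a general parser.

```python
import re

# Sample input modeled on the question; the second link is a made-up example.
data = (
    '<div class="asdf"><a href="http://www.google.com">Google</a></div>'
    '<div class="asdf"><a href="http://www.example.com">Example</a></div>'
)

# Capture whatever sits between the double quotes that follow href=.
HREF_PATTERN = re.compile(r'href="([^"]*)"')

links = HREF_PATTERN.findall(data)
print(links)
```

Because the pattern has a single capture group, `findall` returns only the captured URLs, one per match, in document order.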

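from bs4 import BeautifulSoup

# The HTML string to search; here, the snippet from the question.
data = '<div class="asdf"><a href="http://www.google.com">'

# Naming the parser explicitly avoids bs4's "no parser specified" warning.
soup = BeautifulSoup(data, "html.parser")
# Select <a> elements with an href that are direct children of <div class="asdf">.
for link in soup.select("div.asdf > a[href]"):
    print(link["href"])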

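This matches all links that have an href attribute and sit directly inside a div element with the class "asdf".

You can also find every a element in the input document that has an href attribute: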

for link in soup.find_all("a", href=True): 
    print(link["href"])

Or:

for link in soup.select("a[href]"): 
    print(link["href"])
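If installing BeautifulSoup is not an option, roughly the same result as `find_all("a", href=True)` can be sketched with the standard library's `html.parser`. The class name `HrefCollector` and the sample string are hypothetical and not part of the original answer.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect the href value of every <a> tag that has one."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; attrs is a list of (name, value) pairs.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value is not None:
                self.hrefs.append(value)


# Made-up sample: one link with an href, one anchor without.
data = (
    '<div class="asdf"><a href="http://www.google.com">Google</a></div>'
    '<p><a name="top">no href here</a></p>'
)

collector = HrefCollector()
collector.feed(data)
print(collector.hrefs)
```

This variant ignores the surrounding div entirely, so it corresponds to the document-wide queries rather than to the `div.asdf > a[href]` selector.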

2015-11-07 03:23:48 alecxe

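That finds them all, but what if there are multiple divs? It's going to be one huge string containing a[href]. – Matt

@Matt I've updated the answer and added some more general information. It would still be good to see your actual input and desired output, though. – alecxe

Oh, thanks! I'm currently using Scrapy's XPaths, so I think it might be response.xpath("//div.asdf/a/@href").extract() then? Sorry, I'm not sure whether you're familiar with XPath. – Matt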
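On the XPath guess in the last comment: XPath has no `div.asdf` class shorthand, so the class is normally matched with an attribute predicate, which in Scrapy would presumably read `//div[@class="asdf"]/a/@href`. The sketch below shows the same predicate with the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath and needs well-formed markup. The sample document and its second link are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Well-formed sample markup; only the first div carries the class "asdf".
data = (
    "<html><body>"
    '<div class="asdf"><a href="http://www.google.com">Google</a></div>'
    '<div class="other"><a href="http://www.example.com">Example</a></div>'
    "</body></html>"
)

root = ET.fromstring(data)

# ElementTree cannot select attributes with /@href, so pick the <a> elements
# and read the attribute from each one.
links = [a.get("href") for a in root.findall(".//div[@class='asdf']/a[@href]")]
print(links)
```

Note that `[@class='asdf']` is an exact string comparison, so a div with several classes, such as `class="asdf big"`, would not match this predicate.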
