Regular expression to match an anchor tag and its href


I want to run a regular expression over an HTML string that contains multiple anchor tags, and build a dictionary that maps each link's text to its href URL.

<p>This is a simple text with some embedded <a href="http://example.com/link/to/some/page?param1=77&param2=22">links</a>. This is a <a href="https://exmp.le/sample-page/?uu=1">different link</a>.

How can I extract both the text and the href of each <a> tag in one go?

Edit:

func extractLinks(html: String) -> Dictionary<String, String>? { 

    do { 
     let regex = try NSRegularExpression(pattern: "/<([a-z]*)\b[^>]*>(.*?)</\1>/i", options: []) 
     let nsString = html as NSString 
     let results = regex.matchesInString(html, options: [], range: NSMakeRange(0, nsString.length)) 
     return results.map { nsString.substringWithRange($0.range)} 
    } catch let error as NSError { 
     print("invalid regex: \(error.localizedDescription)") 
     return nil 
    } 
}

Source

2017-05-05 Rao

Where is your regular expression code? – matt

@matt: They are waiting for you to write it. –

It is very bad. – Rao

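First, you need to learn the basic syntax of the pattern for NSRegularExpression:

The pattern does not contain delimiters.
The pattern does not contain modifiers; you need to pass that information through options.
When you want to use the metacharacter \, you need to escape it as \\ in a Swift string literal.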
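The three rules above can be checked with a small sketch. `countMatches` and the `<B>bold</B>` sample string are hypothetical names introduced only for this illustration; the patterns are the one from the question (with backslashes escaped so that it compiles) and the same pattern without delimiters.

```swift
import Foundation

// Hypothetical helper for illustration: counts the matches of `pattern` in `text`.
func countMatches(of pattern: String, in text: String,
                  options: NSRegularExpression.Options = []) -> Int {
    guard let regex = try? NSRegularExpression(pattern: pattern, options: options) else {
        return 0
    }
    let range = NSRange(text.startIndex..., in: text)
    return regex.numberOfMatches(in: text, options: [], range: range)
}

let sample = "<B>bold</B>"

// "/" and "/i" are not delimiters or modifiers here: they are matched literally,
// so the sample (which contains no "/<" sequence) yields no match.
let withDelimiters = countMatches(of: "/<([a-z]*)\\b[^>]*>(.*?)</\\1>/i", in: sample)

// Without delimiters, but also without options: [a-z] cannot match the uppercase "B".
let caseSensitive = countMatches(of: "<([a-z]*)\\b[^>]*>(.*?)</\\1>", in: sample)

// Case insensitivity is requested through `options`, not through a trailing "i".
let caseInsensitive = countMatches(of: "<([a-z]*)\\b[^>]*>(.*?)</\\1>",
                                   in: sample, options: .caseInsensitive)

print(withDelimiters, caseSensitive, caseInsensitive)
```

Only the last call finds the tag. Note that writing "\b" or "\1" with a single backslash in an ordinary Swift string literal is rejected by the compiler as an invalid escape sequence, which is why the doubled backslashes are needed.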

So, the line that creates the NSRegularExpression instance should look something like this:

let regex = try NSRegularExpression(pattern: "<([a-z]*)\\b[^>]*>(.*?)</\\1>", options: .caseInsensitive)

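But, as you may already know, your pattern contains nothing that matches href or captures its value.

Something like this would work with your example HTML: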

import Foundation

// Group 1: the quoted href value; group 2: the text between <a ...> and </a>.
let pattern = "<a\\b[^>]*\\bhref\\s*=\\s*(\"[^\"]*\"|'[^']*')[^>]*>((?:(?!</a).)*)</a\\s*>"
let regex = try! NSRegularExpression(pattern: pattern, options: .caseInsensitive)
let html = "<p>This is a simple text with some embedded <a\n" +
    "href=\"http://example.com/link/to/some/page?param1=77&param2=22\">links</a>.\n" +
    "This is a <a href=\"https://exmp.le/sample-page/?uu=1\">different link</a>."
let matches = regex.matches(in: html, options: [], range: NSRange(0..<html.utf16.count))
var resultDict: [String: String] = [:]
for match in matches {
    // Skip the surrounding quotes of the captured href value.
    // range(at:) is the Swift 4+ name; Swift 3 spelled it rangeAt(_:).
    let hrefRange = NSRange(location: match.range(at: 1).location+1, length: match.range(at: 1).length-2)
    let innerTextRange = match.range(at: 2)
    let href = (html as NSString).substring(with: hrefRange)
    let innerText = (html as NSString).substring(with: innerTextRange)
    resultDict[innerText] = href
}
print(resultDict)
//->["different link": "https://exmp.le/sample-page/?uu=1", "links": "http://example.com/link/to/some/page?param1=77&param2=22"]
// (Dictionary printing order is not guaranteed.)
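For reference, the same logic can be packaged into the function shape the question asked for. This is only a sketch under a few assumptions: it returns an empty dictionary rather than nil if the pattern fails to compile, it uses `range(at:)` (the Swift 4+ name of `rangeAt(_:)`), and the second link in the sample input uses single quotes purely to exercise that branch of the pattern.

```swift
import Foundation

// Sketch: maps each link's inner text to its href, using the pattern shown above.
func extractLinks(html: String) -> [String: String] {
    let pattern = "<a\\b[^>]*\\bhref\\s*=\\s*(\"[^\"]*\"|'[^']*')[^>]*>((?:(?!</a).)*)</a\\s*>"
    guard let regex = try? NSRegularExpression(pattern: pattern, options: .caseInsensitive) else {
        return [:]
    }
    let nsHTML = html as NSString
    let fullRange = NSRange(location: 0, length: nsHTML.length)
    var links: [String: String] = [:]
    for match in regex.matches(in: html, options: [], range: fullRange) {
        // Group 1 always includes its two surrounding quote characters; drop them.
        let quoted = nsHTML.substring(with: match.range(at: 1))
        let href = String(quoted.dropFirst().dropLast())
        // Group 2 is the text between the opening and closing tags.
        let text = nsHTML.substring(with: match.range(at: 2))
        links[text] = href
    }
    return links
}

let sampleHTML = "<p>This is a simple text with some embedded " +
    "<a href=\"http://example.com/link/to/some/page?param1=77&param2=22\">links</a>. " +
    "This is a <a href='https://exmp.le/sample-page/?uu=1'>different link</a>."
let links = extractLinks(html: sampleHTML)
print(links)
```

Because the link text is the dictionary key, two links with identical text overwrite each other; if that matters, an array of (text, href) pairs may be a better return type.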

Keep in mind that my pattern above may mistakenly match ill-formed a tags or miss some nested structures, and it has no support for HTML character entities...
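To illustrate the character-entity limitation: in real HTML an href is often written with `&amp;` instead of a bare `&`, and the regex returns that text verbatim. A minimal sketch of post-processing the extracted strings follows; `decodeBasicEntities` is a hypothetical helper that handles only the five entities listed in its table, not the full HTML entity set.

```swift
import Foundation

// Hypothetical helper: decodes only a handful of common entities.
func decodeBasicEntities(_ s: String) -> String {
    let table: [(entity: String, character: String)] = [
        ("&lt;", "<"),
        ("&gt;", ">"),
        ("&quot;", "\""),
        ("&#39;", "'"),
        ("&amp;", "&"),  // decoded last so that "&amp;lt;" becomes "&lt;", not "<"
    ]
    var result = s
    for (entity, character) in table {
        result = result.replacingOccurrences(of: entity, with: character)
    }
    return result
}

let decodedHref = decodeBasicEntities("http://example.com/page?param1=77&amp;param2=22")
let decodedText = decodeBasicEntities("Tom &amp; Jerry&#39;s &lt;b&gt;")
print(decodedHref)
print(decodedText)
```

Numeric references in general and the many other named entities are not covered here, which is one more reason to prefer a real HTML parser for anything beyond simple input.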

If you want to make your code more robust and general, you should consider using an HTML parser, as ColGraff and Rob suggested.

Source

2017-05-06 01:32:28 OOPer
