PHP: regex to search a file for a pattern and pick it out

I'm really confused by regular expressions in PHP.

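Anyway, I can't read through a whole tutorial right now, because I have a bunch of HTML files and I have to find the links in them as quickly as possible. I came up with the idea of automating this with PHP, since it's the language I know.

So I thought I could use this script: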

$address = "file.txt"; 
$input = @file_get_contents($address) or die("Could not access file: $address"); 
$regexp = "??????????"; 
if(preg_match_all("/$regexp/siU", $input, $matches)) { 
    // $matches[2] = array of link addresses 
    // $matches[3] = array of link text - including HTML code 
}

My problem is $regexp.

The pattern I need looks like this:

href="/content/r807215r37l86637/fulltext.pdf" title="Download PDF

I want to search my many files and extract /content/r807215r37l86637/fulltext.pdf from them.

Any help?

==================

Edit

The title attribute is what matters to me: all the links I want are titled

title="Download PDF"


2011-02-11 Alireza

Once again, regular expressions are bad for parsing HTML.

Save your sanity and use the built-in DOM library.

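// $html holds the HTML source, e.g. file_get_contents("file.txt")
$dom = new DOMDocument(); 
@$dom->loadHTML($html); // @ suppresses warnings about malformed HTML
$x = new DOMXPath($dom); 
$data = array(); 
// select every <a> whose title is exactly "Download PDF"
foreach($x->query("//a[@title='Download PDF']") as $node) 
{ 
    $data[] = $node->getAttribute("href"); 
}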

Edit: updated the code based on ircmaxell's comment.


2011-02-11 20:25:11

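Ugh. Why XPath if you're only doing a node-name search? Why not just `$dom->getElementsByTagName('a');`? I could understand XPath if you did `$x->query('//a[contains(@title, "Download PDF")]');`, which would return exactly the matches... ;-) – ircmaxell 2011-02-11 20:31:40

@ircmaxell, you're completely right. `getElementsByTagName()` is probably a more efficient approach. – 2011-02-11 20:35:26

@safaali In the query, change `@title='Download PDF'` to `@class='nameOfClass'`, or use `contains(@title, 'Download PDF')`. `contains` will catch them even if the title has extra text. – 2011-02-11 20:46:30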
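Following ircmaxell's suggestion, here is a minimal sketch that uses `getElementsByTagName()` instead of XPath and does the title check in PHP. The function name `pdfLinksByTagName` and the sample HTML are hypothetical, for illustration only:

```php
<?php
// Hypothetical helper: collect the href values of <a> elements whose title
// attribute is exactly "Download PDF", without using XPath.
function pdfLinksByTagName(string $html): array
{
    $dom = new DOMDocument();
    // @ suppresses the warnings loadHTML() emits for malformed HTML
    @$dom->loadHTML($html);

    $links = [];
    foreach ($dom->getElementsByTagName('a') as $a) {
        // getAttribute() returns "" when the attribute is missing
        if ($a->getAttribute('title') === 'Download PDF') {
            $links[] = $a->getAttribute('href');
        }
    }
    return $links;
}

// Sample input (assumed): one matching link and one unrelated link
$html = '<p><a href="/content/r807215r37l86637/fulltext.pdf" title="Download PDF">PDF</a>'
      . '<a href="/about" title="About">About</a></p>';
print_r(pdfLinksByTagName($html));
```

The exact `===` comparison mirrors `@title='Download PDF'`; swapping it for `str_contains()` (PHP 8) or `strpos() !== false` would mirror `contains()` instead.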

href="([^"]+)" will give you all the links of that form.


2011-02-11 20:22:10 Blindy

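Thanks, but there are many hrefs in the file; I only want the links titled "Download PDF". – Alireza 2011-02-11 20:24:28

Try something like this. If it doesn't work, please show some examples of the links you want to parse.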
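To illustrate the comment above: the broad pattern returns every href, while requiring the title right after the href keeps only the PDF links. This is a sketch on assumed sample HTML, and the narrower pattern only covers the exact layout shown in the question (href immediately followed by title):

```php
<?php
// Sample input (assumed) containing two links; only one is a PDF download
$input = '<a href="/index.html" title="Home">Home</a>'
       . '<a href="/content/r807215r37l86637/fulltext.pdf" title="Download PDF">PDF</a>';

// Broad pattern: every href value, whatever the link is
preg_match_all('#href="([^"]+)"#', $input, $all);

// Narrower pattern: only an href immediately followed by title="Download PDF"
preg_match_all('#href="([^"]+)"\s+title="Download PDF"#', $input, $pdf);

print_r($all[1]); // both hrefs
print_r($pdf[1]); // only the PDF href
```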

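<?php 
$address = "file.txt"; 
$input = @file_get_contents($address) or die("Could not access file: $address"); 
// capture the href of any <a> tag whose title (after the href) is "Download PDF"
$regexp = '#<a[^>]*href="([^"]*)"[^>]*title="Download PDF"#'; 

if(preg_match_all($regexp, $input, $matches, PREG_SET_ORDER)) { 
    // PREG_SET_ORDER: one array per match; $match[1] is the captured href
    foreach ($matches as $match) { 
        printf("Url: %s<br/>", $match[1]); 
    } 
}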

Edit: updated so it only searches for "Download PDF" links.


2011-02-11 20:25:43
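Since the question mentions a whole batch of HTML files, here is a sketch that applies the regex from this answer to every `*.html` file in a directory. The function name `collectPdfLinks`, the `.html` extension and the directory layout are assumptions for illustration:

```php
<?php
// Hypothetical helper: run the answer's regex over every *.html file in a
// directory and return the matched URLs grouped by file name.
function collectPdfLinks(string $dir): array
{
    $regexp = '#<a[^>]*href="([^"]*)"[^>]*title="Download PDF"#';
    $result = [];
    // glob() may return false on error, so fall back to an empty list
    foreach (glob(rtrim($dir, '/') . '/*.html') ?: [] as $file) {
        $input = file_get_contents($file);
        if ($input === false) {
            continue; // unreadable file: skip it instead of die()-ing
        }
        if (preg_match_all($regexp, $input, $matches)) {
            // $matches[1] holds the first capture group: the href values
            $result[basename($file)] = $matches[1];
        }
    }
    return $result;
}

// Example usage (the directory name is an assumption):
// print_r(collectPdfLinks(__DIR__ . '/pages'));
```

Files without any matching link are simply left out of the result.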

This is simple with phpQuery or QueryPath:

foreach (qp($html)->find("a") as $a) { 
    // keep only links whose title is "Download PDF"
    if ($a->attr("title") == "Download PDF") { 
        print $a->attr("href"); 
        print $a->innerHTML(); 
    } 
}

Otherwise, with a regular expression, which relies on the source being reasonably consistent:

preg_match_all('#<a[^>]+href="([^>"]+)"[^>]+title="Download PDF"[^>]*>(.*?)</a>#sim', $input, $m);

Matching the fixed title="..." attribute is doable, but harder, because the pattern depends on where that attribute sits before the closing bracket.


2011-02-11 20:26:37 mario
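One way around the attribute-order problem, offered as an assumption rather than part of this answer, is a lookahead that checks for the title anywhere inside the same tag before capturing the href. A sketch on assumed sample HTML:

```php
<?php
// Lookahead variant (not from the answer): (?=...) checks that
// title="Download PDF" occurs somewhere inside the same <a ...> tag, so the
// href can come before or after the title attribute. [^>]* keeps both the
// lookahead and the href search from running past the tag's closing ">".
$regexp = '#<a\b(?=[^>]*\stitle="Download PDF")[^>]*\shref="([^"]*)"#';

// Sample input (assumed): title after href, title before href, and a non-match
$input = '<a href="/one/fulltext.pdf" title="Download PDF">1</a>'
       . '<a title="Download PDF" class="pdf" href="/two/fulltext.pdf">2</a>'
       . '<a href="/three.html" title="Home">3</a>';

preg_match_all($regexp, $input, $m);
print_r($m[1]); // /one/fulltext.pdf and /two/fulltext.pdf
```

It still assumes double-quoted attributes and no ">" inside attribute values, which is where the DOM-based answers are more robust.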

The best approach is to use DomXPath and do the search in one step:

$dom = new DomDocument(); 
$dom->loadHTML($html); 
$xpath = new DomXPath($dom); 

$links = array(); 
foreach($xpath->query('//a[contains(@title, "Download PDF")]') as $node) { 
    $links[] = $node->getAttribute("href"); 
}

Or even:

$links = array(); 
$query = '//a[contains(@title, "Download PDF")]/@href'; 
foreach($xpath->evaluate($query) as $attr) { 
    $links[] = $attr->value; 
}


2011-02-11 20:37:06 ircmaxell
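To make the difference between an exact `@title=` test and `contains()` concrete, here is a small sketch of the `evaluate()` approach on assumed sample HTML. The helper name `hrefs` is hypothetical:

```php
<?php
// Sample input (assumed): one exact title, one title with extra text
$html = '<a href="/a.pdf" title="Download PDF">A</a>'
      . '<a href="/b.pdf" title="Download PDF (1.2 MB)">B</a>';

$dom = new DOMDocument();
@$dom->loadHTML($html); // @ hides warnings for fragment/malformed input
$xpath = new DOMXPath($dom);

// Hypothetical helper: return the href values selected by an XPath query
// ending in /@href (each result node is a DOMAttr)
function hrefs(DOMXPath $xpath, string $query): array
{
    $out = [];
    foreach ($xpath->evaluate($query) as $attr) {
        $out[] = $attr->value;
    }
    return $out;
}

print_r(hrefs($xpath, '//a[@title="Download PDF"]/@href'));           // exact: /a.pdf
print_r(hrefs($xpath, '//a[contains(@title, "Download PDF")]/@href')); // substring: /a.pdf, /b.pdf
```

Note that XPath 1.0 `contains()` is case-sensitive, so a title such as "Download Pdf" would not match either query.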
