2011-04-23 85 views
0

Possible duplicates:
Grabbing the href attribute of an A element
Best methods to parse HTML

Regular expression in PHP to extract image URLs from HTML code

I have been using this code to extract the image from HTML code in PHP:

$output = preg_match_all('/<img.+src=[\'"]([^\'"]+)[\'"].*>/i', $content, $matches); 
if ($output > 0) echo $matches[1][0]; 

It has been working fine for me, but it misbehaves on one particular piece of HTML. I am not good with regular expressions, so I need help fixing this.

Works for:

<p> 
    I finally decided to try Pomodoro technique to see how well it can improve my productivity as I am a lot disorganised, lazy sorta geek (well who isn’t?). So I built up a small script which acts as a Pomodoro timer for me using <a href="http://blog.ashfame.com/2011/04/ubuntu-notification-system/">Ubuntu notification system</a> (Do read it if you haven’t, you need to install lib-notify package for this script to work). 
</p> 
<p> 
    I have created a launcher in my top panel, with which I start a new <em>pomodori</em> (name for a new period of time, lets call it a Pomodoro anyway). It calls up the script which alerts me that a new Pomodoro (time period) has started and then alert me again when the timer ends and I should take a small break. 
</p> 
<p> 
    Here is the script: 
</p> 
<pre class="brush: bash; title: ; toolbar: false;" title=""> 
DISPLAY=:0 notify-send -t 1000 -i /home/ashfame/Dropbox/Ubuntu/icons/pomodoro.png "New Pomodoro starts" "You have 25 minutes to work."# 25 minutes timersleep 1500DISPLAY=:0 notify-send -t 1000 -i /home/ashfame/Dropbox/Ubuntu/icons/pomodoro.png "Pomodoro ends" "Take a break!" 
</pre> 
<p> 
    As soon as I click the launcher, the first notification appears telling me that a new Pomodoro has started. 
</p> 
<p> 
    <img class="aligncenter" src="http://blog.ashfame.com/wp-content/uploads/2011/04/pomodoro-starts.png" alt="pomodoro starts"> 
</p> 
<p> 
    Then it sleeps for 1500 secs = 25 minutes. And after that the second notification appears telling me that the Pomodoro has ended. 
</p> 
<p> 
    <img class="aligncenter" src="http://blog.ashfame.com/wp-content/uploads/2011/04/pomodoro-ends.png" alt="pomodoro ends"> 
</p> 
<p> 
    I just take a 3-5 minutes break or even longer (I am the boss!), and then I again click on the launcher starting another Pomodoro and I work for another 25 minutes. You can use the same tomato icon, if you want. 
</p> 
<p> 
    <img class="aligncenter" src="http://blog.ashfame.com/wp-content/uploads/2011/04/pomodoro.png" alt="pomodoro"> 
</p> 
<p> 
    Enjoy the awesomeness of Ubuntu and ditch Windows, yes I am an Ubuntu advocate and will push you to switch all the time <img src='http://blog.ashfame.com/wp-includes/images/smilies/icon_razz.gif' alt=':P' class='wp-smiley'> 
</p> 

Does not work for:

<p> 
    <img style="margin: 0px 10px 5px 0px" src="http://ijew.com.br/wp-content/uploads/HLIC/5b8b8f82bd69fd4a78aa114fd91bd9b5.jpg" width="300" height="226"> 
</p> 
<p> 
    Hey ijews! Pessach é inesquecível! E quem pode esquecer comendo 8 dias matzá?! 
</p> 
<p> 
    Produção caseira muito bem feita. 
</p><!--more--> 
<p> 
    &nbsp; 
</p> 
<p> 
    <iframe title="YouTube video player" width="480" height="390" src="http://www.youtube.com/embed/d3D6O_sBOlc?rel=0" frameborder="0" allowfullscreen=""></iframe> 
</p> 
+0

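Hi, I just tested your RegExp with the statement you mentioned, and it captured the 'src' in a group. Are you sure you are not reading the wrong $matches[1]? – Francisc 2011-04-23 13:57:25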

+0

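@Gordon @Pekka I agree that regex is not a good choice for parsing HTML, but I need minimal overhead, so I have to do this with regex, and I went through several matches for my question before finally posting it. @Francisc Yes, I was getting the wrong match, but the problem is solved now. Lev pointed out the required change. :) – Ashfame 2011-04-23 14:28:59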

+0

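@Ashfame Sorry, but that is nonsense. Profile your code to find out whether there is any *significant* overhead before jumping to conclusions. Also consider that you would not have had this problem at all had you used DOM from the start. Reliability of a solution is preferable to shaving off a few microseconds where they are not really needed. – Gordon 2011-04-23 14:51:35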

Answers

3

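Change <img.+src to either <img.+?src (lazy mode) or, even better, to <img[^>]+src. The greedy .+ can run past the end of the <img> tag and latch onto a later src=, such as the one on the <iframe>.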
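
A minimal sketch comparing the three patterns, assuming the failing markup reaches preg_match_all as a single line (without the /s modifier, . does not cross newlines). The sample markup is shortened, its URLs are placeholders, and the helper firstImageSrc is a hypothetical name used only for this illustration.

```php
<?php
// Shortened, hypothetical stand-in for the failing markup, on a single line.
$content = '<p><img style="margin: 0" src="http://example.com/a.jpg" width="300"></p>'
         . '<p><iframe width="480" src="http://example.com/embed/x"></iframe></p>';

// Pattern from the question: greedy .+ gives back characters only until it
// reaches the LAST src= in the string, which belongs to the <iframe>.
$greedy  = '/<img.+src=[\'"]([^\'"]+)[\'"].*>/i';
// Lazy variant: .+? stops at the first src= after <img.
$lazy    = '/<img.+?src=[\'"]([^\'"]+)[\'"].*>/i';
// Negated class: [^>]+ cannot cross the > that closes the <img> tag.
$negated = '/<img[^>]+src=[\'"]([^\'"]+)[\'"].*>/i';

// Hypothetical helper: first captured src, or null when nothing matches.
function firstImageSrc(string $pattern, string $html): ?string
{
    return preg_match_all($pattern, $html, $matches) > 0 ? $matches[1][0] : null;
}

echo firstImageSrc($greedy, $content), PHP_EOL;   // http://example.com/embed/x
echo firstImageSrc($lazy, $content), PHP_EOL;     // http://example.com/a.jpg
echo firstImageSrc($negated, $content), PHP_EOL;  // http://example.com/a.jpg
```

The trailing .*> is left as in the question. Because it is also greedy, each pattern consumes the rest of the line, so at most one image per line is reported.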

+0

Thanks! That did it :) – Ashfame 2011-04-23 14:26:14
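
For comparison, the DOM approach that Gordon suggests in the comments might look like the sketch below. It is not code from the thread: it assumes PHP's bundled DOM extension is available, the sample markup and its URLs are placeholders, and imageSources is a hypothetical helper name.

```php
<?php
// Hypothetical sample markup; the URLs are placeholders.
$content = '<p><img style="margin: 0" src="http://example.com/a.jpg" width="300"></p>'
         . '<p><iframe width="480" src="http://example.com/embed/x"></iframe></p>';

// Hypothetical helper: returns the src of every <img> element, in document order.
function imageSources(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; buffer parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $sources = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        if ($img->hasAttribute('src')) {
            $sources[] = $img->getAttribute('src');
        }
    }
    return $sources;
}

print_r(imageSources($content));
```

Only <img> elements are visited, so the src on the <iframe> is never considered, regardless of attribute order or line breaks in the markup.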