Possible duplicates:
Grabbing the href attribute of an A element
Best methods to parse HTML

Regular expression to extract image URL from HTML code in PHP
I have been using this code to extract the image from HTML code in PHP:
$output = preg_match_all('/<img.+src=[\'"]([^\'"]+)[\'"].*>/i', $content, $matches);
if ($output > 0) echo $matches[1][0];
It has been working fine for me, but it misbehaves on one particular piece of HTML. I am not good at regular expressions, so I need help fixing this.
Works for:
<p>
I finally decided to try Pomodoro technique to see how well it can improve my productivity as I am a lot disorganised, lazy sorta geek (well who isn’t?). So I built up a small script which acts as a Pomodoro timer for me using <a href="http://blog.ashfame.com/2011/04/ubuntu-notification-system/">Ubuntu notification system</a> (Do read it if you haven’t, you need to install lib-notify package for this script to work).
</p>
<p>
I have created a launcher in my top panel, with which I start a new <em>pomodori</em> (name for a new period of time, lets call it a Pomodoro anyway). It calls up the script which alerts me that a new Pomodoro (time period) has started and then alert me again when the timer ends and I should take a small break.
</p>
<p>
Here is the script:
</p>
<pre class="brush: bash; title: ; toolbar: false;" title="">
DISPLAY=:0 notify-send -t 1000 -i /home/ashfame/Dropbox/Ubuntu/icons/pomodoro.png "New Pomodoro starts" "You have 25 minutes to work."
# 25 minutes timer
sleep 1500
DISPLAY=:0 notify-send -t 1000 -i /home/ashfame/Dropbox/Ubuntu/icons/pomodoro.png "Pomodoro ends" "Take a break!"
</pre>
<p>
As soon as I click the launcher, the first notification appears telling me that a new Pomodoro has started.
</p>
<p>
<img class="aligncenter" src="http://blog.ashfame.com/wp-content/uploads/2011/04/pomodoro-starts.png" alt="pomodoro starts">
</p>
<p>
Then it sleeps for 1500 secs = 25 minutes. And after that the second notification appears telling me that the Pomodoro has ended.
</p>
<p>
<img class="aligncenter" src="http://blog.ashfame.com/wp-content/uploads/2011/04/pomodoro-ends.png" alt="pomodoro ends">
</p>
<p>
I just take a 3-5 minutes break or even longer (I am the boss!), and then I again click on the launcher starting another Pomodoro and I work for another 25 minutes. You can use the same tomato icon, if you want.
</p>
<p>
<img class="aligncenter" src="http://blog.ashfame.com/wp-content/uploads/2011/04/pomodoro.png" alt="pomodoro">
</p>
<p>
Enjoy the awesomeness of Ubuntu and ditch Windows, yes I am an Ubuntu advocate and will push you to switch all the time <img src='http://blog.ashfame.com/wp-includes/images/smilies/icon_razz.gif' alt=':P' class='wp-smiley'>
</p>
Doesn't work for:
<p>
<img style="margin: 0px 10px 5px 0px" src="http://ijew.com.br/wp-content/uploads/HLIC/5b8b8f82bd69fd4a78aa114fd91bd9b5.jpg" width="300" height="226">
</p>
<p>
Hey ijews! Pessach é inesquecível! E quem pode esquecer comendo 8 dias matzá?!
</p>
<p>
Produção caseira muito bem feita.
</p><!--more-->
<p>
</p>
<p>
<iframe title="YouTube video player" width="480" height="390" src="http://www.youtube.com/embed/d3D6O_sBOlc?rel=0" frameborder="0" allowfullscreen=""></iframe>
</p>
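One likely cause, assuming the real content arrives with the tags on a single line, is that the greedy `.+` runs past the end of the `<img>` tag and stops at the last `src=` it can find, which here belongs to the `<iframe>`. Below is a minimal sketch of a pattern that stays inside one tag by using `[^>]` instead of `.`. The helper name `extract_image_urls` and the shortened sample URL are placeholders for illustration, not part of the original code.

```php
<?php
// Hypothetical helper (not from the original post): returns the src of every
// <img> tag found in $content, in document order.
function extract_image_urls(string $content): array
{
    // [^>]* cannot cross the closing ">" of a tag, so the match stays inside
    // a single <img ...> element instead of running on to a later src=.
    // \s before src avoids matching attributes such as data-src.
    $pattern = '/<img\b[^>]*?\ssrc=[\'"]([^\'"]+)[\'"][^>]*>/i';
    preg_match_all($pattern, $content, $matches);
    return $matches[1];
}

// Shortened single-line sample; the image URL is a placeholder.
$content = '<p><img style="margin: 0" src="http://example.com/a.jpg" width="300"></p>'
         . '<p><iframe width="480" src="http://www.youtube.com/embed/d3D6O_sBOlc?rel=0"></iframe></p>';

$urls = extract_image_urls($content);
if (count($urls) > 0) {
    echo $urls[0], "\n";
}
```

On this single-line input the original pattern would capture the iframe URL, because `.+` only backtracks as far as the last `src=`; the sketch captures the image URL instead.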
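Hi, I just tested your RegExp with the snippet you mentioned, and it grouped the 'src'. Are you sure you are not getting the wrong $matches[1]? – Francisc 2011-04-23 13:57:25
@Gordon @Pekka I agree that regex is not a good choice for parsing HTML, but I need minimal overhead, so I have to use regex for this, and I went through several similar questions before finally posting mine. @Francisc Yes, I was getting the wrong match, but the problem is solved now. Lev pointed out the required change. :) – Ashfame 2011-04-23 14:28:59
@Ashfame Sorry, but that is nonsense. Profile your code to determine whether there is any *significant* overhead before jumping to the wrong conclusion. Also consider that if you had used DOM from the start, you would not have had this problem at all. The reliability of a solution outweighs shaving off a few microseconds where that is not really needed. – Gordon 2011-04-23 14:51:35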
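For comparison, the DOM approach suggested in the comment above might look like the sketch below. It assumes PHP's `dom` extension is available; the helper name `extract_image_urls_dom` is hypothetical and does not appear in the thread.

```php
<?php
// Hypothetical helper: collects the src attribute of every <img> element
// using DOMDocument (requires the dom extension).
function extract_image_urls_dom(string $html): array
{
    if ($html === '') {
        return []; // loadHTML() rejects an empty string
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; buffer libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $urls = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        if ($img->hasAttribute('src')) {
            $urls[] = $img->getAttribute('src');
        }
    }
    return $urls;
}
```

Because the parser works on elements rather than on text, a `src` attribute on an `<iframe>` or any other tag is never confused with an image, whatever the attribute order or line breaks.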