2016-08-22 124 views
0

How do I get the <img> src value from an XML file?

I want to build a simple RSS feed website.
For most RSS feeds I can get the fields I need just by doing this:

let article = { 
       'title': item.title, 
       'image': item.image.url, 
       'link': item.link, 
       'description': item.description, 
      } 

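The title and link work for most feeds, but the image and description do not.
That is because many RSS feeds put the image inside the description as HTML, like this: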

{ title: 'The Rio Olympics Are Where TV Finally Sees the Future', 
description: '<div class="rss_thumbnail"><img src="http://www.wired.com/wp-content/uploads/2016/08/GettyImages-587338962-660x435.jpg" alt="The Rio Olympics Are Where TV Finally Sees the Future" /></div>Time was, watching the Olympics just meant turning on your TV. That\'s changed—and there\'s no going back. The post <a href="http://www.wired.com/2016/08/rio-olympics-tv-finally-sees-future/">The Rio Olympics Are Where TV Finally Sees the Future</a> appeared first on <a href="http://www.wired.com">WIRED</a>.',... 

How can I get the image URL out of it?

Edit:

http.get("http://www.wired.com/feed/"... 

    .on('readable', function() { 
     let stream = this; 
     let item; 
     while(item = stream.read()){ 
      let article = { 
       'title': item.title, 
       'image': item.image.url, 
       'link': item.link, 
       'description': item.description, 
      } 
      news.push(article); 
     } 
    }) 

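This is part of my code; basically I am trying to get the image URL out of the Wired RSS feed.
If I use 'image': item.image.url, it does not work. What should I change it to?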

Answers

1

Use xml2js to convert the XML to JSON:

var parseString = require('xml2js').parseString; 

var xml = '<img title=\'A San Bernardino County Fire Department firefighter watches a helitanker make a water drop on a wildfire, seen from Cajon Boulevard in Devore, Calif., Thursday, Aug. 18, 2016. (David Pardo/The Daily Press via AP)\' height=\'259\' alt=\'APTOPIX California Wildfires\' width=\'460\' src=\'http://i.cbc.ca/1.3730399.1471835992!/cpImage/httpImage/image.jpg_gen/derivatives/16x9_460/aptopix-california-wildfires.jpg\' />'; 

parseString(xml, function (err, result) { 
    console.log(JSON.stringify(result, null, 4)); 
    console.log(result["img"]["$"]["src"]); 
}); 
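
The sample above parses a lone <img> element, whereas the description in the question has a <div> wrapper followed by loose text and links, so the whole string is not a single XML element. One possible approach, sketched here under that assumption, is to isolate the <img ...> tag first and hand only that substring to the parser. `firstImgTag` is a hypothetical helper name, not part of xml2js.

```javascript
// Hypothetical helper: return the first <img ...> tag found in an HTML fragment, or null.
function firstImgTag(html) {
  // [^>]* keeps the match inside a single tag.
  const match = /<img\b[^>]*>/i.exec(html || '');
  return match ? match[0] : null;
}

// Shortened copy of the description shown in the question.
const description =
  '<div class="rss_thumbnail"><img src="http://www.wired.com/wp-content/uploads/2016/08/GettyImages-587338962-660x435.jpg" ' +
  'alt="The Rio Olympics Are Where TV Finally Sees the Future" /></div>' +
  'Time was, watching the Olympics just meant turning on your TV.';

const imgTag = firstImgTag(description);
console.log(imgTag); // only the <img ... /> element, without the surrounding markup
```

The resulting string is a single self-closing element, which is the shape of input the parseString call above is given.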
+0

I tried your answer, but it did not work. I have edited the question and added some of my code. – Dan

+0

@Dan Sorry to hear the code failed. It should get the 'url' from the string. Can you tell me what you changed in it? –

-1

You can use the DOMDocument parser to get the image source.

// Single quotes need no escaping inside a double-quoted PHP string.
$html = "<img title='A San Bernardino County Fire Department firefighter watches a helitanker make a water drop on a wildfire, seen from Cajon Boulevard in Devore, Calif., Thursday, Aug. 18, 2016. (David Pardo/The Daily Press via AP)' height='259' alt='APTOPIX California Wildfires' width='460' src='http://i.cbc.ca/1.3730399.1471835992!/cpImage/httpImage/image.jpg_gen/derivatives/16x9_460/aptopix-california-wildfires.jpg' />"; 

$doc = new DOMDocument(); 
$doc->loadHTML($html); 
$xpath = new DOMXPath($doc); 
$src = $xpath->evaluate("string(//img/@src)"); // src attribute of the first <img>
+0

Where did the OP say they wanted PHP? –

0

A regular expression on the string:

var res = description.match(/src=["'](.*?\.(?:jpg|jpeg|png|gif))["']/i); // res[1] is the URL when res is not null

Fiddle Demo
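
As a minimal sketch of the regex idea, the pattern can be wrapped in a small function that returns only the URL and tolerates both quote styles. `extractImageSrc` is a hypothetical name chosen for illustration, and a regular expression is only a rough stand-in for a real HTML parser.

```javascript
// Hypothetical helper: return the src of the first <img> in an HTML fragment, or null.
function extractImageSrc(html) {
  // [^>]*? stays inside the tag; (["']) captures the opening quote;
  // (.*?) lazily captures the URL; \1 requires the same quote to close it.
  const match = /<img\b[^>]*?\bsrc\s*=\s*(["'])(.*?)\1/i.exec(html || '');
  return match ? match[2] : null;
}

// Shortened copy of the description shown in the question.
const wiredDescription =
  '<div class="rss_thumbnail"><img src="http://www.wired.com/wp-content/uploads/2016/08/GettyImages-587338962-660x435.jpg" ' +
  'alt="The Rio Olympics Are Where TV Finally Sees the Future" /></div>' +
  'Time was, watching the Olympics just meant turning on your TV.';

console.log(extractImageSrc(wiredDescription));
```

Because the match does not depend on a file extension, it also covers image URLs that carry a query string or no extension at all.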

+0

I tried your answer, but it did not work. I have edited the question and added some of my code. – Dan

0

One idea is to use a regular expression. For example:

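var re = /src=(['"])(http.*?)\1/; // capture the quote, then the URL up to the matching quote
var img_string = "your image tag string";
var match = re.exec(img_string);
var result = match ? match[2] : null; // match[2] is the URL; match[1] is only the quote character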
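
To tie this back to the loop in the question, here is a rough sketch of building the article with a fallback: use item.image.url when the feed provides it, otherwise pull the first <img> src out of the description. `imageUrlFor` and `toArticle` are hypothetical names, and the assumption that item.image may be missing or have no url for this feed is a guess, since the question does not show what the parser returns there.

```javascript
// Hypothetical helper (not part of any feed library): pick an image URL for one feed item.
function imageUrlFor(item) {
  // Prefer the feed's own image field when it is present.
  if (item.image && item.image.url) {
    return item.image.url;
  }
  // Otherwise look for the first <img src="..."> inside the description HTML.
  const match = /<img\b[^>]*?\bsrc\s*=\s*(["'])(.*?)\1/i.exec(item.description || '');
  return match ? match[2] : null;
}

// Build the article object with the same fields as in the question.
function toArticle(item) {
  return {
    title: item.title,
    image: imageUrlFor(item),
    link: item.link,
    description: item.description,
  };
}

// Sample item modelled on the Wired entry from the question; image is assumed to be empty.
const sampleItem = {
  title: 'The Rio Olympics Are Where TV Finally Sees the Future',
  image: {},
  link: 'http://www.wired.com/2016/08/rio-olympics-tv-finally-sees-future/',
  description:
    '<div class="rss_thumbnail"><img src="http://www.wired.com/wp-content/uploads/2016/08/GettyImages-587338962-660x435.jpg" ' +
    'alt="The Rio Olympics Are Where TV Finally Sees the Future" /></div>Time was, watching the Olympics just meant turning on your TV.',
};

console.log(toArticle(sampleItem).image);
```

Inside the 'readable' handler, news.push(toArticle(item)) would then replace the inline object literal.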