2017-05-30 100 views
2

PHP RSS reader cannot read the <a10:content type="text/xml"> tag

I am trying to read an RSS feed with PHP. For some reason, it does not read the following content tag:

<a10:content type="text/xml">...</a10:content> 

This is what the feed might look like:

<rss version="2.0" xmlns:a10="http://www.w3.org/2005/Atom"> 
    <channel> 
     <title>mMin title</title> 
     <description>Some description</description> 
     <managingEditor>[email protected]</managingEditor> 
     <category>Some category</category> 
     <item> 
      <guid isPermaLink="false">1</guid> 
      <link>https://example.com/1</link> 
      <title>Some title 1</title> 
      <a10:updated>2017-05-30T13:20:22+02:00</a10:updated> 
      <a10:content type="text/xml"> 
       <Location>San diego</Location> 
       <PublishedOn>2016-10-21T11:21:07</PublishedOn> 
       <Body>Lorem ipsum dolar</Body> 
       <JobCountry>USA</JobCountry> 
      </a10:content> 
     </item> 
     <item> 
      <guid isPermaLink="false">1</guid> 
      <link>https://example.com/2</link> 
      <title>Some title 2</title> 
      <a10:updated>2017-05-30T13:20:22+02:00</a10:updated> 
      <a10:content type="text/xml"> 
       <Location>Detroit</Location> 
       <PublishedOn>2016-10-21T11:21:07</PublishedOn> 
       <Body>Lorem ipsum dolar</Body> 
       <JobCountry>USA</JobCountry> 
      </a10:content> 
     </item> 
     <item> 
      <guid isPermaLink="false">1</guid> 
      <link>https://example.com/3</link> 
      <title>Some title 3</title> 
      <a10:updated>2017-05-30T13:20:22+02:00</a10:updated> 
      <a10:content type="text/xml"> 
       <Location>Los Angeles</Location> 
       <PublishedOn>2016-10-21T11:21:07</PublishedOn> 
       <Body>Lorem ipsum dolar</Body> 
       <JobCountry>USA</JobCountry> 
      </a10:content> 
     </item> 
    </channel> 
</rss> 

Here is an example of my code:

$url = "http://example.com/RSSFeed";
$xml = simplexml_load_file($url);

foreach ($xml->channel as $x) {
    foreach ($x->item as $item) {
        dd($item);
    }
}

Output:

SimpleXMLElement {#111 ▼ 
     +"guid": "1" 
     +"link": "https://example.com" 
     +"title": "Some title" 
    } 

This is the output I expect:

SimpleXMLElement {#111 ▼ 
    +"guid": "1" 
    +"link": "https://example.com" 
    +"title": "Some title" 
    +"content" { 
    0 => { 
     +"Location": "San Diego" 
     +"PublishedOn": "2016-10-21T11:21:07" 
     +"Body": "Lorem ipsum dolar" 
     +"JobCountry": "USA" 
    } 
    1 => { 
     +"Location": "Detroit" 
     +"PublishedOn": "2016-10-21T11:21:07" 
     +"Body": "Lorem ipsum dolar" 
     +"JobCountry": "USA" 
    } 
    2 => { 
     +"Location": "Los Angeles" 
     +"PublishedOn": "2016-10-21T11:21:07" 
     +"Body": "Lorem ipsum dolar" 
     +"JobCountry": "USA" 
    } 
    } 
} 

Does anyone have a solution?

+0

Can you post your complete XML? –

+0

@SahilGulati I have updated the XML. –

Answers

1

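You need to access the element through its namespace. The content element belongs to the Atom namespace (bound to the a10 prefix), so plain property access in SimpleXML does not show it. Here we use DOMDocument to get the desired output. DOMDocument provides getElementsByTagNameNS, to which we pass the namespace URI and the local name of the element we want. That produces the expected output.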

If you would rather use simplexml_load_string, you can check this: PHP code demo
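For the SimpleXML route, a minimal sketch might look like the following. It assumes a trimmed-down copy of the feed from the question (the $feed string, the $atomNs variable, and the $jobs array are illustrative names, not part of the original code). It uses children() with the Atom namespace URI to reach a10:content, then children() with no argument to get back to the un-namespaced elements inside it.

```php
<?php
// Hypothetical sample feed, trimmed from the one in the question.
$feed = <<<XML
<rss version="2.0" xmlns:a10="http://www.w3.org/2005/Atom">
  <channel>
    <item>
      <title>Some title 1</title>
      <a10:content type="text/xml">
        <Location>San diego</Location>
        <JobCountry>USA</JobCountry>
      </a10:content>
    </item>
    <item>
      <title>Some title 2</title>
      <a10:content type="text/xml">
        <Location>Detroit</Location>
        <JobCountry>USA</JobCountry>
      </a10:content>
    </item>
  </channel>
</rss>
XML;

$atomNs = 'http://www.w3.org/2005/Atom';
$xml = simplexml_load_string($feed);

$jobs = [];
foreach ($xml->channel->item as $item) {
    // children($ns) selects the child elements that live in the Atom namespace.
    $content = $item->children($atomNs)->content;

    $job = ['title' => (string) $item->title];
    // children() without arguments switches back to the un-namespaced elements.
    foreach ($content->children() as $field) {
        $job[$field->getName()] = (string) $field;
    }
    $jobs[] = $job;
}

print_r($jobs);
```

Each entry of $jobs is one item with named keys. Matching on the namespace URI rather than the a10 prefix means the code should keep working if the feed ever binds a different prefix to the same namespace.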

Try this code snippet here

<?php 

ini_set('display_errors', 1); 

libxml_use_internal_errors(true); 
$string=<<<HTML 
<rss version="2.0" xmlns:a10="http://www.w3.org/2005/Atom"> 
    <channel> 
     <title>mMin title</title> 
     <description>Some description</description> 
     <managingEditor>[email protected]</managingEditor> 
     <category>Some category</category> 
     <item> 
      <guid isPermaLink="false">1</guid> 
      <link>https://example.com</link> 
      <title>Some title</title> 
      <a10:updated>2017-05-30T13:20:22+02:00</a10:updated> 
      <a10:content type="text/xml"> 
       <Location>Detroit</Location> 
       <PublishedOn>2016-10-21T11:21:07</PublishedOn> 
       <Body>Lorem ipsum dolar</Body> 
       <JobCountry>USA</JobCountry> 
      </a10:content> 
     </item> 
    </channel> 
</rss> 
HTML; 
$completeData = array();
$domDocument = new DOMDocument();
$domDocument->loadXML($string);
// Match on the namespace URI and local name, not on the "a10" prefix.
$results = $domDocument->getElementsByTagNameNS("http://www.w3.org/2005/Atom", "content");
foreach ($results as $result)
{
    // Reset per <a10:content> element so values do not accumulate across items.
    $data = array();
    foreach ($result->childNodes as $node)
    {
        // Skip the whitespace text nodes between the child elements.
        if ($node instanceof DOMElement)
        {
            $data[] = $node->nodeValue;
        }
    }
    $completeData[] = $data;
}
print_r($completeData);
+1

Good answer, except that you didn't explain what his problem was. – delboy1978uk

+0

@delboy1978uk Sure, I'll explain it. –

+0

@SahilGulati The problem here is that I need it as an array with several items, not as key-value pairs. –

0

First of all, don't use SimpleXML, it's rubbish! You are better off with DOMDocument.

http://php.net/manual/en/class.domdocument.php

<?php 

// $xml is assumed to hold the feed XML as a string.
$dom = new DOMDocument();
$dom->loadXML($xml);


$items = $dom->getElementsByTagName('item'); 
$array = array(); 

foreach($items as $item) 
{ 
    $title = $item->getElementsByTagName('title')->item(0)->nodeValue; 
    $link = $item->getElementsByTagName('link')->item(0)->nodeValue; 
    $updated = $item->getElementsByTagName('updated')->item(0)->nodeValue; 
    $location = $item->getElementsByTagName('Location')->item(0)->nodeValue; 
    $pub = $item->getElementsByTagName('PublishedOn')->item(0)->nodeValue; 
    $body = $item->getElementsByTagName('Body')->item(0)->nodeValue; 
    $job = $item->getElementsByTagName('JobCountry')->item(0)->nodeValue; 

    $array[] = [ 
     'title' => $title, 
     'link' => $link, 
     'updated' => $updated, 
     'Location' => $location, 
     'PublishedOn' => $pub, 
     'Body' => $body, 
     'JobCountry' => $job, 
    ]; 
} 

var_dump($array); 

This will give you something like this:

array(7) { ["title"]=> string(12) "Some title 1" ["link"]=> string(21) "https://example.com/1" ["updated"]=> string(25) "2017-05-30T13:20:22+02:00" ["Location"]=> string(9) "San diego" ["PublishedOn"]=> string(19) "2016-10-21T11:21:07" ["Body"]=> string(17) "Lorem ipsum dolar" ["JobCountry"]=> string(3) "USA" } 

See it here: https://3v4l.org/E0UXJ

Now that it works, let's tidy it up by creating a handy function:

function domToArray($item, array $cols) 
{ 
    $array = []; 
    foreach ($cols as $col) { 
     $val = $item->getElementsByTagName($col)->item(0)->nodeValue; 
     $array[$col] = $val; 
    } 
    return $array; 
} 

$dom = new DOMDocument(); 
$dom->loadXML($xml); 

$items = $dom->getElementsByTagName('item'); 
$array = array(); 

$fields = [ 
     'title', 
     'link', 
     'updated', 
     'Location', 
     'PublishedOn', 
     'Body', 
     'JobCountry', 
    ]; 

foreach($items as $item) 
{ 
    $array[] = domToArray($item, $fields); 
} 

var_dump($array); 

The output is the same; see it here: https://3v4l.org/W6HM3
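One case the helper does not cover is an item that lacks one of the requested tags: item(0) then returns null, and reading nodeValue on it raises a notice or warning. A guarded variant could look like the sketch below. The name domToArraySafe and the inline sample XML are hypothetical, introduced only for illustration.

```php
<?php
// Hypothetical guarded variant of the helper: missing tags become null.
function domToArraySafe(DOMElement $item, array $cols): array
{
    $array = [];
    foreach ($cols as $col) {
        // item(0) returns null when the item has no element with this name.
        $node = $item->getElementsByTagName($col)->item(0);
        $array[$col] = $node !== null ? $node->nodeValue : null;
    }
    return $array;
}

// Hypothetical sample item without a <Location> element.
$dom = new DOMDocument();
$dom->loadXML('<item><title>Some title 1</title><link>https://example.com/1</link></item>');

$row = domToArraySafe($dom->documentElement, ['title', 'link', 'Location']);
var_dump($row);
```

Whether null is the right placeholder depends on how the result is consumed; an empty string or skipping the key altogether would work just as well.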

+0

@delboy1987uk There are several items, and I need them as an array. –

+0

I want each item as an object. Not everything should end up in one flat array. –

+0

Updating now! Hang on :-) – delboy1978uk

1

Here is my working solution, in case it helps:

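$xml = file_get_contents("https://example.com/RSSFeed");

// Strip the a10: prefix so SimpleXML exposes <content> like any other child element.
$string = str_replace(array("<a10:content", "</a10:content>"), array("<content", "</content>"), $xml);

$sxe = new \SimpleXMLElement($string);

// Iterate over the items; iterating $sxe directly would yield only <channel>.
foreach ($sxe->channel->item as $item) {
    dd($item);
}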