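XML API response to JSON or hash?

So, I'm using an API that happens to return only XML. What I want to do is create a database entry for each record returned from the API, but I'm not sure how.

The XML that gets returned is huge and has a lot of whitespace characters in it... is that normal? Here is a sample of some of the XML.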

<!-- ... --> 
     <attribute name="item_date">May 17, 2011</attribute> 
     <attribute name="external_url">http://missionlocal.org/2011/05/rain-camioneta-part-i/</attribute> 
      <attribute name="source" id="2478">Mission [email protected]</attribute> 
      <attribute name="excerpt"></attribute> 
    </attributes> 
</newsitem> 

<newsitem 
    id="5185807" 
    title="Lost Chrome messenger PBR bag and contents (marina/cow hollow)" 
    url="http://sf.everyblock.com/lost-and-found/by-date/2011/5/17/5185807/" 
    location_name="Van Ness and Filbert" 
    schema="lost-and-found" 
    schema_id="7" 
    pub_date="May 17, 2011, 12:15 p.m." 
    longitude="-122.424129925" 
    latitude="37.7995100578" 
> 
    <attributes> 
     <attribute name="item_date">May 17, 2011</attribute> 
     <attribute name="external_url">http://sfbay.craigslist.org/sfc/laf/2386709187.html</attribute> 
    </attributes> 
</newsitem> 

<newsitem 
    id="5185808" 
    title="Plywood Update: Dumplings &amp; Buns Aims To Be &quot;Beard Papa Of Chinese Buns&quot;" 
    url="http://sf.everyblock.com/news-articles/by-date/2011/5/17/5185808/" 
    location_name="2411 California Street" 
    schema="news-articles" 
    schema_id="5" 
    pub_date="May 17, 2011, 12:15 p.m." 
    longitude="-122.434000442" 
    latitude="37.7888985667" 
> 
    <attributes> 
     <attribute name="item_date">May 17, 2011</attribute> 
     <attribute name="external_url">http://sf.eater.com/archives/2011/05/17/dumplings_buns_aims_to_be_beard_papa_of_chinese_buns.php</attribute> 
      <attribute name="source" id="2155">Eater SF</attribute> 
      <attribute name="excerpt"></attribute> 
    </attributes> 
</newsitem> 

<newsitem 
    id="5185809" 
    title="Freebies: This week, Piazza D&#39;Angelo (22 Miller..." 
    url="http://sf.everyblock.com/news-articles/by-date/2011/5/17/5185809/" 
    location_name="22 Miller" 
    schema="news-articles" 
    schema_id="5" 
    pub_date="May 17, 2011, 12:15 p.m." 
    longitude="-122.408894997" 
    latitude="37.7931966922" 
> 
    <attributes> 
     <attribute name="item_date">May 17, 2011</attribute> 
     <attribute name="external_url">http://sf.eater.com/archives/2011/05/17/freebies_24.php</attribute> 
      <attribute name="source" id="2155">Eater SF</attribute> 
      <attribute name="excerpt"></attribute> 
<!-- ... -->

Any ideas?

Source

2011-05-17 JP Silvashy

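Hey Ken, thanks for the help reformatting and cleaning up, but part of the problem is that the output is so messy and unformatted. Maybe sending it to an XML parser would alleviate that, but I don't know how to parse it piece by piece and insert it into the database. – 2011-05-18 01:02:50

That's not quite valid XML. It's some sort of escaped string representation of XML, perhaps console output. It also seems to be incomplete. Other than that, it's fairly ordinary XML. Here's a small excerpt, unescaped and formatted: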

<newsitem 
    id="5185807" 
    title="Lost Chrome messenger PBR bag and contents (marina/cow hollow)" 
    url="http://sf.everyblock.com/lost-and-found/by-date/2011/5/17/5185807/" 
    location_name="Van Ness and Filbert" 
    schema="lost-and-found" 
    schema_id="7" 
    pub_date="May 17, 2011, 12:15 p.m." 
    longitude="-122.424129925" 
    latitude="37.7995100578"> 
    <attributes> 
     <attribute name="item_date">May 17, 2011</attribute> 
     <attribute name="external_url">http://sfbay.craigslist.org/sfc/laf/2386709187.html</attribute> 
    </attributes> 
</newsitem>

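You just need to decide what you want to extract and put into the database, and let that drive your database design decisions. Do you need multiple models with the relationships intact, or do you only care about a subset of the data?

Source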

2011-05-18 00:31:18

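I think I need the relationships intact, so it would be a few different models. I'm actually using Mongo, so I can handle all of that, unless you have a good solution :) My biggest concern is really how I can walk through my response body to get at certain objects, and which methods I can use in Ruby to manage that more reliably. – 2011-05-18 00:54:48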

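XML can contain whitespace without it affecting the quality of the data it holds. A good parser, which is how you should be handling XML, won't care, and will give you access to the data whether or not there is whitespace.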

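Nokogiri is my favorite, and it seems to be the de facto standard for Ruby nowadays. It's very easy to use, but you do have to learn how to tell it which nodes you want.

To get you going, here is some terminology: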

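A node is what a tag is called after it has been parsed.
Nodes have attributes, which can be accessed using node_var['attribute'].
A node's text can be accessed using node_var.text, node_var.content, or node_var.inner_text.
A NodeSet is basically an array of nodes.
at returns the first node matching the accessor you give the parser. % is an alias.
search returns a NodeSet of the nodes matching the accessor you give the parser. / is an alias.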

Here's how we can parse a snippet of the XML:

require 'nokogiri' 

xml = <<EOT
<newsitem 
    id="5185807" 
    title="Lost Chrome messenger PBR bag and contents (marina/cow hollow)" 
    url="http://sf.everyblock.com/lost-and-found/by-date/2011/5/17/5185807/" 
    location_name="Van Ness and Filbert" 
    schema="lost-and-found" 
    schema_id="7" 
    pub_date="May 17, 2011, 12:15 p.m." 
    longitude="-122.424129925" 
    latitude="37.7995100578"> 
    <attributes> 
     <attribute name="item_date">May 17, 2011</attribute> 
     <attribute name="external_url">http://sfbay.craigslist.org/sfc/laf/2386709187.html</attribute> 
    </attributes> 
</newsitem> 
EOT

doc = Nokogiri::XML(xml) 
doc.at('newsitem').text # => "\n \n  May 17, 2011\n  http://sfbay.craigslist.org/sfc/laf/2386709187.html\n \n" 
(doc % 'attribute').content # => "May 17, 2011" 
doc.at('attribute[name="external_url"]').inner_text # => "http://sfbay.craigslist.org/sfc/laf/2386709187.html" 

doc.at('newsitem')['id'] # => "5185807" 

newsitem = doc.at('newsitem') 
newsitem['title'] # => "Lost Chrome messenger PBR bag and contents (marina/cow hollow)" 

attributes = doc.search('attribute').map{ |n| n.text } 
attributes # => ["May 17, 2011", "http://sfbay.craigslist.org/sfc/laf/2386709187.html"] 

attributes = (doc/'attribute').map{ |n| n.text } 
attributes # => ["May 17, 2011", "http://sfbay.craigslist.org/sfc/laf/2386709187.html"]

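All of the accessors above use CSS, just as you would when writing web pages. It's simpler and usually clearer, but Nokogiri also supports XPath, which is very powerful and lets you offload a lot of the processing to the underlying libXML2 library, which runs very fast.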

Nokogiri works very nicely with Ruby's Open-URI, so if you're retrieving the XML from a website you can do something like this:

require 'open-uri' 
require 'nokogiri' 

# On Ruby 3.0+ use URI.open; Kernel#open no longer accepts URLs.
doc = Nokogiri::HTML(URI.open('http://www.example.com'))
doc.to_html.size # => 2825

That is parsing HTML, which Nokogiri also excels at, but the process is the same for XML; just replace Nokogiri::HTML with Nokogiri::XML.

See also "How to avoid joining all text from Nodes when scraping".

Source

2011-05-18 03:08:49
