同一个多方法的不同方法之间的递归

我正在编写一个Clojure库来解析Mac OS X的基于XML的property list files。代码工作正常，除非你给它一个大的输入文件，在这一点你得到java.lang.OutOfMemoryError: Java heap space。同一个多方法的不同方法之间的递归

下面是一个例子输入文件（足够小，做工精细）：

<plist version="1.0"> 
<dict> 
    <key>Integer example</key> 
    <integer>5</integer> 
    <key>Array example</key> 
    <array> 
     <integer>2</integer> 
     <real>3.14159</real> 
    </array> 
    <key>Dictionary example</key> 
    <dict> 
     <key>Number</key> 
     <integer>8675309</integer> 
    </dict> 
</dict> 
</plist>

clojure.xml/parse变成这样：

{:tag :plist, :attrs {:version "1.0"}, :content [ 
    {:tag :dict, :attrs nil, :content [ 
     {:tag :key, :attrs nil, :content ["Integer example"]} 
     {:tag :integer, :attrs nil, :content ["5"]} 
     {:tag :key, :attrs nil, :content ["Array example"]} 
     {:tag :array, :attrs nil, :content [ 
      {:tag :integer, :attrs nil, :content ["2"]} 
      {:tag :real, :attrs nil, :content ["3.14159"]} 
     ]} 
     {:tag :key, :attrs nil, :content ["Dictionary example"]} 
     {:tag :dict, :attrs nil, :content [ 
      {:tag :key, :attrs nil, :content ["Number"]} 
      {:tag :integer, :attrs nil, :content ["8675309"]} 
     ]} 
    ]} 
]}

我的代码变成Clojure的数据结构

{"Dictionary example" {"Number" 8675309}, 
"Array example" [2 3.14159], 
"Integer example" 5}

这

我的代码的相关部分看起来像

; extract the content contained within e.g. <integer>...</integer> 
(defn- first-content 
    [c] 
    (first (c :content))) 

; return a parsed version of the given tag 
(defmulti content (fn [c] (c :tag))) 

(defmethod content :array 
    [c] 
    (apply vector (for [item (c :content)] (content item)))) 

(defmethod content :dict 
    [c] 
    (apply hash-map (for [item (c :content)] (content item)))) 

(defmethod content :integer 
    [c] 
    (Long. (first-content c))) 

(defmethod content :key 
    [c] 
    (first-content c)) 

(defmethod content :real 
    [c] 
    (Double. (first-content c))) 

; take a java.io.File (or similar) and return the parsed version 
(defn parse-plist 
    [source] 
    (content (first-content (clojure.xml/parse source))))

代码的肉是content函数，该函数是一个派生在：tag（XML标记的名称）上的多方法。我想知道是否有什么不同，我应该做的，以使这种递归更好。我试图用trampoline content替换所有三个电话content，但那不起作用。我应该怎么做才能让这种相互递归更高效地工作？或者我采取了一个根本错误的方法？

编辑：顺便说一句，这个代码是available on GitHub，在这种形式可能更容易玩弄。

来源

2011-02-01 bdesham

你有多个（每个孩子一个）来自单一方法的递归调用，所以你的代码不是（并且不能没有重度reorg）尾递归。 trampoline旨在相互作用尾递归函数。

你的大型XML文件有多深？我在问，因为你得到的是一个OOM不是一个SO。

无论如何，为了解决你的递归问题（这不太可能是引发异常的问题），你必须走下你的XML数据结构（例如用xml-zip），同时维护代表结果树的栈（向量或列表）施工。具有讽刺意味的是，XML数据结构的遍历与用于构建结构的sax事件有些相同。

来源

2011-02-02 06:55:56 cgrand

我还没有听说过xml-zip，但我会研究它。谢谢！ – bdesham 2011-02-03 19:18:36

重递归会导致StackOverflowException，而不是OutOfMemoryError。此外，递归在这里似乎并不深（根据您的示例中的XML文件，只有3个级别）。

我的猜测是，OutOfMemoryError正在被抛出，因为您的大型XML文件被解析到的数据结构太大，无法放入JVM堆。您可以尝试使用-Xms和-Xmx选项增加堆大小。但是，解析大型XML文件的正确方法是使用SAX事件而不是构建树（DOM或Clojure数据结构）。

来源

2011-02-02 06:39:32

实际的文件当然要大得多，但在例子中，递归仍然不是那么深。我会研究SAX事件和xml-zip，看看这个库最有意义。谢谢！ – bdesham 2011-02-03 19:19:48

同一个多方法的不同方法之间的递归

回答

相关问题