2017-08-04

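Comparing two XMLDocument trees with duplicate nodes

I want to compare two XML files and record all the differences. The problem appears when nodes start to repeat. For the two files: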

<root> 
    <a/> 
    <a/> 
    <b/> 
</root> 

and:

<root> 
    <a/> 
    <b/> 
</root> 

my program currently records no differences. The (big and ugly) method is below:

private void searchDocumentTrees (Node nodeA, Node nodeB, ArrayList<String> differences) { 
    if (nodeA.hasChildNodes() && !nodeB.hasChildNodes()) { 
     // record A deeper at this node 
     return; 
    } 
    else if (!nodeA.hasChildNodes() && nodeB.hasChildNodes()) { 
     // record B deeper at this node 
     return; 
    } 

    else if (!nodeA.hasChildNodes() && !nodeB.hasChildNodes()) { 
     return; 
    } 
    NodeList childrenA = nodeA.getChildNodes(); 
    NodeList childrenB = nodeB.getChildNodes(); 

    // indexes of nodes present in both lists of children as 
    // NodeList doesn't allow searching by value 
    ArrayList<Integer> presentInBothIndexA = new ArrayList<>(); 
    ArrayList<Integer> presentInBothIndexB = new ArrayList<>(); 

    // check for nodes present in both trees, record those present only in A 
    for (int indexA = 0; indexA < childrenA.getLength(); indexA++) { 
     boolean isPresentInBoth = false; 
     Node currentA = childrenA.item(indexA); 
     if (currentA.getNodeType() == Node.ELEMENT_NODE) { 

      for (int indexB = 0; indexB < childrenB.getLength(); indexB++) { 
       Node currentB = childrenB.item(indexB); 
       if (currentB.getNodeType() == Node.ELEMENT_NODE) { 
        // if the nodes match, record their indexes and break from inner loop 
        if (currentA.getNodeName().equals(currentB.getNodeName())) { 
         isPresentInBoth = true; 
         presentInBothIndexA.add(indexA); 
         presentInBothIndexB.add(indexB); 
         break; 
        } 
       } 
      } 

      // if the flag has not been changed currentA is not present in childrenB 
      if (!isPresentInBoth) { 
       // record as present only in A 
      } 
     } 
    } 

    // record nodes present only in B 
    for (...){ 
      /* same nested loop - this time the outer is iterating over B 
      and matching nodes indexes are not recorded - record only B - A */ 
    } 

    for (int indexBoth = 0, len = presentInBothIndexA.size(); indexBoth < len; indexBoth++) { 
     Node currentA = childrenA.item(presentInBothIndexA.get(indexBoth)); 
     Node currentB = childrenB.item(presentInBothIndexB.get(indexBoth)); 
     searchDocumentTrees(currentA,currentB,differences); 
    } 



} 

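My first idea was to replace the isPresentInBoth flag with counters of occurrences in both files, but that would probably require a third loop and add even more complexity. Do you have a better idea?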
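
For reference, the counter idea mentioned above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name ChildCounter and the helpers parse, countChildren and countDifferences are hypothetical, not part of the original method, and only the direct element children of one pair of nodes are compared by name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ChildCounter {
    // Hypothetical helper: parse an XML string, wrapping checked exceptions.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Counts the direct element children of a node, keyed by node name.
    static Map<String, Integer> countChildren(Node parent) {
        Map<String, Integer> counts = new TreeMap<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                counts.merge(child.getNodeName(), 1, Integer::sum);
            }
        }
        return counts;
    }

    // Reports every element name whose count differs between the two parents.
    static List<String> countDifferences(Node a, Node b) {
        Map<String, Integer> countsA = countChildren(a);
        Map<String, Integer> countsB = countChildren(b);
        Set<String> names = new TreeSet<>(countsA.keySet());
        names.addAll(countsB.keySet());
        List<String> differences = new ArrayList<>();
        for (String name : names) {
            int inA = countsA.getOrDefault(name, 0);
            int inB = countsB.getOrDefault(name, 0);
            if (inA != inB) {
                differences.add("<" + name + ">: " + inA + " in A, " + inB + " in B");
            }
        }
        return differences;
    }

    public static void main(String[] args) {
        Document a = parse("<root><a/><a/><b/></root>");
        Document b = parse("<root><a/><b/></root>");
        System.out.println(countDifferences(a.getDocumentElement(), b.getDocumentElement()));
    }
}
```

Counting detects that the number of <a/> elements differs, but it does not say which <a/> in one file corresponds to which in the other, so it does not by itself decide which pairs to recurse into.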

Answer

I found two solutions:

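Solution 1

After trying various (inefficient) approaches, e.g. counting node occurrences and storing them in a hash table, I realized that I already had structures storing the indexes of the matching nodes. These are, of course: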

ArrayList<Integer> presentInBothIndexA = new ArrayList<>(); 
ArrayList<Integer> presentInBothIndexB = new ArrayList<>(); 

So, instead of just leaving them hanging around, I put them to work:

// pseudo-code for simplification 
for(nodeA in fileA) { 
    for(nodeB in fileB) { 
     // check all the aforementioned conditions 
     if(presentInBothIndexB.contains(indexB)) 
      continue; // skip if it was already recorded 
     // else, do all the other stuff - isPresentInBoth = true, and so on 
    } 
} 

Now the second loop doesn't need an inner loop:

for (nodeB in B) { 
    if (!presentInBothIndexB.contains(indexB)) 
     //record difference - we only need to look for the nodes that were skipped 
     //by the first loop, i.e. not present in file A 
} 
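
Put together, the two loops and the recursion could look roughly like the sketch below. It is a simplified illustration, not the exact method from the question: the class name GreedyMatcher, the helpers parse and diff, the path parameter and the wording of the recorded messages are assumptions, and the early "A deeper / B deeper" checks are left out, so children missing on one side are simply reported as present only in the other.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class GreedyMatcher {
    // Hypothetical helper: parse an XML string, wrapping checked exceptions.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    static void searchDocumentTrees(Node nodeA, Node nodeB, String path, List<String> differences) {
        NodeList childrenA = nodeA.getChildNodes();
        NodeList childrenB = nodeB.getChildNodes();
        List<Integer> presentInBothIndexA = new ArrayList<>();
        List<Integer> presentInBothIndexB = new ArrayList<>();

        // First loop: pair each element in A with the first unused same-named element in B.
        for (int indexA = 0; indexA < childrenA.getLength(); indexA++) {
            Node currentA = childrenA.item(indexA);
            if (currentA.getNodeType() != Node.ELEMENT_NODE) continue;
            boolean isPresentInBoth = false;
            for (int indexB = 0; indexB < childrenB.getLength(); indexB++) {
                Node currentB = childrenB.item(indexB);
                if (currentB.getNodeType() != Node.ELEMENT_NODE) continue;
                if (presentInBothIndexB.contains(indexB)) continue; // already paired
                if (currentA.getNodeName().equals(currentB.getNodeName())) {
                    isPresentInBoth = true;
                    presentInBothIndexA.add(indexA);
                    presentInBothIndexB.add(indexB);
                    break;
                }
            }
            if (!isPresentInBoth) {
                differences.add("only in A: " + path + "/" + currentA.getNodeName());
            }
        }

        // Second loop: any element in B that was never paired is missing from A.
        for (int indexB = 0; indexB < childrenB.getLength(); indexB++) {
            Node currentB = childrenB.item(indexB);
            if (currentB.getNodeType() == Node.ELEMENT_NODE && !presentInBothIndexB.contains(indexB)) {
                differences.add("only in B: " + path + "/" + currentB.getNodeName());
            }
        }

        // Recurse into the paired nodes only.
        for (int i = 0; i < presentInBothIndexA.size(); i++) {
            Node currentA = childrenA.item(presentInBothIndexA.get(i));
            Node currentB = childrenB.item(presentInBothIndexB.get(i));
            searchDocumentTrees(currentA, currentB, path + "/" + currentA.getNodeName(), differences);
        }
    }

    // Convenience wrapper (an assumption, for illustration): compares two XML strings.
    static List<String> diff(String xmlA, String xmlB) {
        Node rootA = parse(xmlA).getDocumentElement();
        Node rootB = parse(xmlB).getDocumentElement();
        List<String> differences = new ArrayList<>();
        searchDocumentTrees(rootA, rootB, "/" + rootA.getNodeName(), differences);
        return differences;
    }

    public static void main(String[] args) {
        System.out.println(diff("<root><a/><a/><b/></root>", "<root><a/><b/></root>"));
    }
}
```

With the two files from the question, the second <a/> in the first file finds no unused partner in the second file and is recorded as present only in A. The contains check on an ArrayList is linear; a HashSet of consumed indexes would be a possible refinement for large child lists.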

This approach has its drawbacks, because it compares the nodes in the order in which they appear in the file, so in this case:

<r> 
    <a/> 
    <a/> 
    <a><b/></a> 
</r> 

and:

<r> 
    <a/> 
    <a/> 
</r> 

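it records that the number of nodes differs, but it does not search deeper in the first file. This is because once two nodes have been recorded as the same, it does not look any further for a better match, so the extra <a> that contains <b/> is only reported as present in the first file. It is a nuisance, but I guess we can make such an assumption. Still, there are attributes and values to compare as well, and the whole thing gets messy and confusing, which brings me to:

Solution 2

Just use XMLUnit. Seriously.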