How do I traverse an in-memory XML structure and replace child items?

Say I have a document:

<something> 
    <parent> 
    <child>Bird is the word 1.</child> 
    <child>Curd is the word 2.</child> 
    <child>Nerd is the word 3.</child> 
    </parent> 
    <parent> 
    <child>Bird is the word 4.</child> 
    <child>Word is the word 5.</child> 
    <child>Bird is the word 6.</child> 
    </parent> 
</something>

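I want to iterate through the document and replace the word "Bird" with "Dog", using XQuery and the MarkLogic API. So far I have been able to do it with the code below: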

(: $DOC is assumed to hold the <something> element shown above :)
let $doc := $DOC
return
  <something>{
    for $d in $doc/element()
    return
      <parent>{
        for $c in $d/element()
        return
          if (fn:matches($c, "Bird"))
          then <child>{ fn:replace($c, "Bird", "Dog") }</child>
          else $c
      }</parent>
  }</something>

Result:

<something> 
    <parent> 
    <child>Dog is the word 1.</child> 
    <child>Curd is the word 2.</child> 
    <child>Nerd is the word 3.</child> 
    </parent> 
    <parent> 
    <child>Dog is the word 4.</child> 
    <child>Word is the word 5.</child> 
    <child>Dog is the word 6.</child> 
    </parent> 
</something>

How can I do this without the nested for loops? This question has been asked before, but the answers used XSLT.

Source

2017-05-14 basari66

Why not use a regular expression like s/Bird/Dog/g? It would do the job in a single linear-time pass. – Wontonimo

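@wontonimo While you can do string manipulation on serialized XML, it is considered bad practice. It is also harder to make sure you apply changes only where you actually want them: with a single-pass string replace it is difficult to guarantee that you only change the content of 'child' elements and not the content of other elements or attributes. More importantly, working on nodes rather than strings carries no risk of breaking XML well-formedness, inadvertently renaming XML tags, or worse, causing them to be broken or removed. – grtjn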

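@grtjn - Agreed, although you could add an XML tag check to the regular expression, like this: s/(\>[^\<]*)Bird([^\<]*\<)/$1Dog$2/g. If you test it by changing "child" to "parent", you will see that it does not modify anything inside the tags, only the words between them. – Wontonimo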
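To make the trade-off in this comment thread concrete, here is a small sketch using only the standard fn:replace function on strings that stand in for serialized XML. The sample strings are invented for illustration and are not from the question.

```xquery
(: Plain string replacement also rewrites element names. :)
let $naive := replace("<Bird>Bird is the word</Bird>", "Bird", "Dog")
(: $naive is "<Dog>Dog is the word</Dog>" :)

(: The tag-aware pattern leaves element names alone... :)
let $guarded :=
  replace("<Bird>Bird is the word</Bird>", "(>[^<]*)Bird([^<]*<)", "$1Dog$2")
(: $guarded is "<Bird>Dog is the word</Bird>" :)

(: ...but the greedy [^<]* consumes the whole text run, so only the
   last "Bird" between two tags is replaced. :)
let $partial :=
  replace("<child>Bird Bird</child>", "(>[^<]*)Bird([^<]*<)", "$1Dog$2")
(: $partial is "<child>Bird Dog</child>" :)

return ($naive, $guarded, $partial)
```

The third case suggests why a node-level transformation is usually easier to get right than a hand-tuned pattern.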

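Write a function and use recursion. With a typeswitch expression you can check the node type at each stage of the recursion, and with a computed element constructor you can use one generic template to rebuild every element without knowing its name: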

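declare function local:transform(
    $node as node()
) as node()*
{
    typeswitch ($node)
    (: Rebuild each element under its own name, copying attributes
       and recursing into its child nodes :)
    case element() return
        element { node-name($node) } {
            $node/@*,
            for $n in $node/node()
            return local:transform($n)
        }
    (: Only text nodes are rewritten :)
    case text() return
        if (matches($node, "Bird"))
        then text { replace($node, "Bird", "Dog") }
        else $node
    (: Comments, processing instructions, etc. pass through unchanged :)
    default return $node
};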

Note that the explicit matches check is not strictly necessary, because replace returns the input string unchanged when nothing matches.
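As a minimal sketch of that simplification, the function can drop the matches test and be applied directly to an in-memory element. The shortened sample document in the let clause is an assumption for illustration; any element or stored node could be passed instead.

```xquery
declare function local:transform(
    $node as node()
) as node()*
{
    typeswitch ($node)
    case element() return
        element { node-name($node) } {
            $node/@*,
            for $n in $node/node()
            return local:transform($n)
        }
    (: replace returns the text unchanged when "Bird" does not occur :)
    case text() return text { replace($node, "Bird", "Dog") }
    default return $node
};

(: Hypothetical in-memory input, shortened from the question :)
let $doc :=
    <something>
        <parent>
            <child>Bird is the word 1.</child>
            <child>Curd is the word 2.</child>
        </parent>
    </something>
return local:transform($doc)
```

The recursion replaces the nested for loops: the same function handles every level of the tree, so the query does not need to know how deep the child elements are or what they are called.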

Source

2017-05-14 17:07:38 wst

Add a case for document-node(), and include $node/namespace::* when copying elements, to get a better identity transform. – grtjn

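@grtjn Yes, I agree; for brevity I stuck to what the question asked. Also, for performance-critical transforms I try to leave out 'namespace::*' unless it is strictly needed, because I have noticed that in MarkLogic the wildcard namespace axis on every element can have a somewhat significant cost, depending on the size and content of the document. – wst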

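$node/namespace::* should only look at the local declarations. I would have to try it, but I would be surprised if it had much impact on performance. I will keep it in mind the next time I play with this, though.. – grtjn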
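The two suggestions from these comments could be combined roughly as follows. This is a sketch, not a tested implementation: it assumes the query runs on MarkLogic, since the namespace axis is not part of standard XQuery, and wst's remark about its possible cost still applies.

```xquery
declare function local:transform(
    $node as node()
) as node()*
{
    typeswitch ($node)
    (: Rebuild the document node around its transformed children :)
    case document-node() return
        document {
            for $n in $node/node()
            return local:transform($n)
        }
    case element() return
        element { node-name($node) } {
            (: Assumption: namespace axis as supported by MarkLogic;
               drop this line if namespace declarations need not be copied :)
            $node/namespace::*,
            $node/@*,
            for $n in $node/node()
            return local:transform($n)
        }
    case text() return text { replace($node, "Bird", "Dog") }
    default return $node
};

(: Hypothetical input wrapped in a document node :)
local:transform(
    document { <something><parent><child>Bird is the word 1.</child></parent></something> }
)
```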

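wst's answer looks good, but this kind of question is asked so often that someone created a library to make it easier. It is commonly referred to as the "in-memory update library". An improved version of it can be found here: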

https://github.com/ryanjdew/XQuery-XML-Memory-Operations

I thought it was worth at least mentioning..

HTH！

Source

2017-05-14 17:33:34 grtjn
