2008-10-27 58 views
42

How to get an XPath from an XmlNode instance

Could someone supply some code that gets the XPath of a System.Xml.XmlNode instance?

Thanks!

+0

Just to clarify: do you mean a list of node names from the root down to the node, separated by /? – 2008-10-27 20:18:30

+0

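Exactly. So something like "root/mycars/toyota/description/paragraph". There could be multiple paragraphs within the description element, but I only want the XPath to point to the one the XmlNode instance refers to. – joe 2008-10-27 20:28:04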

+2

People shouldn't just "request code"; they should supply some code that they have at least attempted. – bgmCoder 2015-01-11 16:31:20

Answers

52

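Okay, I couldn't resist having a go at it. It only works for attributes and elements, but hey... what can you expect in 15 minutes :) Likewise, there may very well be a cleaner way of doing it.

Including the index on every element (particularly the root one!) is superfluous, but it's easier than trying to work out whether there would be any ambiguity otherwise.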

using System; 
using System.Text; 
using System.Xml; 

class Test 
{ 
    static void Main() 
    { 
     string xml = @" 
<root> 
    <foo /> 
    <foo> 
    <bar attr='value'/> 
    <bar other='va' /> 
    </foo> 
    <foo><bar /></foo> 
</root>"; 
     XmlDocument doc = new XmlDocument(); 
     doc.LoadXml(xml); 
     XmlNode node = doc.SelectSingleNode("//@attr"); 
     Console.WriteLine(FindXPath(node)); 
     Console.WriteLine(doc.SelectSingleNode(FindXPath(node)) == node); 
    } 

    static string FindXPath(XmlNode node) 
    { 
     StringBuilder builder = new StringBuilder(); 
     while (node != null) 
     { 
      switch (node.NodeType) 
      { 
       case XmlNodeType.Attribute: 
        builder.Insert(0, "/@" + node.Name); 
        node = ((XmlAttribute) node).OwnerElement; 
        break; 
       case XmlNodeType.Element: 
        int index = FindElementIndex((XmlElement) node); 
        builder.Insert(0, "/" + node.Name + "[" + index + "]"); 
        node = node.ParentNode; 
        break; 
       case XmlNodeType.Document: 
        return builder.ToString(); 
       default: 
        throw new ArgumentException("Only elements and attributes are supported"); 
      } 
     } 
     throw new ArgumentException("Node was not in a document"); 
    } 

    static int FindElementIndex(XmlElement element) 
    { 
     XmlNode parentNode = element.ParentNode; 
     if (parentNode is XmlDocument) 
     { 
      return 1; 
     } 
     XmlElement parent = (XmlElement) parentNode; 
     int index = 1; 
     foreach (XmlNode candidate in parent.ChildNodes) 
     { 
      if (candidate is XmlElement && candidate.Name == element.Name) 
      { 
       if (candidate == element) 
       { 
        return index; 
       } 
       index++; 
      } 
     } 
     throw new ArgumentException("Couldn't find element within parent"); 
    } 
} 
2

There is no such thing as "the" XPath of a node. For any given node there may well be many XPath expressions that match it.
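To make that point concrete, here is a minimal sketch. The sample document and the `id` attribute are invented for illustration (loosely following the path in the question); it checks that several quite different expressions select the very same node.

```csharp
using System;
using System.Xml;

static class ManyPathsDemo
{
    // True if the expression selects exactly the given node instance.
    public static bool SelectsSameNode(XmlDocument doc, string xpath, XmlNode target)
    {
        return ReferenceEquals(doc.SelectSingleNode(xpath), target);
    }

    // Hypothetical sample document, not taken from the question.
    public static XmlDocument BuildSample()
    {
        var doc = new XmlDocument();
        doc.LoadXml("<root><mycars><toyota><description>"
                  + "<paragraph>first</paragraph>"
                  + "<paragraph id='p2'>second</paragraph>"
                  + "</description></toyota></mycars></root>");
        return doc;
    }

    static void Main()
    {
        XmlDocument doc = BuildSample();
        XmlNode target = doc.SelectSingleNode("//paragraph[2]");

        string[] candidates =
        {
            "/root/mycars/toyota/description/paragraph[2]",
            "//paragraph[@id='p2']",
            "//description/*[last()]",
            "/*/*/*/*/*[2]"
        };

        // Each line is expected to end in True for this sample document.
        foreach (string xpath in candidates)
        {
            Console.WriteLine("{0} -> {1}", xpath, SelectsSameNode(doc, xpath, target));
        }
    }
}
```

Which of these is "best" depends on what the consumer of the path needs, for example whether it should survive new siblings being inserted.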

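You can probably work your way up the tree to build an expression that will match it, taking into account the index of particular elements and so on, but it's not going to be terribly nice code.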

Why do you need this? There may be a better solution.

+0

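I'm calling an API of an XML editing application. I need to tell the application to hide certain nodes, which I do by calling ToggleVisibleElement, which takes an XPath. I was hoping there was an easy way to do this. – joe 2008-10-27 20:26:12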

20

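Jon is correct that any number of XPath expressions will yield the same node in an instance document. The simplest way to build an expression that unambiguously yields one specific node is a chain of node tests with the node's position in the predicate, e.g.: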

/node()[1]/node()[2]/node()[6]/node()[1]/node()[2] 

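Obviously, this expression doesn't use element names, but then if all you're trying to do is locate a node within a document, you don't need its name. It also can't be used to find attributes (attributes are not children of their element and have no position; you can only find them by name), but it will find all other node types.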

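To build this expression, you need to write a method that returns a node's position among its parent's child nodes, because XmlNode doesn't expose that as a property: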

static int GetNodePosition(XmlNode child) 
{ 
    for (int i=0; i<child.ParentNode.ChildNodes.Count; i++) 
    { 
     if (child.ParentNode.ChildNodes[i] == child) 
     { 
      // tricksy XPath, not starting its positions at 0 like a normal language 
      return i + 1; 
     } 
    } 
    throw new InvalidOperationException("Child node somehow not found in its parent's ChildNodes property."); 
} 

(There's probably a more elegant way to do this using LINQ, since XmlNodeList implements IEnumerable, but I'm going with what I know here.)
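For what it's worth, one way the LINQ variant hinted at above might look is sketched below. The method name `GetNodePositionLinq` is made up for this example. It relies on `Enumerable.Cast<T>()` because XmlNodeList only implements the non-generic IEnumerable.

```csharp
using System;
using System.Linq;
using System.Xml;

static class NodePositionSketch
{
    // 1-based position of a node among its parent's child nodes.
    public static int GetNodePositionLinq(XmlNode child)
    {
        // Count the siblings that come before the node, then add one
        // because XPath positions start at 1.
        int precedingSiblings = child.ParentNode.ChildNodes
            .Cast<XmlNode>()
            .TakeWhile(sibling => !ReferenceEquals(sibling, child))
            .Count();
        return precedingSiblings + 1;
    }

    static void Main()
    {
        var doc = new XmlDocument();
        doc.LoadXml("<root><a/><b/><c/></root>");
        XmlNode b = doc.DocumentElement.ChildNodes[1];
        Console.WriteLine(GetNodePositionLinq(b)); // prints 2
    }
}
```

Whether this is actually more elegant than the plain loop is a matter of taste; it does the same linear scan.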

Then you can write a recursive method like this:

static string GetXPathToNode(XmlNode node) 
{ 
    if (node.NodeType == XmlNodeType.Attribute) 
    { 
     // attributes have an OwnerElement, not a ParentNode; also they have 
     // to be matched by name, not found by position 
     return String.Format(
      "{0}/@{1}", 
      GetXPathToNode(((XmlAttribute)node).OwnerElement), 
      node.Name 
      );    
    } 
    if (node.ParentNode == null) 
    { 
     // the only node with no parent is the root node, which has no path 
     return ""; 
    } 
    // the path to a node is the path to its parent, plus "/node()[n]", where 
    // n is its position among its siblings. 
    return String.Format(
     "{0}/node()[{1}]", 
     GetXPathToNode(node.ParentNode), 
     GetNodePosition(node) 
     ); 
} 

As you can see, I snuck in a way for it to find attributes too.
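As a quick check of the approach, the sketch below repeats the two methods in condensed form and round-trips a text node through the generated path. The sample document is invented and deliberately has no XML declaration, DOCTYPE or whitespace nodes; documents that contain those may need extra care, since DOM child positions and XPath positions may not line up there.

```csharp
using System;
using System.Xml;

static class NodePathRoundTrip
{
    static int GetNodePosition(XmlNode child)
    {
        XmlNodeList siblings = child.ParentNode.ChildNodes;
        for (int i = 0; i < siblings.Count; i++)
        {
            if (siblings[i] == child) return i + 1; // XPath positions are 1-based
        }
        throw new InvalidOperationException("Child not found in its parent's ChildNodes.");
    }

    public static string GetXPathToNode(XmlNode node)
    {
        if (node.NodeType == XmlNodeType.Attribute)
        {
            // Attributes are reached by name from their owner element.
            return GetXPathToNode(((XmlAttribute)node).OwnerElement) + "/@" + node.Name;
        }
        if (node.ParentNode == null) return ""; // the document node itself
        return GetXPathToNode(node.ParentNode) + "/node()[" + GetNodePosition(node) + "]";
    }

    static void Main()
    {
        var doc = new XmlDocument();
        doc.LoadXml("<root><a/>some text<b/></root>");

        // The text node is the second child of <root>.
        XmlNode text = doc.DocumentElement.ChildNodes[1];
        string path = GetXPathToNode(text);

        Console.WriteLine(path); // prints /node()[1]/node()[2]
        Console.WriteLine(ReferenceEquals(doc.SelectSingleNode(path), text));
    }
}
```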

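Jon slipped in with his version while I was writing mine. There's something about his code that's going to make me rant a bit now, and I apologize in advance if it sounds like I'm ragging on Jon. (I'm not. I'm pretty sure that the list of things Jon has to learn from me is exceedingly short.) But I think the point I'm going to make is a pretty important one for anyone who works with XML to think about.

I suspect that Jon's solution emerged from something I see a lot of developers do: thinking of XML documents as trees of elements and attributes. I think this largely comes from developers whose primary use of XML is as a serialization format, because all the XML they're used to is structured this way. You can spot these developers because they use the terms "node" and "element" interchangeably. This leads them to come up with solutions that treat all other node types as special cases. (I was one of these guys myself for a very long time.)

This feels like a simplifying assumption while you're making it. But it's not. It makes problems harder and code more complex. It leads you to bypass the pieces of XML technology (like the node() function in XPath) that are specifically designed to treat all node types generically.

There's a red flag in Jon's code that would make me query it in a code review even if I didn't know what the requirements were, and that's GetElementsByTagName. Whenever I see that method in use, the question that leaps to mind is always "why does it have to be an element?" And the answer is very often "oh, does this code need to handle text nodes too?"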

0

This is even easier:

''' <summary> 
    ''' Gets the full XPath of a single node. 
    ''' </summary> 
    ''' <param name="node"></param> 
    ''' <returns></returns> 
    ''' <remarks></remarks> 
    Private Function GetXPath(ByVal node As Xml.XmlNode) As String 
     Dim temp As String 
     Dim sibling As Xml.XmlNode 
     Dim previousSiblings As Integer = 1 

     'I don't want to know that it was a generic document 
     If node.Name = "#document" Then Return "" 

     'Prime it 
     sibling = node.PreviousSibling 
     'Percolate up, counting all of this node's same-named siblings before it. 
     While sibling IsNot Nothing 
      'Only count if the sibling has the same name as this node 
      If sibling.Name = node.Name Then 
       previousSiblings += 1 
      End If 
      sibling = sibling.PreviousSibling 
     End While 

     'Mark this node's index, if it has one 
     ' Also mark the index to 1 or the default if it does have a sibling just no previous. 
     temp = node.Name + IIf(previousSiblings > 0 OrElse node.NextSibling IsNot Nothing, "[" + previousSiblings.ToString() + "]", "").ToString() 

     If node.ParentNode IsNot Nothing Then 
      Return GetXPath(node.ParentNode) + "/" + temp 
     End If 

     Return temp 
    End Function 
3

My 10p worth is a hybrid of Robert's and Corey's answers. I can only claim credit for the actual typing of the extra lines of code.

private static string GetXPathToNode(XmlNode node) 
    { 
     if (node.NodeType == XmlNodeType.Attribute) 
     { 
      // attributes have an OwnerElement, not a ParentNode; also they have 
      // to be matched by name, not found by position 
      return String.Format(
       "{0}/@{1}", 
       GetXPathToNode(((XmlAttribute)node).OwnerElement), 
       node.Name 
       ); 
     } 
     if (node.ParentNode == null) 
     { 
      // the only node with no parent is the root node, which has no path 
      return ""; 
     } 
     //get the index 
     int iIndex = 1; 
     XmlNode xnIndex = node; 
     while (xnIndex.PreviousSibling != null) { iIndex++; xnIndex = xnIndex.PreviousSibling; } 
     // the path to a node is the path to its parent, plus "/node()[n]", where 
     // n is its position among its siblings. 
     return String.Format(
      "{0}/node()[{1}]", 
      GetXPathToNode(node.ParentNode), 
      iIndex 
      ); 
    } 
1

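If you do it this way, you get a path with the names of the nodes, plus the position where several nodes share the same name, like this: "/Service[1]/System[1]/Group[1]/Folder[2]/File[2]"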

public string GetXPathToNode(XmlNode node) 
{   
    if (node.NodeType == XmlNodeType.Attribute) 
    {    
     // attributes have an OwnerElement, not a ParentNode; also they have    
     // to be matched by name, not found by position    
     return String.Format("{0}/@{1}", GetXPathToNode(((XmlAttribute)node).OwnerElement), node.Name); 
    } 
    if (node.ParentNode == null) 
    {    
     // the only node with no parent is the root node, which has no path 
     return ""; 
    } 

    //get the index 
    int iIndex = 1; 
    XmlNode xnIndex = node; 
    while (xnIndex.PreviousSibling != null && xnIndex.PreviousSibling.Name == xnIndex.Name) 
    { 
     iIndex++; 
     xnIndex = xnIndex.PreviousSibling; 
    } 

    // the path to a node is the path to its parent, plus "/node()[n]", where 
    // n is its position among its siblings.   
    return String.Format("{0}/{1}[{2}]", GetXPathToNode(node.ParentNode), node.Name, iIndex); 
} 
1

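I found that none of the above worked with XDocument, so I wrote my own code to support XDocument, using recursion. I think this code handles multiple identical nodes better than some of the other code here, because it first tries to go as deep into the XML path as it can and then backs up to build only what is needed. So if you have /home/white/bob and /home/white/mike, and you want to create /home/white/bob/garage, the code will know how to create it. However, I didn't want to mess with predicates or wildcards, so I explicitly disallowed them; it would be easy to add support for them, though.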

Private Sub NodeItterate(XDoc As XElement, XPath As String) 
    'get the deepest path 
    Dim nodes As IEnumerable(Of XElement) 

    nodes = XDoc.XPathSelectElements(XPath) 

    'if it doesn't exist, try the next shallow path 
    If nodes.Count = 0 Then 
     NodeItterate(XDoc, XPath.Substring(0, XPath.LastIndexOf("/"))) 
     'by this time all the required parent elements will have been constructed 
     Dim ParentPath As String = XPath.Substring(0, XPath.LastIndexOf("/")) 
     Dim ParentNode As XElement = XDoc.XPathSelectElement(ParentPath) 
     Dim NewElementName As String = XPath.Substring(XPath.LastIndexOf("/") + 1, XPath.Length - XPath.LastIndexOf("/") - 1) 
     ParentNode.Add(New XElement(NewElementName)) 
    End If 

    'if we find there are more than 1 elements at the deepest path we have access to, we can't proceed 
    If nodes.Count > 1 Then 
     Throw New ArgumentOutOfRangeException("There are too many paths that match your expression.") 
    End If 

    'if there is just one element, we can proceed 
    If nodes.Count = 1 Then 
     'just proceed 
    End If 

End Sub 

Public Sub CreateXPath(ByVal XDoc As XElement, ByVal XPath As String) 

    If XPath.Contains("//") Or XPath.Contains("*") Or XPath.Contains(".") Then 
     Throw New ArgumentException("Can't create a path based on searches, wildcards, or relative paths.") 
    End If 

    If Regex.IsMatch(XPath, "[\[\]()@='<>|]") Then 
     Throw New ArgumentException("Can't create a path based on predicates.") 
    End If 

    'we will process this recursively. 
    NodeItterate(XDoc, XPath) 

End Sub 
3

Here's a simple method that I've used; it worked for me.

static string GetXpath(XmlNode node) 
    { 
     if (node.Name == "#document") 
      return String.Empty; 
     return GetXpath(node.SelectSingleNode("..")) + "/" + (node.NodeType == XmlNodeType.Attribute ? "@":String.Empty) + node.Name; 
    } 
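One caveat worth illustrating: because this method emits names without positions, the resulting path can match more than one node. The sketch below repeats the method and runs it on an invented document in which two `bar` elements share the same name-only path.

```csharp
using System;
using System.Xml;

static class NameOnlyPathDemo
{
    // Same logic as the method above: names only, no positional predicates.
    public static string GetXpath(XmlNode node)
    {
        if (node.Name == "#document")
            return String.Empty;
        return GetXpath(node.SelectSingleNode("..")) + "/"
            + (node.NodeType == XmlNodeType.Attribute ? "@" : String.Empty) + node.Name;
    }

    static void Main()
    {
        var doc = new XmlDocument();
        doc.LoadXml("<root><foo><bar/></foo><foo><bar/></foo></root>");

        // Take the second <bar> in document order.
        XmlNode secondBar = doc.SelectNodes("//bar")[1];
        string path = GetXpath(secondBar);

        Console.WriteLine(path);                       // prints /root/foo/bar
        Console.WriteLine(doc.SelectNodes(path).Count); // prints 2
        // SelectSingleNode returns the first match, which is the other <bar>.
        Console.WriteLine(ReferenceEquals(doc.SelectSingleNode(path), secondBar)); // prints False
    }
}
```

So this version is fine when element names are unique along the path, and ambiguous otherwise.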
5

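I know this is an old post, but the version I liked the most (the one with names) is flawed: when a parent node has child nodes with different names, it stops counting the index at the first sibling whose name doesn't match.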

Here is my fixed version of it:

For use as a class extension:
/// <summary> 
/// Gets the X-Path to a given Node 
/// </summary> 
/// <param name="node">The Node to get the X-Path from</param> 
/// <returns>The X-Path of the Node</returns> 
public string GetXPathToNode(XmlNode node) 
{ 
    if (node.NodeType == XmlNodeType.Attribute) 
    { 
     // attributes have an OwnerElement, not a ParentNode; also they have    
     // to be matched by name, not found by position    
     return String.Format("{0}/@{1}", GetXPathToNode(((XmlAttribute)node).OwnerElement), node.Name); 
    } 
    if (node.ParentNode == null) 
    { 
     // the only node with no parent is the root node, which has no path 
     return ""; 
    } 

    // Get the Index 
    int indexInParent = 1; 
    XmlNode siblingNode = node.PreviousSibling; 
    // Loop thru all Siblings 
    while (siblingNode != null) 
    { 
     // Increase the Index if the Sibling has the same Name 
     if (siblingNode.Name == node.Name) 
     { 
      indexInParent++; 
     } 
     siblingNode = siblingNode.PreviousSibling; 
    } 

    // the path to a node is the path to its parent, plus "/name[n]", where n is its position among its same-named siblings.   
    return String.Format("{0}/{1}[{2}]", GetXPathToNode(node.ParentNode), node.Name, indexInParent); 
} 
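To see the difference between the two counting loops, here is a small sketch that isolates them. The method names and the sample document are invented for illustration. With a differently named sibling in between, the earlier loop reports index 1 for the second `item`, while the corrected loop reports 2.

```csharp
using System;
using System.Xml;

static class SiblingIndexDemo
{
    // Loop from the earlier answer: stops at the first differently named sibling.
    public static int IndexStoppingAtMismatch(XmlNode node)
    {
        int index = 1;
        XmlNode current = node;
        while (current.PreviousSibling != null && current.PreviousSibling.Name == current.Name)
        {
            index++;
            current = current.PreviousSibling;
        }
        return index;
    }

    // Loop from the corrected version: visits every preceding sibling
    // and counts only those with the same name.
    public static int IndexCountingAll(XmlNode node)
    {
        int index = 1;
        for (XmlNode s = node.PreviousSibling; s != null; s = s.PreviousSibling)
        {
            if (s.Name == node.Name) index++;
        }
        return index;
    }

    static void Main()
    {
        var doc = new XmlDocument();
        doc.LoadXml("<root><item/><other/><item/></root>");
        XmlNode secondItem = doc.DocumentElement.LastChild;

        Console.WriteLine(IndexStoppingAtMismatch(secondItem)); // prints 1
        Console.WriteLine(IndexCountingAll(secondItem));        // prints 2
    }
}
```

With the wrong index, the generated path /root[1]/item[1] would point at the first `item` rather than the one that was passed in.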
1

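What? ;) My version (built on the work of others) uses the syntax name[index], with the index omitted if the element has no same-named "siblings". The loop that determines the element index lives outside, in a separate routine (also a class extension).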

Just paste the following into any utility class (or into the main Program class):

static public int GetRank(this XmlNode node) 
{ 
    // return 0 if unique, else return position 1...n in siblings with same name 
    try 
    { 
     if(node is XmlElement) 
     { 
      int rank = 1; 
      bool alone = true, found = false; 

      foreach(XmlNode n in node.ParentNode.ChildNodes) 
       if(n.Name == node.Name) // sibling with same name 
       { 
        if(n.Equals(node)) 
        { 
         if(! alone) return rank; // no need to continue 
         found = true; 
        } 
        else 
        { 
         if(found) return rank; // no need to continue 
         alone = false; 
         rank++; 
        } 
       } 

     } 
    } 
    catch{} 
    return 0; 
} 

static public string GetXPath(this XmlNode node) 
{ 
    try 
    { 
     if(node is XmlAttribute) 
      return String.Format("{0}/@{1}", (node as XmlAttribute).OwnerElement.GetXPath(), node.Name); 

     if(node is XmlText || node is XmlCDataSection) 
      return node.ParentNode.GetXPath(); 

     if(node.ParentNode == null) // the only node with no parent is the root node, which has no path 
      return ""; 

     int rank = node.GetRank(); 
     if(rank == 0) return String.Format("{0}/{1}",  node.ParentNode.GetXPath(), node.Name); 
     else   return String.Format("{0}/{1}[{2}]", node.ParentNode.GetXPath(), node.Name, rank); 
    } 
    catch{} 
    return ""; 
} 
1

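I produced some VBA for Excel to do this for a work project. It outputs tuples of an XPath and the associated text of an element or attribute. The purpose was to let business analysts identify and map some XML. I appreciate that this is a C# forum, but I thought it might be of interest.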

Sub Parse2(oSh As Long, inode As IXMLDOMNode, Optional iXstring As String = "", Optional indexes) 


Dim chnode As IXMLDOMNode 
Dim attr As IXMLDOMAttribute 
Dim oXString As String 
Dim chld As Long 
Dim idx As Variant 
Dim addindex As Boolean 
chld = 0 
idx = 0 
addindex = False 


'determine the node type: 
Select Case inode.NodeType 

    Case NODE_ELEMENT 
     If inode.ParentNode.NodeType = NODE_DOCUMENT Then 'This gets the root node name but ignores all the namespace attributes 
      oXString = iXstring & "//" & fp(inode.nodename) 
     Else 

      'Need to deal with indexing. Where an element has siblings with the same nodeName,it needs to be indexed using [index], e.g swapstreams or schedules 

      For Each chnode In inode.ParentNode.ChildNodes 
       If chnode.NodeType = NODE_ELEMENT And chnode.nodename = inode.nodename Then chld = chld + 1 
      Next chnode 

      If chld > 1 Then '//inode has siblings of the same nodeName, so needs to be indexed 
       'Lookup the index from the indexes array 
       idx = getIndex(inode.nodename, indexes) 
       addindex = True 
      Else 
      End If 

      'build the XString 
      oXString = iXstring & "/" & fp(inode.nodename) 
      If addindex Then oXString = oXString & "[" & idx & "]" 

      'If type is element then check for attributes 
      For Each attr In inode.Attributes 
       'If the element has attributes then extract the data pair XString + Element.Name, @Attribute.Name=Attribute.Value 
       Call oSheet(oSh, oXString & "/@" & attr.Name, attr.Value) 
      Next attr 

     End If 

    Case NODE_TEXT 
     'build the XString 
     oXString = iXstring 
     Call oSheet(oSh, oXString, inode.NodeValue) 

    Case NODE_ATTRIBUTE 
    'Do nothing 
    Case NODE_CDATA_SECTION 
    'Do nothing 
    Case NODE_COMMENT 
    'Do nothing 
    Case NODE_DOCUMENT 
    'Do nothing 
    Case NODE_DOCUMENT_FRAGMENT 
    'Do nothing 
    Case NODE_DOCUMENT_TYPE 
    'Do nothing 
    Case NODE_ENTITY 
    'Do nothing 
    Case NODE_ENTITY_REFERENCE 
    'Do nothing 
    Case NODE_INVALID 
    'do nothing 
    Case NODE_NOTATION 
    'do nothing 
    Case NODE_PROCESSING_INSTRUCTION 
    'do nothing 
End Select 

'Now call Parser2 on each of inode's children. 
If inode.HasChildNodes Then 
    For Each chnode In inode.ChildNodes 
     Call Parse2(oSh, chnode, oXString, indexes) 
    Next chnode 
Set chnode = Nothing 
Else 
End If 

End Sub 

The counting of elements is managed using:

Function getIndex(tag As Variant, indexes) As Variant 
'Function to get the latest index for an xml tag from the indexes array 
'indexes array is passed from one parser function to the next up and down the tree 

Dim i As Integer 
Dim n As Integer 

If IsArrayEmpty(indexes) Then 
    ReDim indexes(1, 0) 
    indexes(0, 0) = "Tag" 
    indexes(1, 0) = "Index" 
Else 
End If 
For i = 0 To UBound(indexes, 2) 
    If indexes(0, i) = tag Then 
     'tag found, increment and return the index then exit 
     'also destroy all recorded tag names BELOW that level 
     indexes(1, i) = indexes(1, i) + 1 
     getIndex = indexes(1, i) 
     ReDim Preserve indexes(1, i) 'should keep all tags up to i but remove all below it 
     Exit Function 
    Else 
    End If 
Next i 

'tag not found so add the tag with index 1 at the end of the array 
n = UBound(indexes, 2) 
ReDim Preserve indexes(1, n + 1) 
indexes(0, n + 1) = tag 
indexes(1, n + 1) = 1 
getIndex = 1 

End Function 
0

Another solution to your problem could be to "mark" the XmlNodes that you will want to identify later with a custom attribute:

var id = _currentNode.OwnerDocument.CreateAttribute("some_id"); 
id.Value = Guid.NewGuid().ToString(); 
_currentNode.Attributes.Append(id); 

You can store that value in a dictionary, for example. Later you can identify the node with an XPath query:

newOrOldDocument.SelectSingleNode(string.Format("//*[contains(@some_id,'{0}')]", id.Value)); 

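I know this is not a direct answer to your question, but it can help if the reason you want to know the XPath of a node is to have a way of 'reaching' the node later, after you have lost the reference to it in code.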

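This also overcomes problems when elements get added to or moved within the document, which can mess up the XPath (or the indexes, as suggested in other answers).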
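Putting the pieces together, a minimal end-to-end sketch of the tagging idea could look like this. The helper names `Tag` and `FindByTag` and the sample document are made up for illustration, and the lookup uses an exact match on the attribute value instead of contains(), which is a small variation on the query above.

```csharp
using System;
using System.Xml;

static class TagNodeDemo
{
    // Adds a GUID-valued marker attribute to the element and returns the GUID.
    // "some_id" is the attribute name used in the answer above.
    public static string Tag(XmlElement element)
    {
        string id = Guid.NewGuid().ToString();
        element.SetAttribute("some_id", id);
        return id;
    }

    // Finds the element carrying the given marker value, or null if none does.
    public static XmlNode FindByTag(XmlDocument doc, string id)
    {
        return doc.SelectSingleNode(string.Format("//*[@some_id='{0}']", id));
    }

    static void Main()
    {
        var doc = new XmlDocument();
        doc.LoadXml("<root><foo/><foo><bar/></foo></root>");

        XmlElement bar = (XmlElement)doc.SelectSingleNode("//bar");
        string id = Tag(bar);

        // Restructure the document: a positional path to <bar> would change here,
        // but the marker attribute still identifies it.
        doc.DocumentElement.PrependChild(doc.CreateElement("foo"));

        Console.WriteLine(ReferenceEquals(FindByTag(doc, id), bar));
    }
}
```

The obvious trade-off is that this modifies the document, so it only fits when adding an attribute is acceptable to whoever consumes the XML.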

0
public static string GetFullPath(this XmlNode node) 
     { 
      if (node.ParentNode == null) 
      { 
       return ""; 
      } 
      else 
      { 
       return $"{GetFullPath(node.ParentNode)}/{node.Name}"; 
      } 
     }