What command should I use to get the text from a line of an XML file on Linux?

I have a text file, and I need to extract specific text from it. What command should I use to get it?

For example, the file contains the following text:

<name>this is first line</name> 
<name>this is second line</name> 
<name>this is third line</name>

I need only the text inside these tags, i.e. I need "this is first line".

Source

2011-04-05 balaji

Ruby (1.9+)

$ ruby -ne 'puts $_.scan(/<name>(.*?)<\/name>/)' file 
this is first line 
this is second line 
this is third line

AWK

$ awk 'BEGIN{ RS="</name>" }/<name>/{ gsub(/.*<name>/,"");print }' file 
this is first line 
this is second line 
this is third line

sed

$ sed -r 's|<name>(.[^>]*)</name>|\1|' file 
this is first line 
this is second line 
this is third line

Source

2011-04-05 08:32:07 kurumi

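I would not recommend regexp-based extraction for anything structured (such as XML), and the sample snippet is exactly that. – 2011-04-05 08:42:04

I disagree. Sometimes the requirements are simple. – kurumi 2011-04-05 09:01:40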

-1

Does this work for you? (I don't know exactly what you need):

cat yourfile | grep "this is first line"

Source

2011-04-05 07:49:24

Useless use of 'cat' – kurumi 2011-04-05 08:30:50

@kurumi: How do you read the file contents, then? – 2011-04-05 09:48:30

See Draco's answer – kurumi 2011-04-05 09:49:43
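For what it's worth, here is a small sketch of the point these comments are making: grep can open the file itself, so the cat and the pipe are not needed. The temporary file and its contents are assumptions made up for illustration, modeled on the question's sample.

```shell
# Hypothetical sample data modeled on the question; the temp file is illustrative.
sample_file="$(mktemp)"
printf '%s\n' \
  '<name>this is first line</name>' \
  '<name>this is second line</name>' \
  '<name>this is third line</name>' > "$sample_file"

# With cat: an extra process and a pipe just to feed grep.
with_cat="$(cat "$sample_file" | grep 'this is first line')"

# Without cat: grep reads the file named as its argument.
without_cat="$(grep 'this is first line' "$sample_file")"

printf '%s\n' "$without_cat"
```

Both forms select the same line. Note that the match is still the whole line, tags included, so this alone does not give the bare text the question asks for.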

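grep will help you find the right line. If the lines are regularly formatted, maybe you can use cut to strip the <name> tags? If not, sed is probably the right tool for the job.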
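A minimal sketch of the grep-plus-cut idea, assuming every line holds exactly one <name>...</name> pair with no attributes and no nested tags. The helper name extract_names and the temporary sample file are made up for illustration.

```shell
# Hypothetical sample data modeled on the question; the temp file is illustrative.
sample_file="$(mktemp)"
printf '%s\n' \
  '<name>this is first line</name>' \
  '<name>this is second line</name>' \
  '<name>this is third line</name>' > "$sample_file"

# Hypothetical helper: keep only lines containing <name>, then
# take what follows the first '>' and what precedes the next '<'.
extract_names() {
  grep '<name>' "$1" | cut -d '>' -f 2 | cut -d '<' -f 1
}

extract_names "$sample_file"
```

This splits purely on the '>' and '<' characters, so it only holds up while the formatting stays as regular as in the sample.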

Source

2011-04-05 07:49:40

Assuming it is actually a complete XML document, you can (and should) prefer

xmllint -xpath '//name/text()' test.xml

Or, if you want newlines, you can use

xsltproc trafo.xslt test.xml

with a stylesheet like

trafo.xslt

<?xml version="1.0" encoding="UTF-8"?> 
<xsl:stylesheet version="1.0" 
       xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> 
    <xsl:output method="html" indent="yes"/> 
    <xsl:strip-space elements="*"/> 
    <xsl:template match="/"> 
     <xsl:for-each select="//name[text()]"> 
      <xsl:if test="text()"> 
       <xsl:value-of select="text()"/> 
       <xsl:text>&#x0a;</xsl:text> 
      </xsl:if> 
     </xsl:for-each> 
    </xsl:template> 
</xsl:stylesheet>

Source

2011-04-05 08:00:39 sehe

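I recommend this one because it actually makes use of the fact that the snippet may have exploitable structure. – 2011-04-05 08:42:47

This does not print newlines between matches. – 2011-04-05 14:32:03

Added an alternative – sehe 2011-04-05 14:44:07

I believe you need all the text inside the <name> tags, one tag per line.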

grep -Po "(?<=<name>)[^<]*(?=</name>)" yourfile

The result will be

this is first line 
this is second line 
this is third line

Source

2011-04-05 08:12:53

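+1, but only if the document is not well-formed XML (e.g. there is no root tag around the whole document), and you are sure the <name> tags will never have attributes and there will never be nested tags. If you have well-formed XML, use sehe's answer. – 2011-04-05 14:19:49

Sehe's answer does not add newlines between lines. I suggest using the following instead: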
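To make those caveats concrete, here is a small sketch that runs the same grep pattern over input containing an attribute and a nested tag. The sample lines and the helper name extract_with_regex are invented for illustration, and -P assumes a GNU grep built with PCRE support.

```shell
# Hypothetical input: one plain tag, one with an attribute, one with a nested tag.
sample_file="$(mktemp)"
printf '%s\n' \
  '<name>plain</name>' \
  '<name id="2">with attribute</name>' \
  '<name>has <b>nested</b> tag</name>' > "$sample_file"

# Same pattern as the answer above, wrapped in a hypothetical helper.
extract_with_regex() {
  grep -Po '(?<=<name>)[^<]*(?=</name>)' "$1"
}

# Only the first line matches: the attribute breaks the literal "<name>"
# lookbehind, and the nested <b> stops [^<]* before "</name>" is reached.
extract_with_regex "$sample_file"
```

Only "plain" is printed here; the other two entries are dropped silently, which is the failure mode the comment warns about.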

xmlstarlet sel -t -m '//name/text()' -v '.' -n test.xml
#                 ^^^^^^^^^^^^^^^^^^ ^^^^^^ ^^
#                 for each xpath match |    |
#                       print the result    |
#                       followed by a newline

or

xmlstarlet sel -t -m '//name' -v 'text()' -n test.xml
#                 ^^^^^^^^^^^ ^^^^^^^^^^^ ^^
#           for each name tag      |      |
#    print the text that's inside it      |
#                     followed by a newline

(The two behave slightly differently with respect to where newlines are printed.)

Source

2011-04-05 14:25:20

Thanks for pointing me to xmlstarlet; I had not used it before – sehe 2011-04-05 14:44:40
