How do I parse this XML with javax.xml.xpath?

I am trying to parse this XML:

<?xml version="1.0" encoding="UTF-8"?> 
<veranstaltungen> 
    <veranstaltung id="201611211500#25045271"> 
    <titel>Mal- und Zeichen-Treff</titel> 
    <start>2016-11-21 15:00:00</start> 
    <veranstaltungsort id="20011507"> 
     <name>Freizeitclub - ganz unbehindert </name> 
     <anschrift>Macht los e.V. 
Lipezker Straße 48 
03048 Cottbus 
</anschrift> 
     <telefon>xxxx xxxx </telefon> 
     <fax>0355 xxxx</fax> 
[...] 
</veranstaltungen>

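As you can see, some of the text contains spaces and even line breaks. I have a problem with the text of the anschrift node, because I need it to look up the correct location data in a database. The problem is that the returned string is:

Macht los e.V.Lipezker Straße 4803048 Cottbus

instead of:

Macht los e.V. Lipezker Straße 48 03048 Cottbus

I know the correct way to handle this should be normalize-space(), but I can't quite work out how to use it. I tried this: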

// Does not work; afaik because XPath 1.0 normalizes just the first node 
xPath.compile("normalize-space(veranstaltungen/veranstaltung[position()=1]/veranstaltungsort/anschrift/text())"); 

// Does not work 
xPath.compile("veranstaltungen/veranstaltung[position()=1]/veranstaltungsort[normalize-space(anschrift/text())]");

I also tried the solution given here: xpath-normalize-space-to-return-a-sequence-of-normalized-strings

xPathExpression = xPath.compile("veranstaltungen/veranstaltung[position()=1]/veranstaltungsort"); 
NodeList result = (NodeList) xPathExpression.evaluate(doc, XPathConstants.NODESET); 

String normalize = "normalize-space(.)"; 
xPathExpression = xPath.compile(normalize); 

int length = result.getLength(); 
for (int i = 0; i < length; i++) { 
    System.out.println(xPathExpression.evaluate(result.item(i), XPathConstants.STRING)); 
}

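System.out prints:

Macht los e.V.Lipezker Straße 4803048 Cottbus

What am I doing wrong?

Update

I have a workaround now, but it cannot be the real solution. The following lines show how I assemble the string from the HttpResponse: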

try (BufferedReader reader = new BufferedReader(new InputStreamReader(response.getEntity().getContent(), Charset.forName(charset)))) { 
    final StringBuilder stringBuilder = new StringBuilder(); 
    String    line; 

    while ((line = reader.readLine()) != null) { 
    // stringBuilder.append(line); 
    // WORKAROUND: Add a space after each line 
    stringBuilder.append(line).append(" "); 
    } 

    // Work with the read lines 
}

I would rather have a solid solution.

Source

2016-11-22 aProgger

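normalize-space() strips leading and trailing whitespace and converts every other sequence of whitespace characters (including newlines) into a single space character. Since your result has no spaces between the lines of the anschrift element's text content, something must be eating your line breaks *before* normalize-space() gets to do its job. – Markus

Originally, you seem to have read the XML with the following code: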

try (BufferedReader reader = new BufferedReader(new InputStreamReader(response.getEntity().getContent(), Charset.forName(charset)))) { 
    final StringBuilder stringBuilder = new StringBuilder(); 
    String    line; 

    while ((line = reader.readLine()) != null) { 
    stringBuilder.append(line); 
    } 

}

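This is where your newlines get eaten: readLine() does not return the trailing line terminator. If you then parse the contents of the stringBuilder object, you get an incorrect DOM whose text nodes no longer contain the newlines from the original XML.

Source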
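The effect can be reproduced without any HTTP or XML involved. The following is a minimal sketch, assuming the address arrives as three newline-separated lines; the class name ReadLineDemo and the helper joinWithReadLine are made up for this illustration and do not come from the question.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class ReadLineDemo {

    // Hypothetical helper: concatenates lines the same way the loop above does,
    // i.e. appends each readLine() result without re-adding a separator.
    static String joinWithReadLine(String text) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // readLine() has already stripped "\n", "\r" or "\r\n" here
                sb.append(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // \u00DF is the letter sharp s, written as an escape to stay encoding-independent
        String text = "Macht los e.V.\nLipezker Stra\u00DFe 48\n03048 Cottbus\n";
        System.out.println(joinWithReadLine(text));
    }
}
```

With that input the lines come out glued together, which matches the string reported in the question. By the time the XML parser sees the text there is no whitespace left between the lines for normalize-space() to collapse.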

2016-11-22 10:43:48 Markus

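I didn't know that. Thanks for the info. My workaround then is to check whether a line ends with ">" and, if it doesn't, append a " ". – aProgger

Don't do that. You are modifying the input. Why do you want to read it line by line? Why not parse the input stream as it is? – Markus

I should clear my head for a while. You are right. Doing that now. – aProgger

Thanks to Markus's help I was able to solve the problem. The cause is that BufferedReader's readLine() method discards the line breaks. The following code snippet works for me (it can probably be improved):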

public Document getDocument() throws IOException, ParserConfigurationException, SAXException { 

    final HttpResponse response = getResponse(); // returns an HttpResponse 
    final HttpEntity entity = response.getEntity(); 
    final Charset  charset = ContentType.getOrDefault(entity).getCharset(); 

    // Not 100% sure if I have to close the InputStreamReader. But I guess so. 
    try (InputStreamReader isr = new InputStreamReader(entity.getContent(), charset == null ? Charset.forName("UTF-8") : charset)) { 
    return documentBuilderFactory.newDocumentBuilder().parse(new InputSource(isr)); 
    } 
}
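To show the whole path end to end, here is a small self-contained sketch, assuming the response body is available as a plain InputStream (simulated below with a ByteArrayInputStream). The class DirectParseDemo and the helper addressFrom are hypothetical names for this illustration. The stream is handed to the parser unmodified, so the parser can take the encoding from the XML declaration, and normalize-space() is applied to the anschrift element itself.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

public class DirectParseDemo {

    // Hypothetical helper: parses the stream as-is and returns the
    // whitespace-normalized address of the first veranstaltung.
    static String addressFrom(InputStream in) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(in);
            XPath xPath = XPathFactory.newInstance().newXPath();
            // evaluate(String, Object) returns the result as a String
            return xPath.evaluate(
                    "normalize-space(/veranstaltungen/veranstaltung[1]/veranstaltungsort/anschrift)",
                    doc);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse or evaluate", e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for the HTTP response body; line breaks are preserved.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<veranstaltungen>\n"
                + "  <veranstaltung id=\"201611211500#25045271\">\n"
                + "    <veranstaltungsort id=\"20011507\">\n"
                + "      <anschrift>Macht los e.V.\n"
                + "Lipezker Stra\u00DFe 48\n"
                + "03048 Cottbus\n"
                + "</anschrift>\n"
                + "    </veranstaltungsort>\n"
                + "  </veranstaltung>\n"
                + "</veranstaltungen>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(addressFrom(in));
    }
}
```

Because the newlines survive parsing, normalize-space() turns each of them into a single space and the address comes back on one line with the words separated as expected.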

Source

2016-11-22 12:09:54 aProgger
