MalformedByteSequenceException when trying to parse XML

I have the following .gpx data from Wikipedia:

<?xml version="1.0" encoding="UTF-8" standalone="no" ?> 
<gpx xmlns="http://www.topografix.com/GPX/1/1" creator="byHand" version="1.1" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.topografix.com/GPX/1/1 http://www.topografix.com/GPX/1/1/gpx.xsd"> 
    <wpt lat="39.921055008" lon="3.054223107"> 
    <ele>12.863281</ele> 
    <time>2005-05-16T11:49:06Z</time> 
    <name>Cala Sant Vicenç - Mallorca</name> 
    <sym>City</sym> 
    </wpt> 
</gpx>

When I call my parse method, I get an exception (see below). The call looks like this:

Document tmpDoc = getParsedXML(currentGPX);

My parse method looks like this (standard parsing code, nothing special):

public static Document getParsedXML(String fileWithPath) {
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    DocumentBuilder db;
    Document doc = null;
    try {
        db = dbf.newDocumentBuilder();
        doc = db.parse(new File(fileWithPath));
    } catch (ParserConfigurationException e) {
        e.printStackTrace();
    } catch (SAXException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
    return doc;
}

This simple code throws the following exception:

com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 2 of 3-byte UTF-8 sequence. 
at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.invalidByte(Unknown Source) 
at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(Unknown Source) 
at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(Unknown Source) 
at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.skipChar(Unknown Source) 
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source) 
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source) 
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source) 
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source) 
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source) 
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source) 
at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source) 
at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source) 
at javax.xml.parsers.DocumentBuilder.parse(Unknown Source) 
at Zeugs.getParsedXML(Zeugs.java:38) 
at Zeugs.main(Zeugs.java:25)

I suspect the error lies in the format of the file, but I don't know exactly where. Can you give me a hint?

2010-05-07 poeschlorn

Is your file really UTF-8 encoded? – Dormilich 2010-05-07 07:42:26

What happens if you replace the 'ç' in 'Vicenç' with 'c'? Do you still get the problem? – Oded 2010-05-07 07:43:01

+1 Dormilich, Oded. The file has probably not been saved as UTF-8. – ChrisBD 2010-05-07 08:36:03

I suggest that your file has not been saved in UTF-8 format.
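
One likely explanation for the exact message: in ISO-8859-1 and Windows-1252 the 'ç' is the single byte 0xE7. Its bit pattern (1110 0111) looks like the lead byte of a 3-byte UTF-8 sequence, but the next byte in "Vicenç - Mallorca" is a space (0x20), which is not a continuation byte. That would match "Invalid byte 2 of 3-byte UTF-8 sequence". A minimal sketch for checking whether a file really is valid UTF-8 follows; the class name Utf8Check and the method isValidUtf8 are made up for illustration, and the file path is taken from the command line.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of the original question's code.
final class Utf8Check {

    /** Returns true if the bytes decode as strict UTF-8, false otherwise. */
    static boolean isValidUtf8(byte[] bytes) {
        // REPORT makes the decoder throw instead of silently substituting U+FFFD.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // args[0] is the path of the file to check, e.g. the .gpx file.
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(isValidUtf8(bytes) ? "valid UTF-8" : "not valid UTF-8");
    }
}
```

If the check fails, either re-save the file as UTF-8 in your editor, or change the XML declaration so that it names the encoding the file is actually stored in (for example encoding="ISO-8859-1").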

2010-05-07 08:35:17 ChrisBD

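I had the same error reported in one of my programs. However, the error only occurred when running the jar from the Windows console. On Linux or in Eclipse (right-click the main class file > Run As > Java Application), the error did not occur.

My guess is that this is because the default encoding on Windows (Cp...) differs from the UTF-8 default on Linux and in Eclipse. To fix it when running the jar, just add the -Dfile.encoding=UTF8 parameter to the JVM: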

java -Dfile.encoding=UTF8 -jar myjar.jar

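One reason a program depends on this parameter can be the use of input stream or reader implementations that fall back to the default encoding when no encoding is explicitly specified.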
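
To avoid depending on the JVM default at all, the encoding can be passed explicitly wherever bytes are turned into characters. Here is a minimal sketch, assuming the input is known to be UTF-8; the class name ExplicitCharsetRead and the method readUtf8 are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not taken from the program described above.
final class ExplicitCharsetRead {

    /** Reads the whole stream as UTF-8, independent of -Dfile.encoding. */
    static String readUtf8(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        // The charset argument is the important part: new InputStreamReader(in)
        // without it would use the platform default charset instead.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }
}
```

With the charset fixed in code like this, the jar should behave the same on the Windows console, on Linux and in Eclipse, without the -Dfile.encoding switch.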

2014-07-30 18:50:38
