MalformedByteSequenceException when trying to parse XML
I have the following .GPX data from Wikipedia:
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<gpx xmlns="http://www.topografix.com/GPX/1/1" creator="byHand" version="1.1"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.topografix.com/GPX/1/1 http://www.topografix.com/GPX/1/1/gpx.xsd">
<wpt lat="39.921055008" lon="3.054223107">
<ele>12.863281</ele>
<time>2005-05-16T11:49:06Z</time>
<name>Cala Sant Vicenç - Mallorca</name>
<sym>City</sym>
</wpt>
</gpx>
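The only non-ASCII character in this waypoint is the 'ç' in 'Vicenç'. ASCII characters become the same bytes in UTF-8 and in ISO-8859-1, so this letter is the one place where a file saved in a legacy single-byte encoding would differ from real UTF-8. A small sketch that compares the bytes (the class and method names are made up for illustration):

```java
import java.nio.charset.Charset;
import java.util.Arrays;

class EncodingCompare {
    // True when the text produces exactly the same bytes in both charsets.
    static boolean sameBytes(String text, Charset a, Charset b) {
        return Arrays.equals(text.getBytes(a), text.getBytes(b));
    }

    // Space-separated uppercase hex dump of a byte array.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }
}
```

With these helpers, "Vicenç" is `56 69 63 65 6E C3 A7` in UTF-8 but `56 69 63 65 6E E7` in ISO-8859-1, while "Vicenc" gives identical bytes in both.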
When I call my parse method, I get an exception (see below). The call looks like this:
Document tmpDoc = getParsedXML(currentGPX);
My parse method looks like this (standard parsing code, nothing special):
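import java.io.File;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public static Document getParsedXML(String fileWithPath){
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    DocumentBuilder db;
    Document doc = null;
    try {
        db = dbf.newDocumentBuilder();
        // the parser decodes the file's bytes using the encoding named in its XML declaration (UTF-8)
        doc = db.parse(new File(fileWithPath));
    } catch (ParserConfigurationException e) {
        e.printStackTrace();
    } catch (SAXException e) {
        e.printStackTrace();
    } catch (IOException e) {
        // MalformedByteSequenceException is a subclass of IOException, so it is caught here
        e.printStackTrace();
    }
    return doc;
}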
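If the file turns out to be saved in a single-byte encoding such as ISO-8859-1, one possible workaround is to decode the bytes yourself and hand the parser a character stream, so that the encoding="UTF-8" declaration is no longer used for decoding. A sketch, assuming the caller knows the file's real charset; the two-argument overload and the class name are hypothetical:

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class GpxParsing {
    // Hypothetical variant of getParsedXML: the caller states the file's actual charset.
    // The Reader does the decoding, so the parser receives characters rather than bytes
    // and the encoding in the XML declaration does not affect how the file is read.
    static Document getParsedXML(String fileWithPath, Charset charset)
            throws IOException, ParserConfigurationException, SAXException {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        try (Reader reader = new InputStreamReader(Files.newInputStream(Paths.get(fileWithPath)), charset)) {
            return db.parse(new InputSource(reader));
        }
    }
}
```

This only hides the mismatch: the file would still claim to be UTF-8 while it is not, so re-saving it as UTF-8 is usually the cleaner fix.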
This simple code throws the following exception:
com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 2 of 3-byte UTF-8 sequence.
at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.invalidByte(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.skipChar(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source)
at javax.xml.parsers.DocumentBuilder.parse(Unknown Source)
at Zeugs.getParsedXML(Zeugs.java:38)
at Zeugs.main(Zeugs.java:25)
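Assuming the file was saved as ISO-8859-1 or windows-1252, the message matches the 'ç' exactly. In those encodings 'ç' is the single byte 0xE7, binary 11100111. A UTF-8 decoder treats any byte of the form 1110xxxx as the start of a three-byte sequence whose bytes 2 and 3 must have the form 10xxxxxx. In "Vicenç - Mallorca" the byte after 0xE7 is a space (0x20), hence "Invalid byte 2 of 3-byte UTF-8 sequence". The helper below (names hypothetical) only classifies bytes by bit pattern; it is not a complete UTF-8 validator:

```java
class Utf8LeadByte {
    // Sequence length implied by a first byte, judged by bit pattern only:
    // 0xxxxxxx -> 1, 110xxxxx -> 2, 1110xxxx -> 3, 11110xxx -> 4.
    // Returns 0 for continuation bytes (10xxxxxx) and 0xF8..0xFF.
    // Does not reject overlong or out-of-range forms such as 0xC0 or 0xF5.
    static int expectedLength(byte b) {
        int u = b & 0xFF;
        if (u < 0x80) return 1;
        if ((u & 0xE0) == 0xC0) return 2;
        if ((u & 0xF0) == 0xE0) return 3;
        if ((u & 0xF8) == 0xF0) return 4;
        return 0;
    }

    // Bytes 2..n of a multi-byte sequence must match 10xxxxxx.
    static boolean isContinuation(byte b) {
        return (b & 0xC0) == 0x80;
    }
}
```

Here `expectedLength((byte) 0xE7)` is 3 and `isContinuation((byte) 0x20)` is false, which is the combination the parser rejected. In a correctly saved UTF-8 file the 'ç' is 0xC3 0xA7: a two-byte lead followed by a valid continuation byte.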
I suspect the problem lies in the format of the file itself, but I don't know exactly where. Can you give me a hint?
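To locate the exact position, the raw bytes can be run through a strict UTF-8 decoder that reports errors instead of substituting replacement characters. A sketch using java.nio's `CharsetDecoder`; the class and method names are hypothetical:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class Utf8Check {
    // Returns the byte offset of the first malformed UTF-8 sequence, or -1 if all bytes are valid.
    static int firstMalformedOffset(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(data);
        CharBuffer out = CharBuffer.allocate(4096);
        while (true) {
            // endOfInput = true: a sequence truncated at the very end also counts as malformed
            CoderResult result = decoder.decode(in, out, true);
            if (result.isError()) {
                return in.position(); // the input position points at the start of the bad sequence
            }
            if (result.isOverflow()) {
                out.clear(); // the decoded text is not needed, only the error position
                continue;
            }
            return -1; // underflow: all bytes were consumed without an error
        }
    }
}
```

For example, `Utf8Check.firstMalformedOffset(Files.readAllBytes(Paths.get(currentGPX)))` gives a byte offset into the file. A result of -1 means the whole file is valid UTF-8, in which case the encoding is not the problem.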
Is your file really UTF-8 encoded? – Dormilich 2010-05-07 07:42:26
What happens if you replace the 'ç' in 'Vicenç' with 'c'? Do you still get the problem? – Oded 2010-05-07 07:43:01
+1 Dormilich, Oded. The file was probably not saved in UTF-8. – ChrisBD 2010-05-07 08:36:03
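If the file was indeed saved in another encoding, the lasting fix is to re-save it as UTF-8, either from the editor or programmatically. A minimal sketch, assuming the original charset is known (ISO-8859-1 in the usage below; windows-1252 would be `Charset.forName("windows-1252")`). The class and method names are hypothetical:

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class Utf8Resave {
    // Reads 'source' using the charset it was actually written in and writes the same text
    // to 'target' as UTF-8. Passing the wrong 'from' charset garbles non-ASCII characters.
    static void convertToUtf8(Path source, Path target, Charset from) throws IOException {
        String text = new String(Files.readAllBytes(source), from);
        Files.write(target, text.getBytes(StandardCharsets.UTF_8));
    }
}
```

After conversion the bytes match the file's `encoding="UTF-8"` declaration, so the original `getParsedXML(String)` should be able to read the file unchanged.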