2012-07-16

Java HtmlCleaner crashes during cleaning

Hi, I am running the following code, but it crashes during execution:

ByteArrayInputStream input = new ByteArrayInputStream(fileContent); 

final HtmlCleaner cleaner = new HtmlCleaner(); 
CleanerProperties props = cleaner.getProperties(); 

DomSerializer doms = new DomSerializer(props, true); 

org.w3c.dom.Document xmlDoc = null; 

try { 
    TagNode node = cleaner.clean(input); 
    xmlDoc = doms.createDOM(node); 
} catch (Exception e) { 
    System.out.println("Tidying error");
    e.printStackTrace(); 
} 

This is the stack trace of the error:

NAMESPACE_ERR: An attempt is made to create or change an object in a way which is incorrect with regard to namespaces. 
    at com.sun.org.apache.xerces.internal.dom.CoreDocumentImpl.checkDOMNSErr(CoreDocumentImpl.java:2535) 
    at com.sun.org.apache.xerces.internal.dom.AttrNSImpl.setName(AttrNSImpl.java:113) 
    at com.sun.org.apache.xerces.internal.dom.AttrNSImpl.<init>(AttrNSImpl.java:74) 
    at com.sun.org.apache.xerces.internal.dom.CoreDocumentImpl.createAttributeNS(CoreDocumentImpl.java:2138) 
    at com.sun.org.apache.xerces.internal.dom.ElementImpl.setAttributeNS(ElementImpl.java:656) 
    at org.htmlcleaner.DomSerializer.setAttributes(DomSerializer.java:97) 
    at org.htmlcleaner.DomSerializer.createDOM(DomSerializer.java:37) 

Can anyone help me figure out why this happens?

Sincerely, Zoli

Answers


HtmlCleaner has trouble handling namespaces. Here is an example of an XML namespace declaration that gives it trouble:

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="de" lang="de" 
    xmlns:og="http://ogp.me/ns#" xmlns:fb="http://www.facebook.com/2008/fbml" 
    itemscope itemtype="http://schema.org/CreativeWork"> 

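As you can see, the itemscope attribute is broken, which makes HtmlCleaner throw NAMESPACE_ERR.

One way to avoid this problem is to add the following line after obtaining props: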

props.setNamespacesAware(false); 

This turns namespace handling off.
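
For context on the error code itself: the stack trace in the question passes through `ElementImpl.setAttributeNS` and `checkDOMNSErr`, and the JDK's DOM implementation raises NAMESPACE_ERR there when a prefixed attribute name is given without a matching namespace URI. The sketch below reproduces that with the JDK alone, without HtmlCleaner. The class name `NamespaceErrDemo` and the helper `tryQualifiedName` are made up for illustration, and the example does not establish which attribute of the original page was responsible.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class NamespaceErrDemo {

    // Hypothetical helper: tries to set one attribute through the
    // namespace-aware DOM call and returns the DOMException code,
    // or 0 if the call succeeded.
    static int tryQualifiedName(String namespaceUri, String qualifiedName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
            Element html = doc.createElement("html");
            html.setAttributeNS(namespaceUri, qualifiedName, "de");
            return 0;
        } catch (DOMException e) {
            return e.code;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Unprefixed name, no namespace: accepted.
        System.out.println(tryQualifiedName(null, "lang"));
        // Prefixed name with no namespace URI: NAMESPACE_ERR (code 14).
        System.out.println(tryQualifiedName(null, "xml:lang"));
        // Same name with the namespace URI bound to the xml prefix: accepted.
        System.out.println(
                tryQualifiedName("http://www.w3.org/XML/1998/namespace", "xml:lang"));
    }
}
```

A value of 14 corresponds to `DOMException.NAMESPACE_ERR`, the same error reported in the question.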