2011-10-08 84 views
1

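I'm trying to remove an entire question if the number in its answer is -66: conditional error with lxml etree

I get the following error: TypeError: argument of type 'NoneType' is not iterable ... on the line if element.tag == 'answer' and '-66' in element.text:

What is the problem here? Any help?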

#!/usr/local/bin/python2.7 
# -*- coding: UTF-8 -*- 

from lxml import etree 

planhtmlclear_utf=u""" 
<questionaire> 
<question> 
<questiontext>What's up?</questiontext> 
<answer></answer> 
</question> 
<question> 
<questiontext>Cool?</questiontext> 
<answer>-66</answer> 
</question> 
</questionaire> 

""" 

html = etree.fromstring(planhtmlclear_utf) 
questions = html.xpath('/questionaire/question') 
for question in questions: 
    for element in question.getchildren(): 
        if element.tag == 'answer' and '-66' in element.text: 
            html.xpath('/questionaire')[0].remove(question) 
print etree.tostring(html) 

Answers

1

element.text is None on some iterations. The error says that Python cannot search through None for '-66', so check that element.text is not None first, like this:

html = etree.fromstring(planhtmlclear_utf) 
questions = html.xpath('/questionaire/question') 
for question in questions: 
    for element in question.getchildren(): 
        if element.tag == 'answer' and element.text and '-66' in element.text: 
            html.xpath('/questionaire')[0].remove(question) 
print etree.tostring(html) 

The line in the XML where it fails is <answer></answer>, which has no text between the tags.
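To see the failure in isolation, here is a minimal sketch. The helper name contains_marker is hypothetical, and the standard-library fallback import is an assumption added only so the snippet also runs where lxml is not installed.

```python
try:
    from lxml import etree  # the library used in the question
except ImportError:
    # Assumption: fall back to the standard library, which behaves the same here
    import xml.etree.ElementTree as etree

empty = etree.fromstring('<answer></answer>')
filled = etree.fromstring('<answer>-66</answer>')

# An element with nothing between its tags has .text set to None, not ''
print(empty.text is None)

# This is the expression from the question; on the empty element it raises
try:
    '-66' in empty.text
except TypeError as error:
    print('TypeError: %s' % error)


def contains_marker(element, marker='-66'):
    """Return True if the element's text contains marker, treating None as ''."""
    return marker in (element.text or '')


print(contains_marker(filled))
print(contains_marker(empty))
```

Writing (element.text or '') is just another way to spell the element.text and ... guard used above.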


Edit (regarding the second part of your question, about merging tags):

You can use BeautifulSoup like this:

from lxml import etree 
import BeautifulSoup 

planhtmlclear_utf=u""" 
<questionaire> 
<question> 
<questiontext>What's up?</questiontext> 
<answer></answer> 
</question> 
<question> 
<questiontext>Cool?</questiontext> 
<answer>-66</answer> 
</question> 
</questionaire>""" 

html = etree.fromstring(planhtmlclear_utf) 
questions = html.xpath('/questionaire/question') 
for question in questions: 
    for element in question.getchildren(): 
        if element.tag == 'answer' and element.text and '-66' in element.text: 
            html.xpath('/questionaire')[0].remove(question) 

soup = BeautifulSoup.BeautifulStoneSoup(etree.tostring(html)) 
print soup.prettify() 

Prints:

<questionaire> 
<question> 
    <questiontext> 
    What's up? 
    </questiontext> 
    <answer> 
    </answer> 
</question> 
</questionaire> 

Here is a link where you can download the BeautifulSoup module.


Or, a more compact way of doing this:

from lxml import etree 
import BeautifulSoup  

# abbreviating to reduce answer length... 
planhtmlclear_utf=u"<questionaire>.........</questionaire>" 

html = etree.fromstring(planhtmlclear_utf) 
[question.getparent().remove(question) for question in html.xpath('/questionaire/question[answer/text()="-66"]')] 
print BeautifulSoup.BeautifulStoneSoup(etree.tostring(html)).prettify() 
+0

Wow, that really helped! Thanks a lot! – Jurudocs

+0

Anytime, @Jurudocs! Glad to help. – chown

+0

Maybe you can help me one step further :-P Now I get the output: <questionaire> What's up ..... the answer is not displayed completely... why? – Jurudocs

1

As an alternative to checking whether element.text is None, you can refine your XPath:

questions = html.xpath('/questionaire/question[answer/text()="-66"]') 
for question in questions: 
    question.getparent().remove(question) 

The brackets [...] mean "such that". So

question       # find all question elements 
[         # such that 
    answer       # it has an answer subelement 
    /text()      # whose text 
    =        # equals 
    "-66"       # "-66" 
] 
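The "such that" reading can be tried out with the sketch below. Note that it swaps in findall() with the ElementPath predicate [answer='-66'] instead of the xpath() call above; that substitution is an assumption made so the snippet also runs on the standard library's xml.etree.ElementTree when lxml is not available. The sample document is a condensed copy of the one in the question.

```python
try:
    from lxml import etree  # the library used in this thread
except ImportError:
    # Assumption: standard-library fallback, which has no xpath() method
    import xml.etree.ElementTree as etree

doc = """<questionaire>
<question><questiontext>What's up?</questiontext><answer></answer></question>
<question><questiontext>Cool?</questiontext><answer>-66</answer></question>
</questionaire>"""

root = etree.fromstring(doc)

# "question elements such that they have an answer child whose text equals -66"
matches = root.findall("question[answer='-66']")

for question in matches:
    # Every question is a direct child of the root in this document
    root.remove(question)

# Collect the question texts that survived the removal
remaining = [q.findtext('questiontext') for q in root.findall('question')]
print(remaining)
```

Because the predicate compares for equality, the question with the empty answer never matches, so no None check is needed.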
+0

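This solves the problem and doesn't touch the other answer elements... with the example above, my answer elements got cut off... I don't know why... anyway, this solution works! – Jurudocs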

+0

No, sorry... it is still cutting the empty answer tags... why does this keep happening? – Jurudocs

+0

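I'm not sure I understand the problem. Do you mean <answer></answer> gets shortened to <answer/>? That doesn't matter; they are equivalent. – unutbu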
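A quick way to check that the long form <answer></answer> and the short form <answer/> are equivalent is to parse both and compare what the parser sees. This is only a sketch: the helper summary is hypothetical, and the standard-library fallback import is an assumption so the snippet runs without lxml.

```python
try:
    from lxml import etree  # the library used in this thread
except ImportError:
    # Assumption: standard-library fallback for environments without lxml
    import xml.etree.ElementTree as etree

long_form = etree.fromstring('<answer></answer>')
short_form = etree.fromstring('<answer/>')


def summary(element):
    """Return the tag, the text and the number of children of an element."""
    return (element.tag, element.text, len(element))


# Both spellings describe an element named answer with no text and no children
print(summary(long_form) == summary(short_form))
```

Which spelling appears in the output is decided by the serializer, not by the data in the tree.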