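element.text appears to be None in some iterations. The error is saying that it cannot search through None for "-66", so check that element.text is not None first, like this: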
html = etree.fromstring(planhtmlclear_utf)
questions = html.xpath('/questionaire/question')
for question in questions:
    for element in question.getchildren():
        if element.tag == 'answer' and element.text and '-66' in element.text:
            html.xpath('/questionaire')[0].remove(question)
print etree.tostring(html)
The line in the XML where it fails is <answer></answer>, because there is no text between the tags.
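To see why the extra check matters, here is a minimal sketch. It assumes Python 3 and uses the standard library's xml.etree.ElementTree as a stand-in for lxml (both report the text of an empty element as None); the helper name contains_marker is made up for illustration.

```python
import xml.etree.ElementTree as ET

empty = ET.fromstring("<answer></answer>")
filled = ET.fromstring("<answer>-66</answer>")


def contains_marker(element, marker="-66"):
    """Return True only if the element has text and the marker occurs in it."""
    # Hypothetical helper: the None check must come before the `in` test.
    return element.text is not None and marker in element.text


# An element with nothing between its tags has text set to None, not "".
print(empty.text is None)

# Testing membership directly on None raises TypeError.
try:
    "-66" in empty.text
except TypeError as exc:
    print("TypeError:", exc)

print(contains_marker(empty))   # False: no text, so nothing to search
print(contains_marker(filled))  # True: the text is "-66"
```

The condition in the answer's loop (`element.text and '-66' in element.text`) relies on the same short-circuit: the `in` test only runs when element.text is truthy.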
Edit (for the second part of your question, about merging the tags):
You can use BeautifulSoup, like this:
from lxml import etree
import BeautifulSoup
planhtmlclear_utf=u"""
<questionaire>
<question>
<questiontext>What's up?</questiontext>
<answer></answer>
</question>
<question>
<questiontext>Cool?</questiontext>
<answer>-66</answer>
</question>
</questionaire>"""
html = etree.fromstring(planhtmlclear_utf)
questions = html.xpath('/questionaire/question')
for question in questions:
    for element in question.getchildren():
        if element.tag == 'answer' and element.text and '-66' in element.text:
            html.xpath('/questionaire')[0].remove(question)
soup = BeautifulSoup.BeautifulStoneSoup(etree.tostring(html))
print soup.prettify()
Prints:
<questionaire>
<question>
<questiontext>
What's up?
</questiontext>
<answer>
</answer>
</question>
</questionaire>
Here is a link where you can download the BeautifulSoup module.
Or, to do this in a more compact way:
from lxml import etree
import BeautifulSoup
# abbreviating to reduce answer length...
planhtmlclear_utf=u"<questionaire>.........</questionaire>"
html = etree.fromstring(planhtmlclear_utf)
[question.getparent().remove(question) for question in html.xpath('/questionaire/question[answer/text()="-66"]')]
print BeautifulSoup.BeautifulStoneSoup(etree.tostring(html)).prettify()
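One thing to keep in mind with the compact version: the predicate answer/text()="-66" is an exact comparison, while the loop version tests whether "-66" occurs anywhere in the text. The sketch below illustrates the exact-match behaviour. It assumes Python 3 and the standard library's xml.etree.ElementTree instead of lxml, so it uses the limited path syntax [answer='-66'] and removes from the root directly (ElementTree has no getparent()). The third question, with the answer "x-66", is made-up data added only to show the difference.

```python
import xml.etree.ElementTree as ET

# Sample document; the "Sure?" question is hypothetical illustration data.
xml_text = """<questionaire>
<question><questiontext>What's up?</questiontext><answer></answer></question>
<question><questiontext>Cool?</questiontext><answer>-66</answer></question>
<question><questiontext>Sure?</questiontext><answer>x-66</answer></question>
</questionaire>"""

root = ET.fromstring(xml_text)

# [answer='-66'] selects questions with an <answer> child whose full text
# equals "-66". findall() returns a list, so removing while looping is safe.
for question in root.findall("question[answer='-66']"):
    root.remove(question)

remaining = [q.findtext("questiontext") for q in root.findall("question")]
print(remaining)  # ["What's up?", 'Sure?']
```

Only the question whose answer is exactly "-66" is removed; the empty answer and "x-66" both survive. If substring matching is what you need, the explicit loop with the None check is the closer fit.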
Wow, that really helped! Thanks a lot! – Jurudocs
Anytime @Jurudocs! Glad to help. – chown
Maybe you can help me one step further :-P Now I get the output: <?questionaire> What's up questiontext> questionaire> ..... the answer is not displayed completely... why? – Jurudocs