Remove attributes from HTML tags

Possible duplicates:
php: how can I remove attributes from an html tag?
How do I iterate over the HTML attributes of a Beautiful Soup element?

I have some HTML like this:

<div class="foo"> 
    <p id="first">Hello, world!</p> 
    <p id="second">Stack Overflow</p> 
</div>

and it needs to come out like this:

<div> 
    <p>Hello, world!</p> 
    <p>Stack Overflow</p> 
</div>

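I would prefer a Python solution, since the program where I need this already uses BeautifulSoup. I am open to PHP, though, if it offers a better solution. I don't think a sed regular expression would be enough, especially since a < character may show up in the text in the future (I don't control the input).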

Source

2011-08-22 Rory

See also [how-do-i-iterate-over-the-html-attributes-of-a-beautiful-soup-element](http://stackoverflow.com/questions/822571/how-do-i-iterate-over-the-html-attributes-of-a-beautiful-soup-element), [python-how-to-search-and-correct-html-tags-and-attributes](http://stackoverflow.com/questions/3360968/python-how-to-search-and-correct-html-tags-and-attributes) and [python-extracting-html-tag-attributes-without-regular-expressions](http://stackoverflow.com/questions/7141431/python-extracting-html-tag-attributes-without-regular-expressions) – agf

What have you tried? (Please don't try to use regular expressions, especially if you already know how to use an HTML parser like Beautiful Soup.) – geoffspear

I tried using a regular expression, but it got very long and went wrong somewhere. – Rory

This also works with sed: match <([a-zA-Z!]+)[^>]+> and replace it with only the first group, i.e. <\1>
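
For readers who want to try that substitution without sed, here is a rough Python sketch of the same pattern using the standard re module. The function name strip_attributes_regex and the sample strings are made up for illustration. It also shows one case where the pattern seems to misfire: a tag that has no attributes still matches, because the name group gives up its last character to satisfy [^>]+.

```python
import re

# The pattern from the answer, written as a Python regex
# (assumed to behave like the sed form for these inputs).
TAG_WITH_ATTRS = re.compile(r"<([a-zA-Z!]+)[^>]+>")


def strip_attributes_regex(html: str) -> str:
    """Replace each matched tag with just its name, e.g. <p id="x"> becomes <p>."""
    return TAG_WITH_ATTRS.sub(r"<\1>", html)


sample = '<div class="foo"><p id="first">Hello, world!</p></div>'
print(strip_attributes_regex(sample))
# prints: <div><p>Hello, world!</p></div>

# Pitfall: with no attributes present, the name group backtracks by one
# character so that [^>]+ has something to consume, and the name is truncated.
print(strip_attributes_regex("<div>text</div>"))
# prints: <di>text</div>
```

This is one reason the comments above recommend an HTML parser over a regular expression for input you do not control.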

Source

2011-08-22 16:47:27 xob

This is easy to do in Python with lxml.

First install lxml, then try the code below:

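from lxml.html import fromstring, tostring

html = '''
<div class="foo">
    <p id="first">Hello, world!</p>
    <p id="second">Stack Overflow</p>
</div>'''

htmlElement = fromstring(html)
# '*' selects every element, including the root
# (cssselect() needs the separate cssselect package on recent lxml versions)
for element in htmlElement.cssselect('*'):
    element.attrib.clear()  # drop all attributes of this element

# encoding='unicode' makes tostring() return str instead of bytes
result = tostring(htmlElement, encoding='unicode')

print(result)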
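
Since the question mentions that Beautiful Soup is already in use, the same idea can probably be expressed with it directly. The following is a minimal sketch, assuming the bs4 package (Beautiful Soup 4) and Python's built-in html.parser backend; the helper name strip_attributes is made up for illustration.

```python
from bs4 import BeautifulSoup


def strip_attributes(html: str) -> str:
    """Return the markup with every attribute removed from every tag."""
    # "html.parser" is the stdlib backend; it does not wrap fragments in <html>/<body>.
    soup = BeautifulSoup(html, "html.parser")
    # find_all(True) matches every tag in the tree, at any depth.
    for tag in soup.find_all(True):
        tag.attrs = {}
    return str(soup)


html = '''
<div class="foo">
    <p id="first">Hello, world!</p>
    <p id="second">Stack Overflow</p>
</div>'''

print(strip_attributes(html))
```

Text nodes are left untouched, so a literal < that arrives escaped in the text is not mistaken for a tag.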

Source

2011-08-22 16:55:40 enderskill
