Remove HTML tags from output

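I'm new to Python and can't get the HTML tags out of my output. I want to remove the tags and what's inside them, and I'd like the p tags removed as well. Any suggestions?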

import urllib2
from bs4 import BeautifulSoup

# Ask the user to enter a URL
url = raw_input("Please enter a valid URL: ")

# Opening in 'w' mode already truncates the file, so no separate clearing step is needed
txt = open('ctp_output.txt', 'w')

# Parse the HTML of the article, aka making soup
soup = BeautifulSoup(urllib2.urlopen(url).read(), 'html.parser')

# Retrieve all of the paragraph tags; soup('p') returns a ResultSet, so loop over it
tags = soup('p')
for tag in tags:
    # str(tag) writes the tag's full HTML, including the <p> markup itself
    txt.write(str(tag) + '\n' + '\n')

# Close the txt file with the new content added
txt.close()


2014-02-25 user3285763

This might be useful: http://stackoverflow.com/questions/753052/strip-html-from-strings-in-python – Manjunath

Retrieve only the text portion of each tag by calling get_text() instead of using the tag's string representation (str(tag)).

To make that change in the code above, replace the line:

txt.write(str(tag) + '\n' + '\n')

with:

txt.write(tag.get_text() + '\n' + '\n')


2014-02-25 23:36:45 HAL

I had to put it inside a for loop to get around the ResultSet instance problem, but it works great. Thanks for your help! I'd upvote you if I could. – user3285763

Great, glad you got it working! If you're happy with the answer, please mark it as accepted (no upvote needed). – HAL
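If BeautifulSoup isn't available, a similar result (the plain text of each p element, with all markup dropped) can be sketched with only the Python 3 standard library's html.parser module. This is a swapped-in technique, not what the accepted answer uses; the names ParagraphTextExtractor and paragraph_texts are hypothetical, and the sketch assumes reasonably simple HTML (an unclosed p is treated as ending where the next p starts).

```python
from html.parser import HTMLParser


class ParagraphTextExtractor(HTMLParser):
    """Hypothetical helper: collect the text of every <p> element, dropping tags.

    Roughly mirrors looping over soup('p') and calling get_text() on each tag,
    but uses only the standard library.
    """

    def __init__(self):
        # convert_charrefs defaults to True, so entities like &amp; arrive decoded
        super().__init__()
        self.paragraphs = []  # finished paragraph texts
        self._in_p = False    # True while inside a <p> element
        self._buffer = []     # text fragments of the current paragraph

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            # Assumption: an unclosed <p> is implicitly closed by the next <p>
            self._flush()
            self._in_p = True

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()

    def handle_data(self, data):
        # Text inside nested inline tags (<b>, <a>, ...) is kept, markup is not
        if self._in_p:
            self._buffer.append(data)

    def _flush(self):
        if self._in_p:
            self.paragraphs.append("".join(self._buffer))
        self._in_p = False
        self._buffer = []

    def close(self):
        super().close()
        self._flush()  # handle a trailing <p> that was never closed


def paragraph_texts(html):
    """Return the plain text of each <p> element in the given HTML string."""
    parser = ParagraphTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.paragraphs
```

For example, writing "\n\n".join(paragraph_texts(html)) to the output file gives roughly the same blank-line-separated layout as the for loop with get_text() described in the thread.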
