2016-09-28

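Remove tags from HTML input with sed

I have an HTML table from which I want to remove the rows that have a certain class. However, when I try sed 's/<tr class="expandable">.*<\/tr>//g' it simply does nothing (that is, it does not remove the tags).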

An example input could be:

<tr><td>Some col</td></tr> 
<tr class="expandable"> 
    <td colspan="6"> 
     <div class="expandable-content"> 
<p>Holds ACCA Practising Certificate: This indicates a member holding a practising certificate issued by ACCA. This means that the member is authorised to provide a range of general accountancy services to individuals and businesses, including business and tax advice and planning, preparation of personal and business tax returns, set up of book-keeping and business systems, providing book-keeping services, payroll work, assistance with management accounting help with raising finance, budgeting and cash-flow advice, business start-up advice and expert witness.</p> 
     </div> 
    </td> 
</tr> 

I'm not a sed pro and would appreciate any help you can give me!
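A note on why the command above appears to do nothing: sed reads its input one line at a time, so `.*` can never stretch from the opening `<tr class="expandable">` on one line to the `</tr>` several lines later. Below is a minimal sketch of a line-range delete instead. It assumes the opening and closing tags each sit on their own line and that rows are not nested, and it uses an abbreviated copy of the sample input (the variable names are made up for illustration).

```shell
#!/bin/sh
# Abbreviated copy of the sample input (the long <p> text is shortened).
input='<tr><td>Some col</td></tr>
<tr class="expandable">
    <td colspan="6">
     <div class="expandable-content">
<p>Holds ACCA Practising Certificate</p>
     </div>
    </td>
</tr>'

# Original attempt: the pattern has to match inside a single line,
# and no single line holds both the opening tag and </tr>, so nothing changes.
attempt=$(printf '%s\n' "$input" | sed 's/<tr class="expandable">.*<\/tr>//g')

# Range address: delete every line from the opening tag
# through the next line that contains </tr>.
result=$(printf '%s\n' "$input" | sed '/<tr class="expandable">/,/<\/tr>/d')

printf '%s\n' "$result"
```

This is still text matching rather than HTML parsing, so it breaks as soon as the layout changes (for example, a whole row on one line, or a nested table inside the row).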

Obligatory "don't parse HTML with regex" link: http://stackoverflow.com/a/1732454/7552

"Have you tried using an XML parser?" -> xmllint and xidel; neither of them can delete a certain "type" of row, at least not in any way I know of. – Fuzzyma

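I think there is a typo in the sample input shown; the last line should probably be '</tr>'. This might work: perl -0777 -pe 's|<tr class="expandable">.*?</tr>||gs' file, but as already pointed out it is not robust. – Sundeep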

Answer

Assuming your HTML is valid XML, you can use a tool such as xmlstarlet:

xmlstarlet ed -d '//tr[@class="expandable"]' <<ENDHTML 
<html><body><table> 
    <tr><td>Some col</td></tr> 
    <tr class="expandable"> 
     <td colspan="6"> 
      <div class="expandable-content"> 
    <p>Holds ACCA Practising Certificate: This indicates a member holding a practising certificate issued by ACCA. This means that the member is authorised to provide a range of general accountancy services to individuals and businesses, including business and tax advice and planning, preparation of personal and business tax returns, set up of book-keeping and business systems, providing book-keeping services, payroll work, assistance with management accounting help with raising finance, budgeting and cash-flow advice, business start-up advice and expert witness.</p> 
      </div> 
     </td> 
    </tr> 
</table></body></html> 
ENDHTML 
<?xml version="1.0"?> 
<html> 
    <body> 
    <table> 
     <tr> 
     <td>Some col</td> 
     </tr> 
    </table> 
    </body> 
</html>