2016-06-10

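Counting the tags in an HTML page with a regular expression

I want to parse an entire HTML page with JavaScript and use a regex to count how many times each distinct tag occurs in it, then print the totals. Can anyone help me with how to do this? Code would be most welcome.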

For example, given this HTML page:

<html> <head> </head> <body> <a>This is a tagt 2</a> <p>This is 
paragraph1</p> <a>This is Assigntment 2</a> <p>This is paragraph1 
</p> <div> <img> </img> </div> <body> </html> 

then the expected output is:

  • a tags = 2
  • p tags = 2
  • etc.

Do you just want the count of each type of tag? – Mairaj


Maybe this will help you get started: http://stackoverflow.com/questions/10585029/parse-a-html-string-with-js – allu


Yes :) For example, if the HTML page contains <a>This is a tagt 2</a> <p>This is paragraph1</p> <a>This is Assigntment 2</a> <p>This is paragraph1</p>, then the expected output is: number of a tags: 2, number of p tags: 2, etc. – Harry

Answers


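Description

This counts every tag name in a string while avoiding some difficult edge cases. The pattern matches opening tags only (closing tags such as </a> are skipped), and it steps over quoted attribute values, so a > inside a quoted attribute does not end the match early.

Regular expression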

<([a-z]+)(?=[\s>])(?:[^>=]|='[^']*'|="[^"]*"|=[^'"\s]*)*\s?\/?> 
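
To see what the quoted-attribute alternatives are for, here is a minimal sketch. The sample string and the variable names are made up for illustration and are not part of the original answer. It runs the same pattern over a tag whose attribute values contain a > character and shows that the match still ends at the tag's real closing >.

```javascript
// Same pattern as above, with the g and i flags so every opening tag is found.
const tagPattern = /<([a-z]+)(?=[\s>])(?:[^>=]|='[^']*'|="[^"]*"|=[^'"\s]*)*\s?\/?>/gi;

// Hypothetical input: both attribute values contain a ">" character.
const sample = '<a href="x>y" title=\'1>2\'>link</a> <p>text</p>';

// matchAll returns one match object per opening tag; closing tags are not matched.
const matches = [...sample.matchAll(tagPattern)];
const fullMatches = matches.map((m) => m[0]); // the whole opening tag
const tagNames = matches.map((m) => m[1]);    // capture group 1: the tag name

console.log(fullMatches);
console.log(tagNames); // [ 'a', 'p' ]
```

A naive pattern such as <[^>]+> would stop at the first > inside href, which is the kind of edge case the longer pattern is written to avoid.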


Example code

var string = "<html> <head> </head> <body> <a>This is a tagt 2</a> <p>This is paragraph1</p> <a>This is Assigntment 2</a> <p>This is paragraph1</p> <div> <img> </img> </div> <body> </html>"; 

console.log(string); 
var re = /<([a-z]+)(?=[\s>])(?:[^>=]|='[^']*'|="[^"]*"|=[^'"\s]*)*\s?\/?>/gi; 
var m; 
var HashTable = {}; 

do {
    // try to find the next opening tag
    m = re.exec(string);

    // verify the match was successful
    if (m) {
        // normalize case, because the regex is case-insensitive
        var tagName = m[1].toLowerCase();

        // the first time a tag name is seen, add an entry for it starting at zero
        if (!Object.prototype.hasOwnProperty.call(HashTable, tagName)) {
            HashTable[tagName] = 0;
        }

        // increment the tag name counter
        HashTable[tagName]++;
    }
} while (m);

console.log("");
// output the number of all found tag names
for (var key in HashTable) {
    console.log(key + "=" + HashTable[key]);
}

Sample output

<html> <head> </head> <body> <a>This is a tagt 2</a> <p>This is paragraph1</p> <a>This is Assigntment 2</a> <p>This is paragraph1</p> <div> <img> </img> </div> <body> </html> 

html=1 
head=1 
body=2 
a=2 
p=2 
div=1 
img=1
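
If you want to reuse the counting logic, the loop above can be wrapped in a function. The following is only a sketch under a few assumptions: countTags, page and counts are hypothetical names that do not appear in the original answer, and it uses String.prototype.matchAll and a Map instead of the exec loop and plain object. Also note that, as written, the pattern only accepts letters in the tag name, so a tag such as <h1> is not counted, and a self-closing tag written without a space, such as <br/>, is skipped as well.

```javascript
// Hypothetical helper (not part of the original answer):
// returns a Map from lower-cased tag name to the number of opening tags found.
function countTags(html) {
    const tagPattern = /<([a-z]+)(?=[\s>])(?:[^>=]|='[^']*'|="[^"]*"|=[^'"\s]*)*\s?\/?>/gi;
    const counts = new Map();
    for (const match of html.matchAll(tagPattern)) {
        const name = match[1].toLowerCase();
        counts.set(name, (counts.get(name) || 0) + 1);
    }
    return counts;
}

// The sample page from the question, unchanged.
const page = "<html> <head> </head> <body> <a>This is a tagt 2</a> <p>This is paragraph1</p> <a>This is Assigntment 2</a> <p>This is paragraph1</p> <div> <img> </img> </div> <body> </html>";

const counts = countTags(page);
for (const [name, total] of counts) {
    console.log(name + "=" + total);
}
```

For the sample page this prints the same seven lines as the sample output above. In a browser you could pass document.documentElement.outerHTML as the argument to count the tags of the current page.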