2013-02-08 54 views
0

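Parsing a number from an HTML string in Java

While parsing HTML, whenever I encounter a '>' character I need to check whether a number follows it. The number can be 1, 2, or 3 digits long.

The code looks fine to me, but I always get a StringIndexOutOfBoundsException.

Code: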

while (matches < 19)
{
    more = dataInHtml.indexOf(">", index);
    nextOne = dataInHtml.charAt(more + 1);
    nextTwo = dataInHtml.charAt(more + 2);
    nextThree = dataInHtml.charAt(more + 3);

    if (Character.isDigit(nextOne)) digitOne = true;
    if (Character.isDigit(nextTwo)) digitTwo = true;
    if (Character.isDigit(nextThree)) digitThree = true;

    if (digitThree)
    {
        data[matches] = dataInHtml.substring(more + 1, 3);
        matches++;
        digitThree = false;
        digitTwo = false;
        digitOne = false;
        index = more + 3;
        itWasADigit = true;
    }

    if (digitTwo)
    {
        data[matches] = dataInHtml.substring(more + 1, 2);
        matches++;
        digitTwo = false;
        digitOne = false;
        index = more + 2;
        itWasADigit = true;
    }

    if (digitOne)
    {
        data[matches] = dataInHtml.substring(more + 1, 1);
        matches++;
        digitOne = false;
        index = more + 1;
        itWasADigit = true;
    }

    if (!(itWasADigit))
    {
        index = more + 1;
        itWasADigit = false;
    }
}
+0

Convert the characters to ASCII and compare the values. – orangegoat 2013-02-08 15:45:12

+0

Which line is throwing the StringIndexOutOfBoundsException? – 2013-02-08 15:46:26

+0

data[matches] = dataInHtml.substring(more + 1, 2); – Alpan67 2013-02-08 15:48:31

Answers

2

If you pass in the string "string>12", this is what the code will do:

more = dataInHtml.indexOf(">", index);
nextOne = dataInHtml.charAt(more + 1);   // gets the '1'
nextTwo = dataInHtml.charAt(more + 2);   // gets the '2'
nextThree = dataInHtml.charAt(more + 3); // more + 3 is past the highest index in the string, so this call throws

Hence you see the StringIndexOutOfBoundsException.
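The walk-through above can be reproduced with a small standalone sketch. The class name `OutOfBoundsDemo` and the helper `throwsAt` are hypothetical names introduced only for illustration; the input is the same "string>12" example.

```java
final class OutOfBoundsDemo {
    // Hypothetical helper: reports whether reading the character `offset`
    // positions past the first '>' throws StringIndexOutOfBoundsException.
    static boolean throwsAt(String s, int offset) {
        int more = s.indexOf(">");
        try {
            s.charAt(more + offset);
            return false;
        } catch (StringIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String s = "string>12"; // length 9, '>' at index 6
        System.out.println(throwsAt(s, 1)); // index 7 exists
        System.out.println(throwsAt(s, 2)); // index 8 exists
        System.out.println(throwsAt(s, 3)); // index 9 is out of range
    }
}
```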

Use something like this:

if (dataInHtml.length() > more + 3)

to check that the string is long enough before you try to access a character.
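One way to apply that length check throughout the loop is sketched below. It is only a minimal sketch, not the asker's code: the class name `NumbersAfterTags` and the method `extract` are hypothetical, and it assumes a number is any run of 1 to 3 digits directly after a '>'. Every `charAt` call is guarded by a comparison against `html.length()`, and the loop stops once `indexOf` returns -1. Note also that `String.substring(beginIndex, endIndex)` takes an end index rather than a length, so the sketch passes the end position directly.

```java
import java.util.ArrayList;
import java.util.List;

final class NumbersAfterTags {
    // Collects every run of 1 to 3 digits that directly follows a '>' character.
    static List<String> extract(String html) {
        List<String> found = new ArrayList<>();
        int index = 0;
        while (true) {
            int more = html.indexOf('>', index);
            if (more < 0) {
                break; // no further '>' in the string
            }
            int start = more + 1;
            int end = start;
            // Check the bound before each charAt so we never read past the end.
            while (end < html.length()
                    && end - start < 3
                    && Character.isDigit(html.charAt(end))) {
                end++;
            }
            if (end > start) {
                // substring takes (beginIndex, endIndex), not (beginIndex, length).
                found.add(html.substring(start, end));
            }
            index = end; // end >= more + 1, so the search always advances
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(extract("<td>12</td><td>7</td><i>345</i>"));
    }
}
```

Returning a list avoids the fixed count of 19 matches in the original loop; a caller that needs exactly 19 values can check the list size afterwards.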

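If you are trying to read numbers out of an HTML document, this may not be the ideal approach. If possible, you should consider parsing it with a proper HTML parser.

http://jsoup.org/ looks promising.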

+0

> 12 I have an HTML file like this – Alpan67 2013-02-08 15:50:52

+1

It will break on the last '>'. After seeing it, the code tries to access a string index that is too large. – cowls 2013-02-08 15:52:05

+0

How can I fix it? – Alpan67 2013-02-08 15:53:52