How to parse a table from HTML using jsoup

<td width="10"></td> 
<td width="65"><img src="/images/sparks/NIFTY.png" /></td> 
<td width="65">5,390.85</td> 
<td width="65">5,428.15</td> 
<td width="65">5,376.15</td> 
<td width="65">5,413.85</td>

This is the HTML source from which I have to extract the values 5390.85, 5428.15, 5376.15 and 5413.85. I want to do this with jsoup, but I am fairly new to it (I started using it today). How should I go about it?

URL url = new URL("http://www.nseindia.com/content/equities/niftysparks.htm"); 
Document doc = Jsoup.parse(url,3*1000); 
String text = doc.body().text();

I have already fetched the site's content with jsoup, but how do I extract the values I need? Thanks in advance.

Source

2011-03-22 CyprUS

Came across another example: http://technology.amis.nl/blog/13121/screenscraping-from-java-using-jsoup-effective-data-gathering-from-websites – 2sb 2012-01-09 18:31:05

Try this:

URL url = new URL("http://www.nseindia.com/content/equities/niftysparks.htm"); 
Document doc = Jsoup.parse(url, 3000); 

Element table = doc.select("table[class=niftyd]").first(); 

Iterator<Element> ite = table.select("td[width=65]").iterator(); 

ite.next(); // first one is image, skip it 

System.out.println("Value 1: " + ite.next().text()); 
System.out.println("Value 2: " + ite.next().text()); 
System.out.println("Value 3: " + ite.next().text()); 
System.out.println("Value 4: " + ite.next().text());

Here is the printed output:

Value 1: 5,390.85 
Value 2: 5,428.15 
Value 3: 5,376.15 
Value 4: 5,413.85
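
The question asks for the values as 5390.85 rather than "5,390.85", so the cell text still needs converting if it is to be used as a number. A minimal sketch, assuming the comma is always a thousands separator and the dot is the decimal point; the class name `PriceParser` and the helper `parsePrice` are hypothetical, not part of jsoup:

```java
import java.math.BigDecimal;

class PriceParser {
    // Hypothetical helper: converts cell text such as "5,390.85" to a BigDecimal.
    // Assumes "," is only ever a thousands separator and "." the decimal point.
    static BigDecimal parsePrice(String cellText) {
        String cleaned = cellText.trim().replace(",", "");
        return new BigDecimal(cleaned);
    }

    public static void main(String[] args) {
        // Sample strings copied from the printed output above.
        String[] cells = {"5,390.85", "5,428.15", "5,376.15", "5,413.85"};
        for (String cell : cells) {
            System.out.println(parsePrice(cell)); // e.g. 5390.85 for the first cell
        }
    }
}
```

In the answer's code, each `ite.next().text()` result could be passed to such a helper. `BigDecimal` is used here instead of `double` so that the price is kept exactly as it appears on the page.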

Source

2011-03-22 19:40:41 limc

Thanks, limc. It works. – CyprUS 2011-03-23 19:00:23

Here is an example using Groovy:

def url = "http://www.espn.co.uk/scrum/rugby/match/scores/recent.html" 
def doc = Jsoup.connect(url).get() 

//Strip the table from the page 
def table = doc.select("table").first() 
// Strip the rows from the table 
def tbRows = table.select("tr") 

// For each column in a row, print its contents if not empty 
tbRows.each { row ->
    def tbCol = row.select("td")
    tbCol.each { column ->
        if (!column.text().empty) {
            println column.text()
        }
    }
}

You could save these values to an array for further processing. Just another angle.
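
The idea of keeping the cells for later processing instead of printing them could look roughly like this in Java. It is only a sketch: the class `TableToList`, the helper `readTable` and the inline sample HTML are made up for illustration, and it uses a list of lists rather than an array. It relies only on `Jsoup.parse(String)`, `select` and `text()`, and needs jsoup on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class TableToList {
    // Hypothetical helper: returns one list of non-empty cell texts per row
    // of the first table found in the given HTML.
    static List<List<String>> readTable(String html) {
        Document doc = Jsoup.parse(html);
        List<List<String>> rows = new ArrayList<>();

        Element table = doc.select("table").first();
        if (table == null) {
            return rows; // no table in the document
        }

        for (Element row : table.select("tr")) {
            List<String> cells = new ArrayList<>();
            for (Element cell : row.select("td")) {
                String text = cell.text();
                if (!text.isEmpty()) { // skip empty cells, as the Groovy version does
                    cells.add(text);
                }
            }
            if (!cells.isEmpty()) {
                rows.add(cells);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up sample markup, loosely modelled on the row in the question.
        String html = "<table><tr><td width=\"10\"></td>"
                + "<td width=\"65\">5,390.85</td><td width=\"65\">5,428.15</td></tr></table>";
        System.out.println(readTable(html));
    }
}
```

For a live page, the same loop should work on the document returned by `Jsoup.connect(url).get()` in place of `Jsoup.parse(html)`.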

Source

2015-01-14 12:12:00 Sion
