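Crawling web pages and storing links

I want to create a thread that crawls all the links on a website and stores them in a LinkedHashSet. However, when I print the size of this LinkedHashSet, nothing is printed. I have only just started learning about crawling, and I based this on the book The Art of Java. Here is my code: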
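import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.LinkedHashSet;
import java.util.logging.Level;
import java.util.logging.Logger;

public class TestThread {
    // Reads the page line by line and returns the collected lines
    // (these are raw HTML lines, not yet extracted links).
    public LinkedHashSet<String> crawl(URL url) {
        LinkedHashSet<String> toCrawlList = new LinkedHashSet<>();
        // try-with-resources closes the reader and the underlying stream when done
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(url.openConnection().getInputStream()))) {
            String line;
            // Read a new line on every iteration so the loop ends at end of stream
            while ((line = reader.readLine()) != null) {
                toCrawlList.add(line);
            }
        } catch (IOException ex) {
            Logger.getLogger(TestThread.class.getName()).log(Level.SEVERE, null, ex);
        }
        return toCrawlList;
    }

    public static void main(String[] args) {
        final TestThread test1 = new TestThread();
        Thread thread = new Thread(new Runnable() {
            @Override
            public void run() {
                try {
                    LinkedHashSet<String> result = test1.crawl(new URL("http://stackoverflow.com/"));
                    // Print the size once, after the whole page has been read
                    System.out.println(result.size());
                } catch (MalformedURLException ex) {
                    Logger.getLogger(TestThread.class.getName()).log(Level.SEVERE, null, ex);
                }
            }
        });
        // Constructing a Thread does not run it; start() executes run() on the new thread
        thread.start();
    }
}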
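In the code above, the `Thread` is constructed but `start()` is never called, so `run()` and therefore `crawl()` never execute; that alone explains why nothing is printed. The sketch below isolates that lifecycle without any network access. It is a minimal illustration under stated assumptions: `ThreadStartDemo` and `collectOnWorker` are hypothetical names, and the list of strings stands in for the lines a crawler would read from a page.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class ThreadStartDemo {
    // Hypothetical helper: adds every line to a LinkedHashSet on a separate worker thread.
    // The list stands in for the lines a crawler would read from a page.
    static Set<String> collectOnWorker(List<String> lines) {
        Set<String> collected = new LinkedHashSet<>();
        Thread worker = new Thread(() -> {
            for (String line : lines) {
                collected.add(line);
            }
        });
        worker.start(); // creating a Thread is not enough; start() runs run() on the new thread
        try {
            worker.join(); // wait for the worker so the set is complete before it is read
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new IllegalStateException("interrupted while waiting for worker", e);
        }
        return collected;
    }

    public static void main(String[] args) {
        Set<String> result = collectOnWorker(Arrays.asList("a", "b", "a"));
        System.out.println(result.size()); // prints 2, because the duplicate "a" is stored once
    }
}
```

`join()` matters when the caller needs the results: it waits for the worker and guarantees that the worker's writes to the set are visible afterwards. If the main thread only needs the crawl to run, as in the question, `start()` alone is enough, since a non-daemon thread keeps the JVM alive until it finishes.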
What exactly is the problem? – Marcin 2014-10-20 07:15:24
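I don't know how to get all the links that I have crawled and stored. I just use a LinkedHashSet to store them, but when I crawl and print the set, nothing is shown. – TrangVu 2014-10-21 10:46:01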
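The set in the question stores raw HTML lines rather than links. One way to pull the links out is a regular expression applied to each line. This is a deliberately simple sketch, not a real HTML parser: `LinkExtractor` and `extractLinks` are hypothetical names, and the pattern assumes absolute `http`/`https` URLs inside double-quoted `href` attributes, so relative links, single-quoted attributes and links split across lines are missed.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LinkExtractor {
    // Assumption: links are absolute http(s) URLs in double-quoted href attributes.
    // CASE_INSENSITIVE also matches HREF= and HTTPS://.
    private static final Pattern HREF =
            Pattern.compile("href=\"(https?://[^\"]+)\"", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: returns links in first-seen order; the LinkedHashSet drops duplicates.
    static Set<String> extractLinks(String html) {
        Set<String> links = new LinkedHashSet<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1)); // group 1 is the URL without the surrounding quotes
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<a href=\"http://example.com/a\">A</a>"
                + "<a HREF=\"https://example.com/b\">B</a>"
                + "<a href=\"http://example.com/a\">A again</a>";
        Set<String> links = extractLinks(html);
        System.out.println(links.size()); // prints 2
        for (String link : links) {
            System.out.println(link);
        }
    }
}
```

In the crawler from the question, `toCrawlList.add(line)` could become `toCrawlList.addAll(LinkExtractor.extractLinks(line))`, so the set holds links instead of lines. Because matching happens one line at a time, an `href` attribute broken across two lines would not be found; reading the whole page into one string first avoids that.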