Implementing crawler4j
I'm trying to get the basic form of crawler4j running, as seen here. I have modified the first few lines by defining rootFolder and numberOfCrawlers as follows:
public class BasicCrawlController {
    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.out.println("Needed parameters: ");
            System.out.println("\t rootFolder (it will contain intermediate crawl data)");
            System.out.println("\t numberOfCralwers (number of concurrent threads)");
            return;
        }
        /*
         * crawlStorageFolder is a folder where intermediate crawl data is
         * stored.
         */
        String crawlStorageFolder = args[0];
        args[0] = "/data/crawl/root";
        /*
         * numberOfCrawlers shows the number of concurrent threads that should
         * be initiated for crawling.
         */
        int numberOfCrawlers = Integer.parseInt(args[1]);
        args[1] = "7";
        CrawlConfig config = new CrawlConfig();
        config.setCrawlStorageFolder(crawlStorageFolder);
No matter how I define them, I still get this error:
Needed parameters:
rootFolder (it will contain intermediate crawl data)
numberOfCralwers (number of concurrent threads)
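I think I need to set the parameters in the "Run Configurations" window, but I don't know what that means. How do I correctly configure this basic crawler to get it up and running?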
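For context on why the assignments in the snippet above have no effect: the `args.length != 2` check runs first, so when the program is started with no arguments it prints the usage message and returns before `args[0]` or `args[1]` is ever touched. Even past that check, `crawlStorageFolder` is read from `args[0]` before the new value is written to it. In Eclipse (an assumption, based on the "Run Configurations" wording), the two values would normally go in the "Program arguments" box on the Arguments tab of the run configuration. Alternatively, defaults can be applied when no arguments are passed. The following is only a minimal sketch of that idea, not crawler4j code: the class `CrawlArgs` and its `resolve` method are hypothetical names, and the defaults are simply the values hard-coded in the question.

```java
// Hypothetical helper (not part of crawler4j): picks the two crawl
// parameters from the command line, or falls back to defaults.
class CrawlArgs {
    // Assumed defaults, copied from the values hard-coded in the question.
    static final String DEFAULT_ROOT_FOLDER = "/data/crawl/root";
    static final int DEFAULT_NUMBER_OF_CRAWLERS = 7;

    final String crawlStorageFolder;
    final int numberOfCrawlers;

    CrawlArgs(String crawlStorageFolder, int numberOfCrawlers) {
        this.crawlStorageFolder = crawlStorageFolder;
        this.numberOfCrawlers = numberOfCrawlers;
    }

    // Use the supplied arguments when exactly two are given;
    // otherwise use the defaults instead of printing usage and exiting.
    static CrawlArgs resolve(String[] args) {
        if (args.length == 2) {
            return new CrawlArgs(args[0], Integer.parseInt(args[1]));
        }
        return new CrawlArgs(DEFAULT_ROOT_FOLDER, DEFAULT_NUMBER_OF_CRAWLERS);
    }

    public static void main(String[] args) {
        CrawlArgs resolved = resolve(args);
        System.out.println("rootFolder = " + resolved.crawlStorageFolder);
        System.out.println("numberOfCrawlers = " + resolved.numberOfCrawlers);
    }
}
```

In `BasicCrawlController`, the resolved values would then presumably be passed on where the original code uses `crawlStorageFolder` and `numberOfCrawlers`, for example in `config.setCrawlStorageFolder(...)`.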
Specifying them works. I had to hard-code the directory and get rid of the exception catch altogether. Thanks! – KDEx 2012-04-03 21:55:46