2012-04-03 38 views
0

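Implementation of crawler4j

I am trying to get the basic crawler4j example, as seen here, running. I modified the first few lines by defining rootFolder and numberOfCrawlers as follows: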

public class BasicCrawlController { 

    public static void main(String[] args) throws Exception { 
      if (args.length != 2) { 
        System.out.println("Needed parameters: "); 
        System.out.println("\t rootFolder (it will contain intermediate crawl data)"); 
        System.out.println("\t numberOfCralwers (number of concurrent threads)"); 
        return; 
      } 

      /* 
      * crawlStorageFolder is a folder where intermediate crawl data is 
      * stored. 
      */ 
      String crawlStorageFolder = args[0]; 

       args[0] = "/data/crawl/root"; 

      /* 
      * numberOfCrawlers shows the number of concurrent threads that should 
      * be initiated for crawling. 
      */ 
      int numberOfCrawlers = Integer.parseInt(args[1]); 

      args[1] = "7"; 


      CrawlConfig config = new CrawlConfig(); 

      config.setCrawlStorageFolder(crawlStorageFolder); 

No matter how I define them, I still receive this error:

Needed parameters: 
rootFolder (it will contain intermediate crawl data) 
numberOfCralwers (number of concurrent threads) 

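I think I need to set the parameters in the "Run Configurations" window, but I don't know what that means. How do I configure this basic crawler correctly to get it up and running?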

Answers

2

After you compile your program with javac, you need to run it by typing the following:

java BasicCrawlController "arg1" "arg2"

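The error is telling you that you are not specifying args[0] or args[1] when you run the program. Also, what is the point of args[1] = "7"; when you have already read the number-of-crawlers parameter on the line before it?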
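To make the two points above concrete, here is a minimal sketch that does not depend on crawler4j. The class name ArgsCheckDemo and its helper methods are made up for illustration; only the args.length check and the read-then-overwrite order are taken from the question.

```java
// Hypothetical demo class; it only mirrors the first lines of main in the question.
class ArgsCheckDemo {

    // Same guard as at the top of main: anything other than two arguments fails.
    static boolean passesGuard(String[] args) {
        return args.length == 2;
    }

    // Reads args[0] first and overwrites the array slot afterwards, as the question does.
    static String readThenOverwrite(String[] args) {
        String crawlStorageFolder = args[0];
        args[0] = "/data/crawl/root"; // too late: the local variable keeps the old value
        return crawlStorageFolder;
    }

    public static void main(String[] args) {
        // Started without arguments, the guard fails before any assignment runs.
        System.out.println(passesGuard(new String[0])); // prints "false"

        // Even with two arguments, the later assignment does not change the value already read.
        System.out.println(readThenOverwrite(new String[] {"fromCommandLine", "7"})); // prints "fromCommandLine"
    }
}
```

So the hard-coded assignments in the question are never reached when no arguments are passed, and they would have no effect on crawlStorageFolder or numberOfCrawlers even if they were.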

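For what it looks like you are trying to do, remove the first 5 lines (the args.length check), since you want to use hard-coded values. Then set the crawlStorageFolder String to your directory path and numberOfCrawlers to 7. After that you no longer have to specify command-line arguments. If you do want to use command-line arguments, get rid of the hard-coded values above and specify them on the command line.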
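As a rough sketch of those two options combined, the snippet below uses the command-line values when exactly two are given and otherwise falls back to the hard-coded values from the question. This fallback is a variation, not what the answer prescribes literally, and the CrawlSettings class and its fromArgs method are hypothetical names, not part of crawler4j.

```java
// Hypothetical helper; not part of crawler4j.
class CrawlSettings {
    // Hard-coded values taken from the question.
    static final String DEFAULT_STORAGE_FOLDER = "/data/crawl/root";
    static final int DEFAULT_NUMBER_OF_CRAWLERS = 7;

    final String crawlStorageFolder;
    final int numberOfCrawlers;

    CrawlSettings(String crawlStorageFolder, int numberOfCrawlers) {
        this.crawlStorageFolder = crawlStorageFolder;
        this.numberOfCrawlers = numberOfCrawlers;
    }

    // Uses the two command-line arguments if present, otherwise the defaults.
    static CrawlSettings fromArgs(String[] args) {
        if (args.length == 2) {
            return new CrawlSettings(args[0], Integer.parseInt(args[1]));
        }
        return new CrawlSettings(DEFAULT_STORAGE_FOLDER, DEFAULT_NUMBER_OF_CRAWLERS);
    }

    public static void main(String[] args) {
        CrawlSettings settings = fromArgs(args);
        System.out.println(settings.crawlStorageFolder + " " + settings.numberOfCrawlers);
        // In the controller from the question, the values would then be used as before, e.g.
        // config.setCrawlStorageFolder(settings.crawlStorageFolder);
    }
}
```

Started without arguments this uses /data/crawl/root and 7; started with two arguments it uses those instead.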

+0

That worked. I had to hard-code the directory and get rid of the exception catch altogether. Thanks! – KDEx 2012-04-03 21:55:46