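GridGain/Scala - Generate jobs within an existing job
As a proof of concept, I'm building this extremely simple Twitter friends crawler. Here is what it will do:
- Execute a CrawlJob for the Twitter account "twitter-user-1"
- Find all friends of "twitter-user-1"
- Execute a CrawlJob for each friend of "twitter-user-1"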
Here is what my code looks like so far:
def main(args: Array[String]) {
  scalar {
    grid.execute(classOf[CrawlTask], "twitter-user-1").get
  }
}

class CrawlTask extends GridTaskNoReduceSplitAdapter[String] {
  def split(gridSize: Int, arg: String): Collection[GridJob] = {
    val jobs: Collection[GridJob] = new ArrayList[GridJob]()
    val initialCrawlJob = new CrawlJob()
    initialCrawlJob.twitterId = arg
    jobs.add(initialCrawlJob)
    jobs
  }
}

class CrawlJob extends GridJob {
  var twitterId: String = new String()

  def cancel() = {
    println("cancel - " + twitterId)
  }

  def execute(): Object = {
    println("fetch friends for - " + twitterId)
    // Fetch and execute CrawlJobs for all friends
    return null
  }
}
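I have written Java services for all of the Twitter interaction. I need some examples showing how to create new jobs from within an existing job and associate them with the original task.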
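To make the pattern being asked about concrete, here is a minimal plain-Scala sketch of a job that returns follow-up jobs, all drained from one queue owned by the originating "task". It does not use any GridGain API: the `CrawlSketch` object, the `friendGraph` data, the `friendsOf` helper, and the `depth`/`maxDepth` parameters are all hypothetical stand-ins for the grid and the Twitter services, so treat it as a model of the control flow rather than a grid implementation.

```scala
import scala.collection.mutable

object CrawlSketch {
  // Hypothetical stand-in for the Twitter lookup service; not a real API.
  val friendGraph: Map[String, List[String]] = Map(
    "twitter-user-1" -> List("friend-a", "friend-b"),
    "friend-a"       -> List("twitter-user-1", "friend-c"),
    "friend-b"       -> Nil
  )

  def friendsOf(twitterId: String): List[String] =
    friendGraph.getOrElse(twitterId, Nil)

  // A job crawls one account and returns the follow-up jobs it wants scheduled.
  final case class CrawlJob(twitterId: String, depth: Int) {
    def execute(maxDepth: Int): List[CrawlJob] =
      if (depth >= maxDepth) Nil
      else friendsOf(twitterId).map(CrawlJob(_, depth + 1))
  }

  // The "task": runs the initial job plus every job spawned from it,
  // skipping accounts already scheduled. Returns ids in scheduling order.
  def crawl(startId: String, maxDepth: Int): List[String] = {
    val pending = mutable.Queue(CrawlJob(startId, 0))
    val seen = mutable.LinkedHashSet(startId)
    while (pending.nonEmpty) {
      val job = pending.dequeue()
      for (next <- job.execute(maxDepth) if seen.add(next.twitterId))
        pending.enqueue(next)
    }
    seen.toList
  }
}
```

Calling `CrawlSketch.crawl("twitter-user-1", 1)` schedules the initial account and then one job per friend, which matches the three steps listed in the question. On a real grid, the queue would be replaced by whatever mechanism the framework offers for submitting additional jobs to a running task.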
Thanks, Srirangan
Detailed explanation... http://srirangan.net/2011-03-build-a-simple-web-crawler-with-scala-and-gridgain – Sri 2011-03-10 18:33:43