A program to collect search results from Google and Yahoo


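I want to search Google and Yahoo for forum and blog posts, restricted to a specific country/region. The results will be saved to a database for classification and further processing.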

From each search result, I need:

The URL itself
The date and time
The domain
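
To make the three fields concrete, here is a minimal sketch of one stored record. The function name `toRecord` and the field names are hypothetical, and it assumes "date and time" means the moment the result was collected rather than the post's publication time; the domain is derived from the URL with the standard `URL` class.

```javascript
// Sketch: build one storable record per search hit.
// `toRecord` and its field names are hypothetical, chosen for illustration.
function toRecord(resultUrl, collectedAt = new Date()) {
    const parsed = new URL(resultUrl); // throws a TypeError on a malformed URL
    return {
        url: parsed.href,
        // Assumption: this is the time the result was collected,
        // not the time the forum or blog post was published.
        collectedAt: collectedAt.toISOString(),
        domain: parsed.hostname,
    };
}

const record = toRecord(
    "http://forum.example.hk/thread/42",
    new Date(Date.UTC(2011, 10, 2)) // months are zero-based: 10 = November
);
console.log(record);
```

Storing the timestamp as an ISO 8601 string keeps it sortable and independent of the machine's time zone.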

I am working on a program that takes keywords as input, automatically searches Google and Yahoo, and saves the results to a database.
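
Roughly, the flow I have in mind looks like the sketch below. The searchers here are stubs: each is a hypothetical async function that takes a keyword and returns an array of URL strings, and the `Map` is only a stand-in for the database. The real Google and Yahoo calls are left out on purpose.

```javascript
// Sketch of the keyword -> search -> store loop.
// Each "searcher" is a hypothetical async function (keyword) => array of URL strings.
async function collect(keyword, searchers, store) {
    for (const [engine, search] of Object.entries(searchers)) {
        const urls = await search(keyword);
        for (const url of urls) {
            // Key by URL so a hit returned by both engines is stored only once.
            if (!store.has(url)) {
                store.set(url, {
                    url,
                    engine, // the first engine that returned this URL
                    keyword,
                    collectedAt: new Date().toISOString(),
                });
            }
        }
    }
    return store;
}

// Stub searchers standing in for the real engines.
const stubSearchers = {
    google: async () => ["http://a.example/post/1", "http://b.example/post/2"],
    yahoo: async () => ["http://b.example/post/2", "http://c.example/post/3"],
};

const store = new Map(); // stand-in for a database table
const done = collect("VW GTI", stubSearchers, store).then((s) => {
    console.log("stored entries:", s.size);
    return s;
});
```

Passing the searchers in as functions keeps the storage logic testable without touching either search engine.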

function OnLoad() { 
    // Create a search control 
    var searchControl = new google.search.SearchControl(); 

    // Add in a full set of searchers 
    var localSearch = new google.search.LocalSearch(); 
    searchControl.addSearcher(localSearch); 
    searchControl.addSearcher(new google.search.WebSearch()); 
    searchControl.addSearcher(new google.search.VideoSearch()); 
    searchControl.addSearcher(new google.search.BlogSearch()); 
    searchControl.addSearcher(new google.search.NewsSearch()); 
    searchControl.addSearcher(new google.search.ImageSearch()); 
    searchControl.addSearcher(new google.search.BookSearch()); 
    searchControl.addSearcher(new google.search.PatentSearch()); 

    // Set the Local Search center point 
    localSearch.setCenterPoint("New York, NY"); 

    // tell the searcher to draw itself and tell it where to attach 
    searchControl.draw(document.getElementById("searchcontrol")); 

    // execute an initial search 
    searchControl.execute("VW GTI"); 
} 
google.setOnLoadCallback(OnLoad);

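This code is from the Google AJAX Search API, but there does not seem to be a way to specify domain, country, or date and time as search criteria. Also, it returns the results as HTML, which is hard to slice up and save to a database as individual search result entries.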
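
One workaround I am considering for the country restriction is to filter after the fact, by the country-code TLD of each result's host. The sketch below is only that, a client-side approximation and not a feature of any search API; `filterByCountryTld` is a hypothetical helper, and a ccTLD neither proves where a site is located nor catches sites for that country hosted under `.com`.

```javascript
// Sketch: keep only results whose host ends in a given country-code TLD.
// Client-side approximation only; this is not a search-API feature.
function filterByCountryTld(urls, ccTld) {
    // Accept both "hk" and ".hk" as input.
    const suffix = "." + ccTld.toLowerCase().replace(/^\./, "");
    return urls.filter((u) => {
        try {
            return new URL(u).hostname.endsWith(suffix);
        } catch {
            return false; // skip malformed URLs
        }
    });
}

const hits = [
    "http://forum.example.hk/t/1",
    "http://blog.example.com/hk/post", // "hk" is in the path, not the host
    "http://news.example.com.hk/a",
    "not a url",
];
const hkOnly = filterByCountryTld(hits, "hk");
console.log(hkOnly);
```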

Edited to describe my specific problem.

Source

2011-11-02 Gapton

Thanks for the downvote without a comment. – Gapton

Too broad, and no code. Have a look at the FAQ on what is and is not an appropriate question. – 2011-11-02 02:31:48

OK, I have edited it. – Gapton

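Parsing raw HTML should be your last resort here. If they change the markup, you will have to rework your parser. That is almost guaranteed to happen before the end of the "3 year" period mentioned by the Google AJAX Search API you are using.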
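
For contrast, here is a minimal sketch of reading entries from a structured response instead of markup. The payload shape and the field names (`results`, `url`, `title`) are assumptions made up for illustration, not the documented format of any particular service; the point is only that the code depends on named fields rather than on page layout.

```javascript
// Sketch: read result entries from a structured JSON response instead of HTML.
// The payload shape and field names below are assumptions for illustration.
const samplePayload = JSON.stringify({
    results: [
        { url: "http://forum.example.hk/t/1", title: "GTI owners thread" },
        { url: "http://blog.example.com/gti-review", title: "GTI review", extra: 1 },
        { title: "entry without a url" },
    ],
});

function extractEntries(jsonText) {
    const data = JSON.parse(jsonText);
    const results = Array.isArray(data.results) ? data.results : [];
    // Depend only on named fields; unknown or newly added fields are ignored.
    return results
        .filter((r) => typeof r.url === "string")
        .map((r) => ({
            url: r.url,
            title: typeof r.title === "string" ? r.title : "",
        }));
}

const entries = extractEntries(samplePayload);
console.log(entries);
```

A new field in the response leaves this code untouched, whereas a comparable change to HTML markup would typically break a scraper.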

Source

2011-11-02 02:33:50

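I agree that parsing HTML is a very bad solution. However, there seems to be no way to store the results programmatically other than relying on third-party libraries, which may be unreliable. – Gapton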

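(a) Third-party libraries are more reliable than HTML scraping. (b) Given the way you have phrased the question, I am not sure how you would avoid relying on third-party sources if you want to get information from Google and/or Yahoo. –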
