2016-08-17

I am writing a web crawler on Android. Can I find HTML tags using the AsyncHttpResponseHandler or AsyncHttpClient class? My code is:

public void parseHttp() {
    AsyncHttpClient client = new AsyncHttpClient();
    String url = "http://stackoverflow.com/questions/38959381/unable-to-scrape-data-from-internet-using-android-intents";

    client.get(url, new AsyncHttpResponseHandler(Looper.getMainLooper()) {
        @Override
        public void onSuccess(int statusCode, Header[] headers, byte[] responseBody) {
            String body = new String(responseBody);
            System.out.println(body);

            // Skip the tag's attributes and capture only the content of the first <h1>.
            // The lazy quantifier stops at the nearest </h1>; DOTALL lets the content span lines.
            Pattern p = Pattern.compile("<h1[^>]*>(.*?)</h1>", Pattern.DOTALL);
            Matcher m = p.matcher(body);
            Log.d("tag", "success");
            if (m.find()) {
                String match = m.group(1);
                Log.d("tag", match);
            }
        }

        @Override
        public void onFailure(int statusCode, Header[] headers, byte[] responseBody, Throwable error) {
            Log.d("tag", "failure");
        }
    });
}

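It uses a regex to find the h1 tag in a string, which is the response body of the web page. In general I can find the tag with the Jsoup library like this: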

try { 
    Document doc; 
    URL = requestString; 
    doc = Jsoup.connect(URL).timeout(20 * 1000).userAgent("Chrome").get(); 
    Elements links = doc.select("h1"); 
    responseMessage = links.text(); 
} catch (IOException e) { 
    responseMessage = e.getMessage(); 
} 

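Can I find the tag in the same way with the AsyncHttpResponseHandler class, using code like the Jsoup version? I mean the fourth line, Elements links = doc.select("h1"); followed by responseMessage = links.text(); Any help or direction would be appreciated.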

Answer

Jsoup can parse a document from a string instead of loading it directly over HTTP(S).

Document doc = Jsoup.parseBodyFragment(body); 
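
To make this concrete, here is a minimal sketch of how the string handed to onSuccess could be passed to Jsoup and queried with the same selector as in the question. The class name H1Extractor and the method extractH1Text are hypothetical names chosen for illustration, and the sample HTML is made up. The sketch calls Jsoup.parse(String) rather than parseBodyFragment because the response is assumed to be a whole page; both accept an in-memory string. It needs the jsoup jar on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

class H1Extractor {

    // Hypothetical helper: parses HTML that is already in memory
    // (no network access) and returns the combined text of all <h1> elements.
    static String extractH1Text(String html) {
        Document doc = Jsoup.parse(html);       // build a DOM from the string
        Elements headings = doc.select("h1");   // same CSS selector as in the question
        return headings.text();                 // text of all matches, joined by spaces
    }

    public static void main(String[] args) {
        // Made-up sample standing in for new String(responseBody)
        String body = "<html><body><h1 class=\"title\"><a href=\"#\">Hello</a> world</h1></body></html>";
        System.out.println(extractH1Text(body));
    }
}
```

Inside onSuccess, the equivalent call would be something like extractH1Text(new String(responseBody)), which would replace the Pattern and Matcher code.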
Thanks, dear. It works. – waqas
