2016-01-24 57 views

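How to parse a result from HttpClient in enlive

The following link, https://github.com/swannodette/enlive-tutorial/blob/master/src/tutorial/scrape1.clj, shows how to parse a page from a URL, but I need to use a SOCKS5 proxy. I can't figure out how to use a proxy inside enlive, although I do know how to use one with HttpClient. How do I parse the result returned by HttpClient? I have the following code, but the last line returns an empty result: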

(:require [clojure.set :as set]
          [clj-http.client :as client]
          [clj-http.conn-mgr :as conn-mgr]
          [clj-time.core :as time]
          [jsoup.soup :as soup]
          [clj-time.coerce :as tc]
          [net.cgrand.enlive-html :as html])

(def a (client/get "https://news.ycombinator.com/"
                   {:connection-manager (conn-mgr/make-socks-proxied-conn-manager "127.0.0.1" 9150)
                    :socket-timeout 10000
                    :conn-timeout 10000
                    :client-params {"http.useragent" "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.20 (KHTML, like Gecko) Chrome/11.0.672.2 Safari/534.20"}}))
(def b (html/html-resource a))
(html/select b [:td.title :a])

Answer

1

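When you use enlive's html-resource fn, it performs the fetch from a URL and then converts the result into a parseable data structure. It appears that when you pass it an already completed request (the response map), it just returns that map instead of throwing an error, which is why the select comes back empty.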
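The behaviour described above can be reproduced without any network access. The following is a minimal sketch, assuming a hand-written map (`fake-response`, a hypothetical stand-in for a clj-http response, with made-up markup); it relies on enlive treating a map as an already parsed node rather than fetching or parsing anything.

```clojure
(ns example.html-resource-on-map
  (:require [net.cgrand.enlive-html :as html]))

;; Hypothetical stand-in for a clj-http response map; the markup is made up.
(def fake-response
  {:status 200
   :body "<table><tr><td class=\"title\"><a href=\"/item\">Story</a></td></tr></table>"})

;; html-resource hands the map back wrapped in a seq; the :body string is never parsed.
(def wrapped (html/html-resource fake-response))

;; The map has no :tag or :content, so nothing can match the selector.
(def from-map (html/select wrapped [:td.title :a]))

;; Parsing the :body string gives select real element nodes to walk.
(def from-body
  (html/select (html/html-snippet (:body fake-response)) [:td.title :a]))
```

Comparing `from-map` with `from-body` at the REPL should show the difference: the first is empty, while the second holds the `a` node.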

Either way, the function you want is html-snippet, and you will want to pass it the body of the request. Like this:

;; It does not matter whether you use a connection manager, as long as
;; the call returns a response with a :body
(def req (client/get "https://news.ycombinator.com/"))

(def body (:body req))
(def nodes (html/html-snippet body))
(html/select nodes [:td.title :a])

;; Or you can put it all together like this

(-> req
    :body
    html/html-snippet
    (html/select [:td.title :a]))
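Once the nodes are selected, you will usually want plain data out of them. Here is a minimal sketch of that step, assuming a literal HTML string (`sample-body`, hypothetical and not real Hacker News markup) in place of `(:body req)`; `title-links` is an illustrative helper name, not part of enlive.

```clojure
(ns example.title-links
  (:require [net.cgrand.enlive-html :as html]))

;; Hypothetical stand-in for (:body req); swap in the real response body.
(def sample-body
  (str "<table>"
       "<tr><td class=\"title\"><a href=\"https://example.com/a\">First story</a></td></tr>"
       "<tr><td class=\"title\"><a href=\"https://example.com/b\">Second story</a></td></tr>"
       "</table>"))

(defn title-links
  "Parses an HTML string and returns one map per matching link,
  holding the link text and its href attribute."
  [body]
  (->> (html/select (html/html-snippet body) [:td.title :a])
       (map (fn [node]
              {:text (html/text node)
               :href (get-in node [:attrs :href])}))))

(title-links sample-body)
```

With a real response the call would be `(title-links (:body req))`, and the proxy settings stay entirely on the clj-http side.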