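Cannot read thread pages after a successful login with Jsoup

I am trying to read forum pages with Jsoup, but I can't get it to work. I log in successfully and can then read the first page, i.e. the thread listing page. However, when I request a thread page, I get a 403. Here is my code: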
    Connection.Response loginForm = Jsoup.connect("http://picturepub.net/index.php?login/login")
            .method(Connection.Method.GET)
            .userAgent("Mozilla/5.0 (Windows NT 6.1; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0")
            .timeout(0)
            .execute();
    Document doc = Jsoup.connect("http://picturepub.net/index.php?login/login")
            .data("cookieexists", "false")
            .data("cookie_check", "1")
            .data("login", "swordblazer")
            .data("password", "picturepub")
            .data("register", "0")
            .data("redirect", "/index.php")
            .cookies(loginForm.cookies())
            .userAgent("Mozilla/5.0 (Windows NT 6.1; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0")
            .post();
    doc = loginForm.parse();
    Map<String, String> cookies = loginForm.cookies();
    List<String> urls = new ArrayList<String>();
    List<String> threadUrls = new ArrayList<String>();
    int h = 0;
    for (int i = 1; i < 20; i++) {
        if (i == 1) {
            doc = Jsoup.connect("http://picturepub.net/index.php?forums/photoshoots-magazines.51/")
                    .userAgent("Mozilla/5.0 (Windows NT 6.1; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0")
                    .cookies(cookies)
                    .get();
        } else {
            doc = Jsoup.connect("http://picturepub.net/index.php?forums/photoshoots-magazines.51/page-" + i)
                    .userAgent("Mozilla/5.0 (Windows NT 6.1; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0")
                    .cookies(cookies)
                    .get();
        }
        // get all links
        Elements links = doc.select("a[href]");
        System.out.println(doc.title());
        for (Element element : links) {
            if (element.absUrl("href").contains("threads")) {
                String linkImage = element.absUrl("href");
                Document document = Jsoup.connect(linkImage).cookies(cookies).get();
                if (!threadUrls.contains(linkImage)) {
                    threadUrls.add(linkImage);
                    h++;
                }
            }
        }
    }
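You are probably getting the 403 because you are missing some parameters or cookies. If you have already figured out how to log in, use the same method again: monitor the traffic between the browser and the site and see what your browser is sending. – TDG
I did. Is there anything besides cookies that needs to be sent to the server? – user236928
Cookies and the required parameters. – TDG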
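
Following up on the comments, here are two things in the question's code that might be worth checking. Neither is confirmed as the cause by this thread. First, `cookies` is taken from `loginForm` (the initial GET), not from the response to the login POST, so any session cookie issued at login may never be sent. Second, the thread-page request omits `userAgent(...)`, unlike the listing-page requests. The sketch below is a small, hypothetical helper (`SessionCookies` is not part of Jsoup) that accumulates cookies across responses. The Jsoup wiring is shown only in comments, and the names `LOGIN_URL`, `UA`, `user`, `pass`, and `threadUrl` are placeholders.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper (not part of Jsoup): collects cookies from successive
 * responses so that every later request sends the most recent value of each.
 */
final class SessionCookies {
    private final Map<String, String> jar = new LinkedHashMap<>();

    /** Merges the cookies of one response; newer values replace older ones. */
    SessionCookies update(Map<String, String> responseCookies) {
        jar.putAll(responseCookies);
        return this;
    }

    /** Returns a copy that can be passed to the next request. */
    Map<String, String> asMap() {
        return new LinkedHashMap<>(jar);
    }
}

// Intended wiring with Jsoup (comments only, so the block compiles without the library):
//
//   Connection.Response form = Jsoup.connect(LOGIN_URL)
//           .userAgent(UA).method(Connection.Method.GET).execute();
//
//   // Use execute() instead of post() so the login response's cookies are accessible.
//   Connection.Response login = Jsoup.connect(LOGIN_URL)
//           .userAgent(UA).cookies(form.cookies())
//           .data("login", user).data("password", pass)   // plus the other form fields
//           .method(Connection.Method.POST).execute();
//
//   SessionCookies session = new SessionCookies()
//           .update(form.cookies()).update(login.cookies());
//
//   // Send the same user agent and the merged cookies on every request, thread pages included.
//   Document thread = Jsoup.connect(threadUrl)
//           .userAgent(UA).cookies(session.asMap()).get();
```

If the 403 persists after that, the remaining difference is likely in the request headers or parameters, which is what comparing against the browser's traffic would reveal. Recent Jsoup releases also offer `Jsoup.newSession()`, which keeps cookies across requests, if your version provides it.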