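I have successfully connected Nutch 1.12 to Solr 6.5 and crawled websites that do not require authentication. When I try to crawl a website that requires authentication, I cannot get past the error below. Can anyone please help me solve it?
Error: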
java.lang.RuntimeException: java.lang.IllegalArgumentException: No form exists: user-login
at org.apache.nutch.protocol.httpclient.Http.resolveCredentials(Http.java:485)
at org.apache.nutch.protocol.httpclient.Http.getResponse(Http.java:180)
at org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:261)
at org.apache.nutch.fetcher.FetcherThread.run(FetcherThread.java:295)
Caused by: java.lang.IllegalArgumentException: No form exists: user-login
at org.apache.nutch.protocol.httpclient.HttpFormAuthentication.getLoginFormParams(HttpFormAuthentication.java:183)
at org.apache.nutch.protocol.httpclient.HttpFormAuthentication.login(HttpFormAuthentication.java:95)
at org.apache.nutch.protocol.httpclient.Http.resolveCredentials(Http.java:483)
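Judging from the stack trace, `HttpFormAuthentication.getLoginFormParams` could not find an element with the id `user-login` (the `loginFormId` value) in the page it fetched from `loginUrl`. A quick first check is to save the HTML of that page and see which form ids it really contains. The sketch below is a rough, regex-based helper (it is not how Nutch parses the page, and a regex is not a real HTML parser); the class name `FormIdCheck` and the sample HTML are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Rough diagnostic sketch: lists the id attributes of <form> tags in an HTML
// string so they can be compared with loginFormId. Regex-based, so treat it
// only as a sanity check, not as an HTML parser.
public class FormIdCheck {

    // Matches an opening <form ...> tag and captures its id attribute value
    // (single- or double-quoted). Case-insensitive.
    private static final Pattern FORM_ID = Pattern.compile(
            "<form\\b[^>]*?\\sid\\s*=\\s*[\"']([^\"']+)[\"']",
            Pattern.CASE_INSENSITIVE);

    static List<String> formIds(String html) {
        List<String> ids = new ArrayList<>();
        Matcher m = FORM_ID.matcher(html);
        while (m.find()) {
            ids.add(m.group(1));
        }
        return ids;
    }

    public static void main(String[] args) {
        // Hypothetical page content; paste the HTML actually served at loginUrl.
        String html = "<html><body>"
                + "<form action=\"/user/login\" method=\"post\" id=\"user-login-form\">"
                + "<input name=\"name\"><input name=\"pass\"></form></body></html>";
        String configuredId = "user-login";
        List<String> ids = formIds(html);
        System.out.println("form ids on page: " + ids);
        System.out.println("loginFormId found: " + ids.contains(configuredId));
    }
}
```

If the configured id does not show up, either `loginFormId` needs to match the id the page actually uses, or the page Nutch receives from `loginUrl` differs from what a browser shows (for example a redirect or a form rendered by JavaScript).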
httpclient-auth.xml:
<auth-configuration>
<credentials authMethod="formAuth"
loginUrl="<url>"
loginFormId="user-login"
loginRedirect="true">
<loginPostData>
<field name="name"
value="*<name>*"/>
<field name="pass"
value="*<password>*"/>
<field name="op"
value="Log in"/>
</loginPostData>
</credentials>
</auth-configuration>
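To double-check what this file says, the sketch below parses an `httpclient-auth.xml`-style string with the JDK's DOM parser, prints `loginFormId`, and encodes the `loginPostData` fields as an ordinary `application/x-www-form-urlencoded` body. This is only an approximation of what is sent on login: I am assuming the plugin also submits other fields from the page's form, so treat the output as illustrative. The class name `AuthConfigPreview` and the sample URL and credentials are hypothetical. It also fails loudly if the file is not well-formed XML, for example if a literal `<` is left inside an attribute value.

```java
import java.io.ByteArrayInputStream;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Sketch (not Nutch code): inspect an httpclient-auth.xml-style configuration.
public class AuthConfigPreview {

    // Parses the XML; throws if it is not well-formed.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("auth config is not well-formed XML", e);
        }
    }

    // Returns the loginFormId attribute of the first <credentials> element.
    static String loginFormId(String xml) {
        Element cred = (Element) parse(xml).getElementsByTagName("credentials").item(0);
        return cred.getAttribute("loginFormId");
    }

    // Collects every <field name=... value=...> element, in document order.
    static Map<String, String> postFields(String xml) {
        NodeList nodes = parse(xml).getElementsByTagName("field");
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element f = (Element) nodes.item(i);
            fields.put(f.getAttribute("name"), f.getAttribute("value"));
        }
        return fields;
    }

    // Encodes the fields as application/x-www-form-urlencoded (space becomes '+').
    static String formBody(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            body.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // Hypothetical values; substitute the real loginUrl and credentials.
        String xml = "<auth-configuration>"
                + "<credentials authMethod=\"formAuth\" loginUrl=\"https://example.com/user/login\""
                + " loginFormId=\"user-login\" loginRedirect=\"true\">"
                + "<loginPostData>"
                + "<field name=\"name\" value=\"alice\"/>"
                + "<field name=\"pass\" value=\"secret\"/>"
                + "<field name=\"op\" value=\"Log in\"/>"
                + "</loginPostData></credentials></auth-configuration>";
        System.out.println("loginFormId: " + loginFormId(xml));
        System.out.println("POST body:   " + formBody(postFields(xml)));
    }
}
```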
I have searched through several links but could not find a solution.
Thanks.
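Add the following to '$NUTCH_HOME/conf/nutch-site.xml' (ignore this if you already have it), and reply:
<property>
<name>plugin.includes</name>
<value>protocol-httpclient|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|indexer-elastic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
<description>Regular expression naming plugin directory names to include.</description>
</property> –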
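Since `plugin.includes` is a regular expression over plugin names, it can be worth checking which plugins a given value actually enables. The sketch below assumes, as I understand Nutch's plugin repository to behave, that a plugin is included only when its whole id matches the regex (Java `Pattern.matches`, not a substring search). The class name `PluginIncludesCheck` is hypothetical.

```java
import java.util.List;
import java.util.regex.Pattern;

// Sketch: which plugin ids does a plugin.includes value enable?
// Assumes a full match of the plugin id against the regex is required.
public class PluginIncludesCheck {

    static boolean isIncluded(String pluginIncludes, String pluginId) {
        // Pattern.matches requires the entire id to match, not just a prefix.
        return Pattern.matches(pluginIncludes, pluginId);
    }

    public static void main(String[] args) {
        String value = "protocol-httpclient|urlfilter-regex|parse-(html|tika)"
                + "|index-(basic|anchor)|indexer-elastic|scoring-opic"
                + "|urlnormalizer-(pass|regex|basic)";
        List<String> ids = List.of("protocol-http", "protocol-httpclient",
                "parse-tika", "indexer-solr", "indexer-elastic");
        for (String id : ids) {
            System.out.println(id + " -> " + isIncluded(value, id));
        }
    }
}
```

With the value exactly as written, `protocol-httpclient` (which provides the `HttpFormAuthentication` class seen in the stack trace) is included, but `indexer-solr` is not, so a Solr setup would presumably need `indexer-solr` in the list as well.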
Check your error log for details! –