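Upload a file to Solr with my own parameters added

I want to upload a file (some MS files, for example) to Solr, but I also want to add my own fields to the upload, such as the userId of the person who uploaded it, or some tags. The content of the file must be parsed and made searchable, and the extra parameters should be added as fields. To do this, I have added the following definitions to schema.xml: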
<?xml version="1.0" encoding="UTF-8" ?>
<schema name="example" version="1.1">
<types>
<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="date" class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0"/>
<!-- A general text field that has reasonable, generic
cross-language defaults: it tokenizes with StandardTokenizer,
removes stop words from case-insensitive "stopwords.txt"
(empty by default), and down cases. At query time only, it
also applies synonyms. -->
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
<!-- in this example, we will only use synonyms at query time
<filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
-->
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
</types>
<fields>
<field name="documentId" type="string" indexed="true" stored="true" multiValued="false" required="true"/>
<field name="text" type="string" indexed="true" stored="false" multiValued="true"/>
<dynamicField name="metadata_*" type="text_general" indexed="true" stored="true" multiValued="true"/>
</fields>
<uniqueKey>documentId</uniqueKey>
<defaultSearchField>text</defaultSearchField>
<solrQueryParser defaultOperator="AND"/>
</schema>
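The schema above does not yet declare the extra fields mentioned at the top (userId, tags). Here is a minimal sketch of how they might be declared inside the `<fields>` element, assuming they should be stored as plain, un-analyzed strings. The field name `tags` and all attribute choices are assumptions for illustration and are not part of the current configuration:

```xml
<!-- Hypothetical additions to the <fields> section of schema.xml -->
<!-- One uploader per document, stored verbatim using the "string" type defined above -->
<field name="userId" type="string" indexed="true" stored="true" multiValued="false"/>
<!-- A document may carry several tags, hence multiValued="true" -->
<field name="tags" type="string" indexed="true" stored="true" multiValued="true"/>
```

Neither field is marked required, so documents uploaded without these parameters would still be accepted.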
The relevant part of my solrconfig.xml now looks like this:
<requestHandler name="/update/extract"
startup="lazy"
class="solr.extraction.ExtractingRequestHandler">
<lst name="defaults">
<str name="fmap.content">text</str>
<str name="lowernames">true</str>
<str name="fmap.documentId">documentId</str>
<!-- also tried with
<str name="fmap.literal.documentId">documentId</str>
and
<str name="literal.documentId">documentId</str>
-->
<str name="uprefix">metadata_</str>
<!-- capture link hrefs but ignore div attributes -->
<str name="captureAttr">true</str>
<str name="fmap.a">links</str>
<str name="fmap.div">ignored_</str>
</lst>
</requestHandler>
But no matter which variation of this command I use:
java -Durl=http://localhost:9090/solr/update/extract?documentId=test -jar post.jar somedoc.pdf
or
java -Durl=http://localhost:9090/solr/update/extract?literal.documentId=test -jar post.jar somedoc.pdf
I keep getting an error saying that the required field documentId is missing.
Regards, Ronald
Thanks for your remarks, I have updated the question with some more detail. – Ronald 2012-08-06 14:14:18
"java -Durl=http://localhost:9090/solr/update/extract -Dparams=literal.documentId=test" Have you tried curl? – Fuxi 2012-08-06 14:32:53
curl "http://localhost:9090/solr/update/extract?literal.documentId=test&commit-true" -F "[email protected]" gives the same error. Using -Dparams='literal.documentId=test' gives the same error too. Sigh, it must be something silly that I have misconfigured, but what, and where? – Ronald 2012-08-06 15:09:28