2012-04-21

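How do I create my own fields in Apache Solr and upload documents?

I'm playing with Solr for the first time. I installed it on an Ubuntu server and got it running, posted the sample XML documents from the exampledocs directory, and was able to search for keywords such as "monitor", "apple", and "Dell", since those appear in the sample files.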

Now I want to add my own documents with custom fields. This is what schema.xml contains by default:

<fields> 
    <!-- Valid attributes for fields: 
    name: mandatory - the name for the field 
    type: mandatory - the name of a previously defined type from the 
     <types> section 
    indexed: true if this field should be indexed (searchable or sortable) 
    stored: true if this field should be retrievable 
    multiValued: true if this field may contain multiple values per document 
    omitNorms: (expert) set to true to omit the norms associated with 
     this field (this disables length normalization and index-time 
     boosting for the field, and saves some memory). Only full-text 
     fields or fields that need an index-time boost need norms. 
     Norms are omitted for primitive (non-analyzed) types by default. 
    termVectors: [false] set to true to store the term vector for a 
     given field. 
     When using MoreLikeThis, fields used for similarity should be 
     stored for best performance. 
    termPositions: Store position information with the term vector. 
     This will increase storage costs. 
    termOffsets: Store offset information with the term vector. This 
     will increase storage costs. 
    default: a value that should be used if no value is specified 
     when adding a document. 
    --> 

    <field name="id" type="string" indexed="true" stored="true" required="true" /> 
    <field name="sku" type="text_en_splitting_tight" indexed="true" stored="true" omitNorms="true"/> 
    <field name="name" type="text_general" indexed="true" stored="true"/> 
    <field name="alphaNameSort" type="alphaOnlySort" indexed="true" stored="false"/> 
    <field name="manu" type="text_general" indexed="true" stored="true" omitNorms="true"/> 
    <field name="cat" type="string" indexed="true" stored="true" multiValued="true"/> 
    <field name="features" type="text_general" indexed="true" stored="true" multiValued="true"/> 
    <field name="includes" type="text_general" indexed="true" stored="true" termVectors="true" termPositions="true" termOffsets="true" /> 

    <field name="weight" type="float" indexed="true" stored="true"/> 
    <field name="price" type="float" indexed="true" stored="true"/> 
    <field name="popularity" type="int" indexed="true" stored="true" /> 
    <field name="inStock" type="boolean" indexed="true" stored="true" /> 

    <!-- 
    The following store examples are used to demonstrate the various ways one might _CHOOSE_ to 
    implement spatial. It is highly unlikely that you would ever have ALL of these fields defined. 
    --> 
    <field name="store" type="location" indexed="true" stored="true"/> 

    <!-- Common metadata fields, named specifically to match up with 
    SolrCell metadata when parsing rich documents such as Word, PDF. 
    Some fields are multiValued only because Tika currently may return 
    multiple values for them. 
    --> 
    <field name="title" type="text_general" indexed="true" stored="true" multiValued="true"/> 
    <field name="subject" type="text_general" indexed="true" stored="true"/> 
    <field name="description" type="text_general" indexed="true" stored="true"/> 
    <field name="comments" type="text_general" indexed="true" stored="true"/> 
    <field name="author" type="text_general" indexed="true" stored="true"/> 
    <field name="keywords" type="text_general" indexed="true" stored="true"/> 
    <field name="category" type="text_general" indexed="true" stored="true"/> 
    <field name="content_type" type="string" indexed="true" stored="true" multiValued="true"/> 
    <field name="last_modified" type="date" indexed="true" stored="true"/> 
    <field name="links" type="string" indexed="true" stored="true" multiValued="true"/> 

    <!-- catchall field, containing all other searchable text fields (implemented 
     via copyField further on in this schema --> 
    <field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/> 

    <!-- catchall text field that indexes tokens both normally and in reverse for efficient 
     leading wildcard queries. --> 
    <field name="text_rev" type="text_general_rev" indexed="true" stored="false" multiValued="true"/> 

    <!-- non-tokenized version of manufacturer to make it easier to sort or group 
     results by manufacturer. copied from "manu" via copyField --> 
    <field name="manu_exact" type="string" indexed="true" stored="false"/> 

    <field name="payloads" type="payloads" indexed="true" stored="true"/> 

    <!-- Uncommenting the following will create a "timestamp" field using 
     a default value of "NOW" to indicate when each document was indexed. 
    --> 
    <!-- 
    <field name="timestamp" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/> 
    --> 

    <!-- Dynamic field definitions. If a field name is not found, dynamicFields 
     will be used if the name matches any of the patterns. 
     RESTRICTION: the glob-like pattern in the name attribute must have 
     a "*" only at the start or the end. 
     EXAMPLE: name="*_i" will match any field ending in _i (like myid_i, z_i) 
     Longer patterns will be matched first. if equal size patterns 
     both match, the first appearing in the schema will be used. --> 
    <dynamicField name="*_i" type="int" indexed="true" stored="true"/> 
    <dynamicField name="*_s" type="string" indexed="true" stored="true"/> 
    <dynamicField name="*_l" type="long" indexed="true" stored="true"/> 
    <dynamicField name="*_t" type="text_general" indexed="true" stored="true"/> 
    <dynamicField name="*_txt" type="text_general" indexed="true" stored="true" multiValued="true"/> 
    <dynamicField name="*_en" type="text_en" indexed="true" stored="true" multiValued="true" /> 
    <dynamicField name="*_b" type="boolean" indexed="true" stored="true"/> 
    <dynamicField name="*_f" type="float" indexed="true" stored="true"/> 
    <dynamicField name="*_d" type="double" indexed="true" stored="true"/> 

    <!-- Type used to index the lat and lon components for the "location" FieldType --> 
    <dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="false"/> 

    <dynamicField name="*_dt" type="date" indexed="true" stored="true"/> 
    <dynamicField name="*_p" type="location" indexed="true" stored="true"/> 

    <!-- some trie-coded dynamic fields for faster range queries --> 
    <dynamicField name="*_ti" type="tint" indexed="true" stored="true"/> 
    <dynamicField name="*_tl" type="tlong" indexed="true" stored="true"/> 
    <dynamicField name="*_tf" type="tfloat" indexed="true" stored="true"/> 
    <dynamicField name="*_td" type="tdouble" indexed="true" stored="true"/> 
    <dynamicField name="*_tdt" type="tdate" indexed="true" stored="true"/> 

    <dynamicField name="*_pi" type="pint" indexed="true" stored="true"/> 
    <dynamicField name="*_c" type="currency" indexed="true" stored="true"/> 

    <dynamicField name="ignored_*" type="ignored" multiValued="true"/> 
    <dynamicField name="attr_*" type="text_general" indexed="true" stored="true" multiValued="true"/> 

    <dynamicField name="random_*" type="random" /> 

    <!-- uncomment the following to ignore any fields that don't already match an existing 
     field name or dynamic field, rather than reporting them as an error. 
     alternately, change the type="ignored" to some other type e.g. "text" if you want 
     unknown fields indexed and/or stored by default --> 
    <!--dynamicField name="*" type="ignored" multiValued="true" /--> 

</fields> 

And a default example document looks like this:

<add><doc> 
    <field name="id">3007WFP</field> 
    <field name="name">Dell Widescreen UltraSharp 3007WFP</field> 
    <field name="manu">Dell, Inc.</field> 
    <field name="cat">electronics</field> 
    <field name="cat">monitor</field> 
    <field name="features">30" TFT active matrix LCD, 2560 x 1600, .25mm dot pitch, 700:1 contrast</field> 
    <field name="includes">USB cable</field> 
    <field name="weight">401.6</field> 
    <field name="price">2199</field> 
    <field name="popularity">6</field> 
    <field name="inStock">true</field> 
    <!-- Buffalo store --> 
    <field name="store">43.17614,-90.57341</field> 
</doc></add> 

I replaced the fields in schema.xml with my own custom ones:

<fields> 
    <field name="user_id" type="string" indexed="true" stored="true" /> 
    <field name="about" type="string" indexed="true" stored="true" /> 
    <field name="music" type="string" indexed="true" stored="true" /> 
    <field name="movies" type="string" indexed="true" stored="true" /> 
    <field name="occupation" type="string" indexed="true" stored="true" /> 
</fields> 

Then I tried to post this document, named mydoc.xml:

<add> 
    <doc> 
     <field name="user_id">foobar</field> 
     <field name="about">I am a somebody</field> 
     <field name="music">pop, rock</field> 
     <field name="movies">titanic</field> 
     <field name="occupation">web developer</field> 
    </doc> 
</add> 

I posted it with the same command as before:

java -jar post.jar mydoc.xml 

This is the error I received:

SimplePostTool: version 1.4 
SimplePostTool: POSTing files to http://localhost:8983/solr/update.. 
SimplePostTool: POSTing file mydoc.xml 
SimplePostTool: FATAL: Solr returned an error #400 ERROR: [doc=null] unknown field 'user_id' 

I also noticed that if I restart the Solr service, the Solr admin page fails to load and shows this message:

HTTP ERROR 500 

Problem accessing /solr/admin/. Reason: 

    Severe errors in solr configuration. 

Check your log files for more detailed information on what may be wrong. 

If you want solr to continue after configuration errors, change: 

<abortOnConfigurationError>false</abortOnConfigurationError> 

in solr.xml 

followed by a whole bunch of other Java-type errors...

If I remove my custom fields from schema.xml and restart Solr, the Solr admin page loads just fine.

So I'm at a loss here: how do I add my own custom fields and post my documents to Solr?

You don't have a field named user_id in your schema.xml. – bmargulies 2012-04-21 21:05:51

@bmargulies Yes, I do; I posted the part showing what I replaced the default fields in schema.xml with. – TK123 2012-04-21 21:15:13

Not in your running instance, you don't, or you wouldn't get that error. – bmargulies 2012-04-21 22:17:16

Answer

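The problem was that I forgot to change:

<uniqueKey>id</uniqueKey> 

to:

<uniqueKey>user_id</uniqueKey> 

near the bottom of schema.xml. The other problem was that searching with *:* in the Solr admin worked fine, but searching by a string (keyword) gave an "undefined field text" error. To fix that, I had to add this as one of my fields: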

<field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/> 
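
Putting both fixes together, the relevant part of schema.xml might look like the fragment below. This is only a sketch: it assumes the `string` and `text_general` types from the example schema are still defined in the <types> section. The `required="true"` attribute and the `copyField` lines are not part of the fix described above. They are added on the assumption that keyword searches should actually match the custom fields, because the catch-all `text` field only receives content through `copyField`.

```xml
<fields>
    <!-- Unique key field; required="true" is an assumption that mirrors the stock "id" field -->
    <field name="user_id" type="string" indexed="true" stored="true" required="true"/>
    <field name="about" type="string" indexed="true" stored="true"/>
    <field name="music" type="string" indexed="true" stored="true"/>
    <field name="movies" type="string" indexed="true" stored="true"/>
    <field name="occupation" type="string" indexed="true" stored="true"/>

    <!-- Catch-all field that plain keyword searches run against -->
    <field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>
</fields>

<!-- Must name a field that is defined in <fields> above -->
<uniqueKey>user_id</uniqueKey>

<!-- Assumed addition: copy the custom fields into "text" so keyword searches can find them -->
<copyField source="about" dest="text"/>
<copyField source="music" dest="text"/>
<copyField source="movies" dest="text"/>
<copyField source="occupation" dest="text"/>
```

Any `copyField` lines left over from the example schema that point at removed fields (such as cat or manu) would likely need to be deleted as well. After editing, restart Solr and post the document again with java -jar post.jar mydoc.xml.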