2013-02-08 139 views
9

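ElasticSearch & attachment type (NEST C#)

I am trying to index a PDF document with Elasticsearch/NEST.

The file gets indexed, but the search returns 0 hits.

I need the search result to return only the document ID and the highlighted fragments

(without the base64 content).

Here is the code:

I would appreciate any help with this.

Thanks,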

class Program 
{ 
    static void Main(string[] args) 
    { 
     // create es client 
     string index = "myindex"; 

     var settings = new ConnectionSettings("localhost", 9200) 
      .SetDefaultIndex(index); 
     var es = new ElasticClient(settings); 

     // delete index if any 
     es.DeleteIndex(index); 

     // index document 
     string path = "test.pdf"; 
     var doc = new Document() 
     { 
      Id = 1, 
      Title = "test", 
      Content = Convert.ToBase64String(File.ReadAllBytes(path)) 
     }; 

     var parameters = new IndexParameters() { Refresh = true }; 
     if (es.Index<Document>(doc, parameters).OK) 
     { 
      // search in document 
      string query = "semantic"; // test.pdf contains the string "semantic" 

      var result = es.Search<Document>(s => s 
       .Query(q => 
        q.QueryString(qs => qs 
         .Query(query) 
        ) 
       ) 
       .Highlight(h => h 
        .PreTags("<b>") 
        .PostTags("</b>") 
        .OnFields(
         f => f 
         .OnField(e => e.Content) 
         .PreTags("<em>") 
         .PostTags("</em>") 
        ) 
       ) 
      ); 

      if (result.Hits.Total == 0) 
      { 
      } 
     } 
    } 
} 

[ElasticType(
    Name = "document", 
    SearchAnalyzer = "standard", 
    IndexAnalyzer = "standard" 
)] 
public class Document 
{ 
    public int Id { get; set; } 

    [ElasticProperty(Store = true)] 
    public string Title { get; set; } 

    [ElasticProperty(Type = FieldType.attachment, 
     TermVector = TermVectorOption.with_positions_offsets)] 
    public string Content { get; set; } 
} 
+0

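Also, I have verified that the mapper-attachments plugin is installed and loaded (using es.yml: plugin.mandatory: mapper-attachments). Still, I get no hits for any word contained in my PDF. I have searched for an answer to this (Stack Overflow and elsewhere), but there are only curl examples and no usage examples with C#/NEST. (Just a note: when searching on document.title ('test.pdf') I do get the document back, but searching for 'test' returns no hits.) – 2013-02-09 20:54:23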

+0

Just to let you know, I plan to create an integration test for this tomorrow and answer the question. I could not get to it sooner. – 2013-02-13 12:19:17

+1

Any update on this issue? – slimflem 2013-09-07 19:40:12

Answers

1

// I am now using the FSRiver plugin - https://github.com/dadoonet/fsriver/

void Main() 
{ 
    // search in document 
    string query = "directly"; // test.pdf contains the string "directly" 
    var es = new ElasticClient(new ConnectionSettings(new Uri("http://*.*.*.*:9200")) 
     .SetDefaultIndex("mydocs") 
     .MapDefaultTypeNames(s=>s.Add(typeof(Doc), "doc"))); 
     var result = es.Search<Doc>(s => s 
     .Fields(f => f.Title, f => f.Name) 
     .From(0) 
     .Size(10000) 
      .Query(q => q.QueryString(qs => qs.Query(query))) 
      .Highlight(h => h 
       .PreTags("<b>") 
       .PostTags("</b>") 
       .OnFields(
        f => f 
        .OnField(e => e.File) 
        .PreTags("<em>") 
        .PostTags("</em>") 
       ) 
      ) 
     ); 
} 

[ElasticType(Name = "doc", SearchAnalyzer = "standard", IndexAnalyzer = "standard")] 
public class Doc 
{ 
    public int Id { get; set; } 

    [ElasticProperty(Store = true)] 
    public string Title { get; set; } 

    [ElasticProperty(Type = FieldType.attachment, TermVector = TermVectorOption.with_positions_offsets)] 
    public string File { get; set; } 
    public string Name { get; set; } 
} 
0

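I am working on the same thing, so I am currently trying this: http://www.elasticsearch.cn/tutorials/2011/07/18/attachment-type-in-action.html

That article explains the problem.

Note that you need to define the correct mapping: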

"title" : { "store" : "yes" }, 
"file" : { "term_vector":"with_positions_offsets", "store":"yes" } 

I will try to figure out how to do this with the NEST API and update this post.

+0

Any update on getting this to work? – bayCoder 2014-05-30 19:04:23

-1

Before indexing any items, you need to add the mapping, as shown below.

client.CreateIndex("yourindex", c => c.NumberOfReplicas(0).NumberOfShards(12).AddMapping<AssetSearchEntryModels>(m => m.MapFromAttributes())); 
8

Install the attachment plugin and restart ES

bin/plugin -install elasticsearch/elasticsearch-mapper-attachments/2.3.2 

Create an Attachment class that maps to the document structure the attachment plugin expects

public class Attachment 
    { 
     [ElasticProperty(Name = "_content")] 
     public string Content { get; set; } 

     [ElasticProperty(Name = "_content_type")] 
     public string ContentType { get; set; } 

     [ElasticProperty(Name = "_name")] 
     public string Name { get; set; } 
    } 

Add a property named "File" to the Document class you are indexing, with the correct mapping

[ElasticProperty(Type = FieldType.Attachment, TermVector = TermVectorOption.WithPositionsOffsets, Store = true)] 
    public Attachment File { get; set; } 

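Explicitly create your index before you index any instances of your class. If you do not, Elasticsearch will use dynamic mapping and ignore your attribute mapping. If you change the mapping in the future, always recreate the index.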

client.CreateIndex("index-name", c => c 
    .AddMapping<Document>(m => m.MapFromAttributes()) 
); 
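One way to confirm that the attribute mapping was applied, rather than a dynamic one, is to fetch the mapping (for example with GET /index-name/_mapping) and check that the "file" field is typed as "attachment". The following is only a sketch in plain C#, assuming System.Text.Json is available; the class name MappingCheck and both sample JSON strings are hypothetical, and the exact nesting of a real mapping response depends on the Elasticsearch version, which is why the check searches recursively.

```csharp
using System;
using System.Text.Json;

public static class MappingCheck
{
    // Returns true if any "properties" object in the mapping contains
    // <fieldName> with "type": "attachment".
    public static bool HasAttachmentField(JsonElement node, string fieldName)
    {
        if (node.ValueKind != JsonValueKind.Object) return false;

        if (node.TryGetProperty("properties", out JsonElement props)
            && props.ValueKind == JsonValueKind.Object
            && props.TryGetProperty(fieldName, out JsonElement field)
            && field.ValueKind == JsonValueKind.Object
            && field.TryGetProperty("type", out JsonElement type)
            && type.ValueKind == JsonValueKind.String
            && type.GetString() == "attachment")
        {
            return true;
        }

        // Not found at this level: descend into every child object.
        foreach (JsonProperty child in node.EnumerateObject())
        {
            if (HasAttachmentField(child.Value, fieldName)) return true;
        }
        return false;
    }

    public static bool IsAttachment(string mappingJson, string fieldName)
    {
        using JsonDocument doc = JsonDocument.Parse(mappingJson);
        return HasAttachmentField(doc.RootElement, fieldName);
    }

    public static void Main()
    {
        // Hypothetical mapping created explicitly from the attributes.
        string explicitMapping =
            "{\"index-name\":{\"mappings\":{\"document\":{\"properties\":" +
            "{\"file\":{\"type\":\"attachment\"}}}}}}";

        // Hypothetical mapping produced dynamically: "file" became a plain object.
        string dynamicMapping =
            "{\"index-name\":{\"mappings\":{\"document\":{\"properties\":" +
            "{\"file\":{\"properties\":{\"_content\":{\"type\":\"string\"}}}}}}}}";

        Console.WriteLine(IsAttachment(explicitMapping, "file"));
        Console.WriteLine(IsAttachment(dynamicMapping, "file"));
    }
}
```

If the check fails for a real index, the index was most likely created implicitly by the first index call, and it needs to be deleted and recreated with the explicit mapping.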

Index an item


string path = "test.pdf"; 

    var attachment = new Attachment(); 
    attachment.Content = Convert.ToBase64String(File.ReadAllBytes(path)); 
    attachment.ContentType = "application/pdf"; 
    attachment.Name = "test.pdf"; 

    var doc = new Document() 
    { 
     Id = 1, 
     Title = "test", 
     File = attachment 
    }; 
    client.Index<Document>(doc); 
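To make it clearer what the attachment field carries, here is a minimal sketch in plain C# that builds the JSON body by hand. It does not use NEST; it assumes System.Text.Json is available, the class name AttachmentPayloadDemo is hypothetical, the lower-case field names "id", "title" and "file" are an assumption about how the properties are serialized, and a few sample bytes stand in for the real PDF. The _content, _content_type and _name keys are the ones from the Attachment class above.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.Json;

public static class AttachmentPayloadDemo
{
    // Builds the JSON body of a document whose "file" field carries an attachment.
    public static string BuildDocumentJson(int id, string title, byte[] fileBytes,
                                           string contentType, string fileName)
    {
        var file = new Dictionary<string, object>
        {
            // The attachment plugin expects the raw bytes as a base64 string.
            ["_content"] = Convert.ToBase64String(fileBytes),
            ["_content_type"] = contentType,
            ["_name"] = fileName
        };

        var doc = new Dictionary<string, object>
        {
            ["id"] = id,
            ["title"] = title,
            ["file"] = file
        };

        return JsonSerializer.Serialize(doc);
    }

    public static void Main()
    {
        // Stand-in for File.ReadAllBytes("test.pdf").
        byte[] bytes = Encoding.UTF8.GetBytes("hello");

        Console.WriteLine(
            BuildDocumentJson(1, "test", bytes, "application/pdf", "test.pdf"));
    }
}
```

The point of the sketch is that "file" is an object with three keys, not a bare base64 string, which is the difference between this answer and the Content property in the question.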

Search on the File property

var query = Query<Document>.Term("file", "searchTerm"); 

    var searchResults = client.Search<Document>(s => s 
      .From(start) 
      .Size(count) 
      .Query(query) 
); 
+0

Great, it works for me.... Thank you! – 2017-01-09 13:26:42