Lucene: Payloads and Similarity function --- always the same payload value


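I want to implement a Lucene indexer/searcher that uses the new payload feature, which allows meta-information to be attached to text. In my specific case, I add weights (which can be understood as % probabilities, between 0 and 100) to concept tags in order to use them to override the standard Lucene TF-IDF weighting. I am puzzled by the behaviour I see, and I believe something is wrong with the Similarity class I overrode, but I cannot figure out what.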

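Example

When I run a search query (e.g. "concept:red"), I find that the payload passed through MyPayloadSimilarity is always the first number (in the code example this is 1.0) rather than 1.0, 50.0 and 100.0. As a result, all documents get the same payload and the same score. However, the data should rank picture #1 first, with a payload of 100.0, followed by picture #2, followed by picture #3, with very different scores. I can't get my head around it.

Here is the output of a run: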

Query: concept:red 
===> docid: 0 payload: 1.0 
===> docid: 1 payload: 1.0 
===> docid: 2 payload: 1.0 
Number of results:3 
-> docid: 3.jpg score: 0.2518424 
-> docid: 2.jpg score: 0.2518424 
-> docid: 1.jpg score: 0.2518424

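What is wrong? Am I misunderstanding something about payloads?

Code

Below I share my code as a self-contained example, to make it as easy as possible for you to run it, should you want to.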

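import java.io.File;
import java.io.IOException;
import java.io.Reader;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Set;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.payloads.DelimitedPayloadTokenFilter;
import org.apache.lucene.analysis.payloads.FloatEncoder;
import org.apache.lucene.analysis.payloads.PayloadEncoder;
import org.apache.lucene.analysis.payloads.PayloadHelper;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.payloads.AveragePayloadFunction;
import org.apache.lucene.search.payloads.PayloadTermQuery;
import org.apache.lucene.search.similarities.DefaultSimilarity;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.Version;

public class PayloadShowcase { 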

public static void main(String s[]) { 
    PayloadShowcase p = new PayloadShowcase(); 
    p.run(); 
} 

public void run() { 
    // Step 1: indexing 
    MyPayloadIndexer indexer = new MyPayloadIndexer(); 
    indexer.index(); 
    // Step 2: searching 
    MyPayloadSearcher searcher = new MyPayloadSearcher(); 
    searcher.search("red"); 
} 

public class MyPayloadAnalyzer extends Analyzer { 

    private PayloadEncoder encoder; 
    MyPayloadAnalyzer(PayloadEncoder encoder) { 
     this.encoder = encoder; 
    } 

    @Override 
    protected TokenStreamComponents createComponents(String fieldName, Reader reader) { 
     Tokenizer source = new WhitespaceTokenizer(reader); 
     TokenStream filter = new LowerCaseFilter(source); 
     filter = new DelimitedPayloadTokenFilter(filter, '|', encoder); 
     return new TokenStreamComponents(source, filter); 
    } 
} 

public class MyPayloadIndexer { 

    public MyPayloadIndexer() {} 

    public void index() { 
     try { 
      Directory dir = FSDirectory.open(new File("D:/data/indices/sandbox")); 
      Analyzer analyzer = new MyPayloadAnalyzer(new FloatEncoder()); 
      IndexWriterConfig iwconfig = new IndexWriterConfig(Version.LUCENE_4_10_1, analyzer); 
      iwconfig.setSimilarity(new MyPayloadSimilarity()); 
      iwconfig.setOpenMode(IndexWriterConfig.OpenMode.CREATE); 

      // load mappings and classifiers 
      HashMap<String, String> mappings = this.loadDataMappings(); 
      HashMap<String, HashMap> cMaps = this.loadData(); 

      IndexWriter writer = new IndexWriter(dir, iwconfig); 
      indexDocuments(writer, mappings, cMaps); 
      writer.close(); 

     } catch (IOException e) { 
      System.out.println("Exception while indexing: " + e.getMessage()); 
     } 
    } 

    private void indexDocuments(IndexWriter writer, HashMap<String, String> fileMappings, HashMap<String, HashMap> concepts) throws IOException { 

     Set fileSet = fileMappings.keySet(); 
     Iterator<String> iterator = fileSet.iterator(); 
     while (iterator.hasNext()){ 
      // unique file information 
      String fileID = iterator.next(); 
      String filePath = fileMappings.get(fileID); 
      // create a new, empty document 
      Document doc = new Document(); 
      // path of the indexed file 
      Field pathField = new StringField("path", filePath, Field.Store.YES); 
      doc.add(pathField); 
      // lookup all concept probabilities for this fileID 
      Iterator<String> conceptIterator = concepts.keySet().iterator(); 
      while (conceptIterator.hasNext()){ 
       String conceptName = conceptIterator.next(); 
       HashMap conceptMap = concepts.get(conceptName); 
       doc.add(new TextField("concept", ("" + conceptName + "|").trim() + (conceptMap.get(fileID) + "").trim(), Field.Store.YES)); 
      } 
      writer.addDocument(doc); 
     } 
    } 

    public HashMap<String, String> loadDataMappings(){ 
     HashMap<String, String> h = new HashMap<>(); 
     h.put("1", "1.jpg"); 
     h.put("2", "2.jpg"); 
     h.put("3", "3.jpg"); 
     return h; 
    } 

    public HashMap<String, HashMap> loadData(){ 
     HashMap<String, HashMap> h = new HashMap<>(); 
     HashMap<String, String> green = new HashMap<>(); 
     green.put("1", "50.0"); 
     green.put("2", "1.0"); 
     green.put("3", "100.0"); 
     HashMap<String, String> red = new HashMap<>(); 
     red.put("1", "100.0"); 
     red.put("2", "50.0"); 
     red.put("3", "1.0"); 
     HashMap<String, String> blue = new HashMap<>(); 
     blue.put("1", "1.0"); 
     blue.put("2", "50.0"); 
     blue.put("3", "100.0"); 
     h.put("green", green); 
     h.put("red", red); 
     h.put("blue", blue); 
     return h; 
    } 
} 

class MyPayloadSimilarity extends DefaultSimilarity { 

    @Override 
    public float scorePayload(int docID, int start, int end, BytesRef payload) { 
     float pload = 1.0f; 
     if (payload != null) { 
      pload = PayloadHelper.decodeFloat(payload.bytes); 
     } 
     System.out.println("===> docid: " + docID + " payload: " + pload); 
     return pload; 
    } 
} 

public class MyPayloadSearcher { 

    public MyPayloadSearcher() {} 

    public void search(String queryString) { 
     try { 
      IndexReader reader = DirectoryReader.open(FSDirectory.open(new File("D:/data/indices/sandbox"))); 
      IndexSearcher searcher = new IndexSearcher(reader); 
      searcher.setSimilarity(new PayloadSimilarity()); 
      PayloadTermQuery query = new PayloadTermQuery(new Term("concept", queryString), 
        new AveragePayloadFunction()); 
      System.out.println("Query: " + query.toString()); 
      TopDocs topDocs = searcher.search(query, 999); 
      ScoreDoc[] hits = topDocs.scoreDocs; 
      System.out.println("Number of results:" + hits.length); 

      // output 
      for (int i = 0; i < hits.length; i++) { 
       Document doc = searcher.doc(hits[i].doc); 
       System.out.println("-> docid: " + doc.get("path") + " score: " + hits[i].score); 
      } 
      reader.close(); 

     } catch (Exception e) { 
      System.out.println("Exception while searching: " + e.getMessage()); 
     } 
    } 
}

}
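
For orientation, the field values that indexDocuments hands to the analyzer look like "red|100.0". The following is a minimal sketch in plain Java, without Lucene, of what I understand DelimitedPayloadTokenFilter combined with FloatEncoder to do with such a token: split at the first delimiter, keep the left part as the term, and store the weight as four big-endian bytes. The class DelimitedPayloadSketch and its members are hypothetical names made up for this illustration; they are not Lucene API.

```java
import java.nio.ByteBuffer;

public class DelimitedPayloadSketch {

    // Hypothetical holder: the term text that gets indexed plus its raw payload bytes.
    static final class TokenWithPayload {
        final String term;
        final byte[] payload; // null when the token carries no delimiter

        TokenWithPayload(String term, byte[] payload) {
            this.term = term;
            this.payload = payload;
        }
    }

    // Splits "term|weight" at the first delimiter and encodes the weight as a 4-byte float.
    static TokenWithPayload parse(String token, char delimiter) {
        int i = token.indexOf(delimiter);
        if (i < 0) {
            return new TokenWithPayload(token, null);
        }
        String term = token.substring(0, i);
        float weight = Float.parseFloat(token.substring(i + 1));
        // ByteBuffer writes big-endian by default.
        byte[] payload = ByteBuffer.allocate(4).putFloat(weight).array();
        return new TokenWithPayload(term, payload);
    }

    // Reads the 4-byte big-endian float back out of a payload.
    static float decode(byte[] payload) {
        return ByteBuffer.wrap(payload).getFloat();
    }

    public static void main(String[] args) {
        String[] tokens = {"red|100.0", "green|50.0", "blue|1.0"};
        for (String token : tokens) {
            TokenWithPayload t = parse(token, '|');
            System.out.println(t.term + " -> " + decode(t.payload));
        }
    }
}
```

The point of the sketch is only that each occurrence of a concept term carries its own four payload bytes, so a query on "red" should see a different decoded value per document.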

Asked 2014-10-29 by RalfB

In MyPayloadSimilarity, the PayloadHelper.decodeFloat call is incorrect. In this case it is also necessary to pass the payload.offset parameter, like this:

pload = PayloadHelper.decodeFloat(payload.bytes, payload.offset);

I hope it helps.
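
Why the offset can matter: a BytesRef is a view (bytes, offset, length) onto a byte array that may be shared between several values, so decoding from index 0 can return whatever happens to sit at the start of that array. The sketch below reproduces the effect in plain Java, assuming three float payloads packed back to back in one array. Slice is a hypothetical stand-in for BytesRef, and decodeFloat here is a local helper, not Lucene's PayloadHelper.

```java
import java.nio.ByteBuffer;

public class PayloadOffsetDemo {

    // Hypothetical stand-in for BytesRef: a window onto a possibly shared array.
    static final class Slice {
        final byte[] bytes;
        final int offset;
        final int length;

        Slice(byte[] bytes, int offset, int length) {
            this.bytes = bytes;
            this.offset = offset;
            this.length = length;
        }
    }

    // Decodes a big-endian 4-byte float starting at the given index.
    static float decodeFloat(byte[] bytes, int offset) {
        return ByteBuffer.wrap(bytes, offset, 4).getFloat();
    }

    public static void main(String[] args) {
        // Assumption for illustration: three payloads share one backing array.
        byte[] shared = ByteBuffer.allocate(12)
                .putFloat(1.0f).putFloat(50.0f).putFloat(100.0f)
                .array();

        Slice[] payloads = {
            new Slice(shared, 0, 4),
            new Slice(shared, 4, 4),
            new Slice(shared, 8, 4)
        };

        for (Slice p : payloads) {
            float ignoringOffset = decodeFloat(p.bytes, 0);     // always reads the first float
            float usingOffset = decodeFloat(p.bytes, p.offset); // reads this slice's own float
            System.out.println("ignoring offset: " + ignoringOffset
                    + "  using offset: " + usingOffset);
        }
    }
}
```

With this layout, decoding without the offset yields 1.0 for every slice, while decoding with the offset yields 1.0, 50.0 and 100.0. Whether Lucene actually shares the backing array in a given situation depends on the codec and reader, so treat this as an illustration of the failure mode rather than a description of its internals.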

Answered 2014-10-30 01:19:37

No, still the same result. :( – RalfB 2014-10-30 08:18:13

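@ralfb, in your sample code, 'MyPayloadSearcher.search' sets 'PayloadSimilarity' (which does not exist here, but presumably exists in your code) instead of 'MyPayloadSimilarity'. That may be why you are not seeing any change; please check which "PayloadSimilarity" class you are actually using. – 2014-10-30 18:24:58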

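Oh my god! That was it! I know it was stupid, and I am very grateful that you gave my brain a kick. In future I will create separate projects and make sure sandbox code is isolated, rather than living in the same project where it invites copy-paste errors. Thanks Juliano! :) – RalfB 2014-10-30 18:56:38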
