2011-09-30 77 views

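Lucene.net: Querying and using a filter to limit results

As usual, I'm turning to the massive brain power that is the Stack Overflow user base for help with a Lucene.NET problem I'm having. First off, I'm a complete noob when it comes to Lucene and Lucene.NET. Using tutorials and code snippets scattered around the web, I've cobbled together the following solution for my scenario.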

Scenario

I have an index with the following structure:

--------------------------------------------------------
| id | date       | security | text                    |
--------------------------------------------------------
| 1  | 2011-01-01 | -1-12-4- | some analyzed text here |
--------------------------------------------------------
| 2  | 2011-01-01 | -11-3-   | some analyzed text here |
--------------------------------------------------------
| 3  | 2011-01-01 | -1-      | some analyzed text here |
--------------------------------------------------------

I need to be able to query the text field, but restrict the results to users who have specific role IDs.

What I came up with (after many, many trips to Google) is to use a "security" field and a Lucene filter to restrict the result set, as follows:

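using System.Collections; 

class SecurityFilter : Lucene.Net.Search.Filter 
{ 
    // Builds the allowed-document bitset by loading every stored document,
    // so the cost grows with the size of the index. The role "-1-" is hard-coded.
    public override BitArray Bits(Lucene.Net.Index.IndexReader indexReader) 
    { 
        BitArray bitarray = new BitArray(indexReader.MaxDoc()); 

        for (int i = 0; i < bitarray.Length; i++) 
        { 
            // Loading a deleted document throws, so skip deleted slots.
            if (indexReader.IsDeleted(i)) continue; 

            // Guard against documents that have no security field.
            string security = indexReader.Document(i).Get("security"); 
            if (security != null && security.Contains("-1-")) 
            { 
                bitarray.Set(i, true); 
            } 
        } 

        return bitarray; 
    } 
} 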

... and then ...

Lucene.Net.Search.Sort sort = new Lucene.Net.Search.Sort(new Lucene.Net.Search.SortField("date", true)); 
Lucene.Net.Analysis.Standard.StandardAnalyzer analyzer = new Lucene.Net.Analysis.Standard.StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29); 
Lucene.Net.Search.IndexSearcher searcher = new Lucene.Net.Search.IndexSearcher(Lucene.Net.Store.FSDirectory.Open(indexDirectory), true); 
Lucene.Net.QueryParsers.QueryParser parser = new Lucene.Net.QueryParsers.QueryParser(Lucene.Net.Util.Version.LUCENE_29, "text", analyzer); 
Lucene.Net.Search.Query query = parser.Parse("some search phrase"); 
SecurityFilter filter = new SecurityFilter(); 
Lucene.Net.Search.Hits hits = searcher.Search(query, filter, sort); 

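This works as expected and returns only the documents with IDs 1 and 3. The problem is that on large indexes this process becomes very slow.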

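Finally, my question: does anyone have tips on how to speed this up, or an alternative solution that would be more efficient than the one I've presented here?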

Is it possible to change the format of the index? – goalie7960

Yes, at this point anything can be changed. – nokturnal

Answers

I've edited my answer to explain my previous answer with a simple example.

I didn't follow best practices here, but it should give you the idea.

Note that the security field needs to be tokenized so that each ID in it is a separate token, for example by using WhitespaceAnalyzer.
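To see why one token per ID matters, here is a Lucene-free C# sketch (the class and method names are hypothetical, not Lucene.NET APIs). It splits a security value into role IDs and checks for an exact match, which is roughly what a TermQuery on a tokenized field amounts to. The dashes in the original format exist only to keep the substring test from matching "1" inside "11" or "12"; with one token per ID, an exact match makes them unnecessary.

```csharp
using System;
using System.Linq;

// Hypothetical helpers illustrating tokenized role IDs; not part of Lucene.NET.
public static class SecurityTokens
{
    // Splits "-1-12-4-" (or the space-separated "1 12 4") into ["1", "12", "4"].
    // Storing IDs space-separated and indexing with WhitespaceAnalyzer yields the same tokens.
    public static string[] Tokenize(string security) =>
        security.Split(new[] { '-', ' ' }, StringSplitOptions.RemoveEmptyEntries);

    // Exact token comparison: "1" matches the token "1" but never "11" or "12".
    public static bool HasRole(string security, string role) =>
        Tokenize(security).Contains(role);
}
```

For example, `SecurityTokens.HasRole("-11-3-", "1")` is false, while `SecurityTokens.HasRole("-1-12-4-", "1")` is true.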

using System; 
using System.Collections.Generic; 
using System.Linq; 
using System.Text; 
using Lucene.Net.Search; 
using Lucene.Net.Documents; 
using Lucene.Net.Index; 
using Lucene.Net.Analysis.Standard; 
using System.IO; 

namespace ConsoleApplication1 
{ 
    class Program 
    { 
     public class RoleFilterCache 
     { 
      static public Dictionary<string, Filter> Cache = new Dictionary<string,Filter>(); 

      static public Filter Get(string role) 
      { 
       Filter cached = null; 
       if (!Cache.TryGetValue(role, out cached)) 
       { 
        return null; 
       } 
       return cached; 
      } 

      static public void Put(string role, Filter filter) 
      { 
       if (role != null) 
       { 
        Cache[role] = filter; 
       } 
      } 
     } 

     public class User 
     { 
      public string Username; 
      public List<string> Roles; 
     } 

     public static Filter GetFilterForUser(User u) 
     { 
      BooleanFilter userFilter = new BooleanFilter(); 
      foreach (string rolename in u.Roles) 
      { 
       // call GetFilterForRole and add to the BooleanFilter 
       userFilter.Add(
        new BooleanFilterClause(GetFilterForRole(rolename), BooleanClause.Occur.SHOULD) 
       ); 
      } 
      return userFilter; 
     } 

     public static Filter GetFilterForRole(string role) 
     { 
      Filter roleFilter = RoleFilterCache.Get(role); 
      if (roleFilter == null) 
      { 
       roleFilter = 
        // the caching wrapper filter makes it cache the BitSet per segmentreader 
        new CachingWrapperFilter(
         // builds the filter from the index and not from iterating 
         // stored doc content which is much faster 
         new QueryWrapperFilter(
          new TermQuery(
           new Term("security", role) 
          ) 
         ) 
       ); 
       // put in cache 
       RoleFilterCache.Put(role, roleFilter); 
      } 
      return roleFilter; 
     } 


     static void Main(string[] args) 
     { 
      IndexWriter iw = new IndexWriter(new FileInfo("C:\\example\\"), new StandardAnalyzer(), true); 
      Document d = new Document(); 

      Field aField = new Field("content", "", Field.Store.YES, Field.Index.ANALYZED); 
      Field securityField = new Field("security", "", Field.Store.NO, Field.Index.ANALYZED); 

      d.Add(aField); 
      d.Add(securityField); 

      aField.SetValue("Only one can see."); 
      securityField.SetValue("1"); 
      iw.AddDocument(d); 
      aField.SetValue("One and two can see."); 
      securityField.SetValue("1 2"); 
      iw.AddDocument(d); 
      aField.SetValue("One and two can see."); 
      securityField.SetValue("1 2"); 
      iw.AddDocument(d); 
      aField.SetValue("Only two can see."); 
      securityField.SetValue("2"); 
      iw.AddDocument(d); 

      iw.Close(); 

      User userone = new User() 
      { 
       Username = "User one", 
       Roles = new List<string>() 
      }; 
      userone.Roles.Add("1"); 
      User usertwo = new User() 
      { 
       Username = "User two", 
       Roles = new List<string>() 
      }; 
      usertwo.Roles.Add("2"); 
      User userthree = new User() 
      { 
       Username = "User three", 
       Roles = new List<string>() 
      }; 
      userthree.Roles.Add("1"); 
      userthree.Roles.Add("2"); 

      PhraseQuery phraseQuery = new PhraseQuery(); 
      phraseQuery.Add(new Term("content", "can")); 
      phraseQuery.Add(new Term("content", "see")); 

      IndexSearcher searcher = new IndexSearcher("C:\\example\\", true); 

      Filter securityFilter = GetFilterForUser(userone); 
      TopDocs results = searcher.Search(phraseQuery, securityFilter,25); 
      Console.WriteLine("User One Results:"); 
      foreach (var aResult in results.ScoreDocs) 
      { 
       Console.WriteLine(
        searcher.Doc(aResult.doc). 
        Get("content") 
       ); 
      } 
      Console.WriteLine("\n\n"); 

      securityFilter = GetFilterForUser(usertwo); 
      results = searcher.Search(phraseQuery, securityFilter, 25); 
      Console.WriteLine("User two Results:"); 
      foreach (var aResult in results.ScoreDocs) 
      { 
       Console.WriteLine(
        searcher.Doc(aResult.doc). 
        Get("content") 
       ); 
      } 
      Console.WriteLine("\n\n"); 

      securityFilter = GetFilterForUser(userthree); 
      results = searcher.Search(phraseQuery, securityFilter, 25); 
      Console.WriteLine("User three Results (should see everything):"); 
      foreach (var aResult in results.ScoreDocs) 
      { 
       Console.WriteLine(
        searcher.Doc(aResult.doc). 
        Get("content") 
       ); 
      } 
      Console.WriteLine("\n\n"); 
      Console.ReadKey(); 
     } 
    } 
} 
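Conceptually, the RoleFilterCache, CachingWrapperFilter and BooleanFilter with SHOULD clauses above combine into something like the following Lucene-free C# sketch (all names are hypothetical). Each role gets one bitset, built from that role's posting list (role -> document numbers) rather than by loading stored documents. The bitset is cached, and a user with several roles gets the union (OR) of the cached per-role bitsets.

```csharp
using System.Collections;
using System.Collections.Generic;

// Hypothetical, Lucene-free model of per-role cached filters combined with OR.
public class RoleBitsetCache
{
    private readonly int _maxDoc;
    private readonly Dictionary<string, List<int>> _postings; // role -> doc numbers
    private readonly Dictionary<string, BitArray> _cache = new Dictionary<string, BitArray>();

    public RoleBitsetCache(int maxDoc, Dictionary<string, List<int>> postings)
    {
        _maxDoc = maxDoc;
        _postings = postings;
    }

    // Returns the cached bitset for a role, building it on first use by walking
    // only that role's posting list instead of every stored document.
    public BitArray ForRole(string role)
    {
        BitArray bits;
        if (_cache.TryGetValue(role, out bits)) return bits;

        bits = new BitArray(_maxDoc);
        List<int> docs;
        if (_postings.TryGetValue(role, out docs))
            foreach (int doc in docs) bits.Set(doc, true);

        _cache[role] = bits;
        return bits;
    }

    // SHOULD semantics: a document is allowed if any of the user's roles matches.
    // ORs into a fresh BitArray so the cached per-role bitsets are never modified.
    public BitArray ForUser(IEnumerable<string> roles)
    {
        var result = new BitArray(_maxDoc);
        foreach (string role in roles)
            result.Or(ForRole(role));
        return result;
    }

    public int CacheCount => _cache.Count;
}
```

Using the four example documents above (roles "1", "1 2", "1 2", "2"), role "1" maps to documents 0, 1, 2 and role "2" to documents 1, 2, 3, so user three's bitset covers all four. Because document numbers can change when the index is modified, a cache like this must be cleared whenever the index changes; in Lucene.NET, CachingWrapperFilter handles this by caching per index reader.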

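+1 for using cached filters. They work so well that using them here is the obvious choice. I mildly disagree with using a query for the security check: I think (cached) filters should be used for everything you don't want scored, and queries for the terms that need scoring. –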

The (main) problem in nokturnal's code is not caching. He scans every document in the index to build the filter ("Contains" is another flaw here). –

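I'm a little confused, but I believe that's due to my inexperience with Lucene. Let me see if I've got this: basically, I would create a cached filter for each role, rebuilt every time the index is modified. Does the scenario above allow several cached filters to be applied at the same time (one for each role the user has)? – nokturnal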

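If you index your security field as analyzed (so that, for example, your security string is split into 1 12 4 ...),

you can create a filter like this:

Filter filter = new QueryFilter(new TermQuery(new Term("security", "1"))); 

or a query such as some text +security:1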
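The query form can be generated from the user's text and role IDs. The following C# sketch uses a hypothetical helper (not a Lucene.NET API) and assumes QueryParser's default OR operator. Under that operator, when a query also contains a required clause, the optional clauses only influence scoring, so the literal some text +security:1 would match every document with role 1. The sketch therefore wraps the user's text in its own required group. For several roles, a required group such as +(security:1 security:2) requires at least one of them. It also assumes the user's text has already been escaped (for example with QueryParser.Escape) if it comes from end users.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not a Lucene.NET API) that builds query-parser text
// restricting a free-text search to the given role IDs.
public static class SecurityQueryText
{
    public static string Build(string userText, IList<string> roles)
    {
        if (roles == null || roles.Count == 0)
            throw new ArgumentException("At least one role is required.", nameof(roles));

        // The user's text gets its own required group; otherwise, next to a
        // required security clause, its terms would only affect scoring.
        // Assumes userText is already escaped if it comes from end users.
        string textPart = "+(" + userText + ")";

        // One optional clause per role; the outer '+' requires at least one of them.
        string securityPart = roles.Count == 1
            ? "+security:" + roles[0]
            : "+(" + string.Join(" ", roles.Select(r => "security:" + r)) + ")";

        return textPart + " " + securityPart;
    }
}
```

As the follow-up comments point out, a query clause like this also takes part in scoring and highlighting, which is why the filter-based approach is preferable when the security term should stay invisible.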

Interesting solution. I'll play around with it tomorrow and let you know how it goes. – nokturnal

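@LB: This solution works well, but it now interferes with the solution you helped me with here: http://stackoverflow.com/questions/7662829/lucene-net-range-queries-highlighting. Is there a way to keep the "1" from +security:1 from being highlighted? – nokturnal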

No. In that case you have to use the filter-based solution. –