2013-04-30 64 views
1

I want to index a set of documents using Lucene 4.2. I created a custom analyzer that does not tokenize but does lowercase the terms, using the following code:

    public class NoTokenAnalyzer extends Analyzer {

        private final Version matchVersion;

        public NoTokenAnalyzer(Version matchVersion) {
            this.matchVersion = matchVersion;
        }

        @Override
        protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
            // Emit the entire field value as a single token, then lowercase it.
            final KeywordTokenizer source = new KeywordTokenizer(reader);
            TokenStream result = new LowerCaseFilter(matchVersion, source);
            return new TokenStreamComponents(source, result);
        }
    }
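
One way to see what such an analyzer actually emits is to consume its TokenStream directly. A minimal sketch, assuming Lucene 4.2 is on the classpath (the field name "f1" is arbitrary here, and the input string is made up for illustration):

    import java.io.StringReader;

    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
    import org.apache.lucene.util.Version;

    public class AnalyzerCheck {
        public static void main(String[] args) throws Exception {
            NoTokenAnalyzer analyzer = new NoTokenAnalyzer(Version.LUCENE_42);
            TokenStream ts = analyzer.tokenStream("f1", new StringReader("Some Field VALUE"));
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
                // With KeywordTokenizer + LowerCaseFilter this should print the
                // whole input as one lowercased token: "some field value"
                System.out.println(term.toString());
            }
            ts.end();
            ts.close();
        }
    }

Note the TokenStream contract: reset() must be called before the first incrementToken(), and end()/close() afterwards.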

I use the analyzer to build the index (inspired by the code provided in the Lucene documentation):

    public static void IndexFile(Analyzer analyzer) throws IOException {
        boolean create = true;

        String directoryPath = "path";
        File folderToIndex = new File(directoryPath);
        File[] filesToIndex = folderToIndex.listFiles();

        Directory directory = FSDirectory.open(new File("index path"));

        IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_42, analyzer);

        if (create) {
            // Create a new index in the directory, removing any
            // previously indexed documents:
            iwc.setOpenMode(OpenMode.CREATE);
        } else {
            // Add new documents to an existing index:
            iwc.setOpenMode(OpenMode.CREATE_OR_APPEND);
        }

        IndexWriter writer = new IndexWriter(directory, iwc);
        for (final File singleFile : filesToIndex) {

            // process files in the directory and extract strings to index
            // ..........
            String field1;
            String field2;

            // index fields
            Document doc = new Document();

            Field f1Field = new Field("f1", field1, TextField.TYPE_STORED);
            doc.add(f1Field);
            doc.add(new Field("f2", field2, TextField.TYPE_STORED));

            // Without this call the documents are never written to the index.
            writer.addDocument(doc);
        }
        writer.close();
    }

The problem is that the indexed fields are not tokenized, but they are not lowercased either; that is, the analyzer does not seem to be applied during indexing. What is wrong, and how can I make the analyzer work?

Answer

1

The code actually works correctly. So it may help someone who wants to create a custom analyzer in Lucene 4.2 and use it for both indexing and searching.
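
For completeness, a hedged sketch of searching the resulting index with a term that matches what the analyzer produced. The paths and field names below are taken from the question; a TermQuery is used rather than QueryParser, because QueryParser splits query text on whitespace, which does not suit a field indexed as a single untokenized token:

    import java.io.File;

    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    public class SearchExample {
        public static void main(String[] args) throws Exception {
            Directory directory = FSDirectory.open(new File("index path"));
            DirectoryReader reader = DirectoryReader.open(directory);
            IndexSearcher searcher = new IndexSearcher(reader);

            // The query term must match the indexed token exactly: the whole
            // field value, lowercased, since that is what the analyzer emitted.
            TermQuery query = new TermQuery(new Term("f1", "some field value"));

            ScoreDoc[] hits = searcher.search(query, 10).scoreDocs;
            for (ScoreDoc hit : hits) {
                System.out.println(searcher.doc(hit.doc).get("f1"));
            }
            reader.close();
        }
    }

If analyzed queries are needed, the same NoTokenAnalyzer instance should be passed to the query-side code so that query terms are lowercased the same way the indexed terms were.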