I am using Lucene 3 to index some txt files, as shown below. Why does searching the index find nothing in the Persian files?
    public static void main(String[] args) throws Exception {
        String indexDir = "file input";
        String dataDir = "file input";
        long start = System.currentTimeMillis();
        indexer indexer = new indexer(indexDir);
        int numIndexed, cnt;
        try {
            numIndexed = indexer.index(dataDir, new TextFilesFilter());
            cnt = indexer.getHitCount("mycontents", "شهردار");
            System.out.println("count of search in contents: " + cnt);
        } finally {
            indexer.close();
        }
        long end = System.currentTimeMillis();
        System.out.println("Indexing " + numIndexed + " files took "
                + (end - start) + " milliseconds");
    }
The getHitCount function returns the correct hit count for English words, but it returns zero for Persian words!
    public int getHitCount(String fieldName, String searchString)
            throws IOException, ParseException {
        IndexSearcher searcher = new IndexSearcher(directory);
        Term t = new Term(fieldName, searchString);
        Query query = new TermQuery(t);
        int hitCount = searcher.search(query, 1).totalHits;
        searcher.close();
        return hitCount;
    }
How do I set UTF-8 in my project? I use NetBeans and created a simple Java project. All I need is a simple search over files!
Here is my indexer class:
    private IndexWriter writer;
    private Directory directory;

    public indexer(String indexDir) throws IOException {
        directory = FSDirectory.open(new File(indexDir));
        writer = new IndexWriter(directory,
                new StandardAnalyzer(Version.LUCENE_30),
                true,
                IndexWriter.MaxFieldLength.UNLIMITED);
    }

    public void close() throws IOException {
        writer.close();
    }

    public int index(String dataDir, FileFilter filter)
            throws Exception {
        File[] files = new File(dataDir).listFiles();
        for (File f : files) {
            if (!f.isDirectory()
                    && !f.isHidden()
                    && f.exists()
                    && f.canRead()
                    && (filter == null || filter.accept(f))) {
                indexFile(f);
            }
        }
        return writer.numDocs();
    }

    private static class TextFilesFilter implements FileFilter {
        public boolean accept(File path) {
            return path.getName().toLowerCase()
                    .endsWith(".txt");
        }
    }

    protected Document getDocument(File f) throws Exception {
        Document doc = new Document();
        doc.add(new Field("mycontents", new FileReader(f)));
        doc.add(new Field("filename", f.getName(),
                Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.add(new Field("fullpath", f.getCanonicalPath(),
                Field.Store.YES, Field.Index.NOT_ANALYZED));
        return doc;
    }

    private void indexFile(File f) throws Exception {
        System.out.println("Indexing " + f.getCanonicalPath());
        Document doc = getDocument(f);
        writer.addDocument(doc);
    }
Can we see your indexer class? It looks like something you implemented yourself. – Niklas
@Niklas I have edited my question. – NASRIN
This should help you: http://stackoverflow.com/questions/23030329/lucene-encoding-java – Niklas
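One possible cause, in line with the encoding link above, is that `new FileReader(f)` decodes the file with the platform default charset. If the .txt files are saved as UTF-8 and the default charset is something else, the Persian text may be garbled before the analyzer ever sees it. The sketch below shows one way to open a file with an explicit UTF-8 decoder using only the JDK. The class `Utf8ReaderSketch` and its methods `openUtf8`, `readAll`, and `roundTrip` are hypothetical names introduced for illustration, and the assumption that the files are UTF-8 has not been confirmed by the asker.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;

// Hypothetical helper, not part of the question's code or of Lucene.
class Utf8ReaderSketch {
    private static final Charset UTF8 = Charset.forName("UTF-8");

    // Opens the file with an explicit UTF-8 decoder instead of the
    // platform default charset that FileReader would use.
    // Assumption: the file on disk really is UTF-8 encoded.
    static Reader openUtf8(File f) throws IOException {
        return new BufferedReader(
                new InputStreamReader(new FileInputStream(f), UTF8));
    }

    // Reads the whole reader into a String and closes it.
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        try {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } finally {
            reader.close();
        }
        return sb.toString();
    }

    // Writes text to a temporary file as UTF-8, then reads it back
    // through openUtf8 so the two sides can be compared.
    static String roundTrip(String text) throws IOException {
        File tmp = File.createTempFile("utf8-sketch", ".txt");
        tmp.deleteOnExit();
        Writer w = new OutputStreamWriter(new FileOutputStream(tmp), UTF8);
        try {
            w.write(text);
        } finally {
            w.close();
        }
        return readAll(openUtf8(tmp));
    }

    public static void main(String[] args) throws IOException {
        // The search word from the question, written with Unicode escapes
        // so this source file's own encoding does not matter.
        String word = "\u0634\u0647\u0631\u062F\u0627\u0631";
        System.out.println("round trip ok: " + word.equals(roundTrip(word)));
    }
}
```

Because `Field(String, Reader)` accepts any `Reader`, the call in `getDocument` could pass a reader built like `openUtf8(f)` in place of `new FileReader(f)`. The Persian literal in `main` is also affected by the encoding NetBeans uses to compile the source file, so it may be worth checking that the project's source encoding matches how the file is saved.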