Indexing a txt file in Lucene
I want to build a small search engine for tweets. I have a txt file containing 20000 tweets. The file format looks like this:
TommyFrench1
851
85170333395811123
Lurgan, Moira, Armagh. Derry
This week we are double delight on first goalscorers on the four Champions League matches in shop. ChampionsLeague
Im_Aarkay
175
851703414300037122
Paris
@ChampionsLeague @AS_Monaco @AS_Monaco_EN Nopes, it's when City knocked outta Champions league.
...
etc.
The first line is the username, followed by the followers count, then the id and the location, and the last one is the text (tweet).
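For illustration, the five-line layout described above could be read with plain Java before Lucene is involved at all. This is only a sketch: the class name `TweetRecordParser` and the `Tweet` holder are hypothetical, and it assumes that a username is never blank and that any blank line between records is just a separator.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class TweetRecordParser {

    /** Plain holder for the five lines that make up one tweet record (hypothetical type). */
    static final class Tweet {
        final String username;
        final String followers;
        final String id;
        final String location;
        final String text;

        Tweet(String username, String followers, String id, String location, String text) {
            this.username = username;
            this.followers = followers;
            this.id = id;
            this.location = location;
            this.text = text;
        }
    }

    /** Reads consecutive five-line records; blank lines between records are skipped. */
    static List<Tweet> parse(BufferedReader reader) throws IOException {
        List<Tweet> tweets = new ArrayList<>();
        String username;
        while ((username = reader.readLine()) != null) {
            if (username.trim().isEmpty()) {
                continue; // assumed separator line between records
            }
            String followers = reader.readLine();
            String id = reader.readLine();
            String location = reader.readLine();
            String text = reader.readLine();
            if (text == null) {
                break; // file ended in the middle of a record; drop the partial one
            }
            tweets.add(new Tweet(username, followers, id, location, text));
        }
        return tweets;
    }

    public static void main(String[] args) throws IOException {
        // Two shortened records in the same layout as the sample in the question.
        String sample = String.join("\n",
                "TommyFrench1", "851", "85170333395811123",
                "Lurgan, Moira, Armagh. Derry", "first tweet text",
                "",
                "Im_Aarkay", "175", "851703414300037122",
                "Paris", "second tweet text");
        try (BufferedReader reader = new BufferedReader(new StringReader(sample))) {
            for (Tweet t : parse(reader)) {
                System.out.println(t.username + " (" + t.followers + " followers): " + t.text);
            }
        }
    }
}
```

The key point is that the line read in the `while` condition is itself the username, so it must not be read a second time inside the loop body.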
I think each tweet should be one document. So I need 20000 documents, and each document must have 5 fields (username, followers, id, etc.).
How do I index this?
I have looked at some tutorials, but I did not find anything similar.
EDIT: this is my code.
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.nio.file.Paths;
import java.text.ParseException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopScoreDocCollector;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;
public class MyProgram {
    public static void main(String[] args) throws IOException, ParseException {
        FileReader fileReader = new FileReader(new File("myfile.txt"));
        BufferedReader br = new BufferedReader(fileReader);
        String line = null;
        String indexPath = "C:\\Desktop\\myfolder";
        Directory dir = FSDirectory.open(Paths.get(indexPath));
        Analyzer analyzer = new StandardAnalyzer();
        IndexWriterConfig iwc = new IndexWriterConfig(analyzer);
        IndexWriter writer = new IndexWriter(dir, iwc);
        while ((line = br.readLine()) != null) {
            // reading lines until the end of the file
            Document doc = new Document();
            String username = br.readLine();
            doc.add(new Field("username", username, Field.Store.YES, Field.Index.ANALYZED)); // adding title field
            String followers = br.readLine();
            doc.add(new Field("followers", followers, Field.Store.YES, Field.Index.ANALYZED));
            String id = br.readLine();
            doc.add(new Field("id", id, Field.Store.YES, Field.Index.ANALYZED));
            String location = br.readLine();
            doc.add(new Field("location", location, Field.Store.YES, Field.Index.ANALYZED));
            String text = br.readLine();
            doc.add(new Field("text", text, Field.Store.YES, Field.Index.ANALYZED));
            writer.addDocument(doc); // writing new document to the index
            br.readLine();
        }
    }
}
I am getting the following error: Index cannot be resolved or is not a field.
How can I fix this?
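One possible direction, sketched under assumptions rather than offered as a definitive fix: the error suggests that the Lucene version on the classpath no longer has the `Field.Index` enum, which fits the rest of the code (`new StandardAnalyzer()` with no arguments, `FSDirectory.open(Paths.get(...))`). In those versions the already imported `StringField` (indexed as one exact token) and `TextField` (tokenized by the analyzer) take over that role. The sketch below also reads the username in the `while` condition instead of reading it a second time, and closes the writer so the index is committed. The class name `TweetIndexer`, the helper names, the temporary index directory and the choice of which fields are analyzed are all assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Path;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class TweetIndexer {

    /** Builds one Lucene document from the five lines of a tweet record. */
    static Document toDocument(String username, String followers, String id,
                               String location, String text) {
        Document doc = new Document();
        // StringField: indexed as a single token, not analyzed (exact match only).
        doc.add(new StringField("username", username, Field.Store.YES));
        doc.add(new StringField("followers", followers, Field.Store.YES));
        doc.add(new StringField("id", id, Field.Store.YES));
        // TextField: tokenized by the analyzer, suited to full-text search.
        doc.add(new TextField("location", location, Field.Store.YES));
        doc.add(new TextField("text", text, Field.Store.YES));
        return doc;
    }

    /** Indexes every five-line record from the reader; returns the number of documents added. */
    static int indexTweets(BufferedReader br, Directory dir) throws IOException {
        int count = 0;
        IndexWriterConfig iwc = new IndexWriterConfig(new StandardAnalyzer());
        try (IndexWriter writer = new IndexWriter(dir, iwc)) {
            String username;
            // The line read here IS the username; it is not read again below.
            while ((username = br.readLine()) != null) {
                if (username.trim().isEmpty()) {
                    continue; // assumed separator line between records
                }
                String followers = br.readLine();
                String id = br.readLine();
                String location = br.readLine();
                String text = br.readLine();
                if (text == null) {
                    break; // incomplete trailing record
                }
                writer.addDocument(toDocument(username, followers, id, location, text));
                count++;
            }
        } // closing the writer commits the pending documents
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Shortened stand-in for myfile.txt; a temp directory stands in for the index path.
        String sample = String.join("\n",
                "TommyFrench1", "851", "85170333395811123",
                "Lurgan, Moira, Armagh. Derry", "first tweet text",
                "",
                "Im_Aarkay", "175", "851703414300037122",
                "Paris", "second tweet text");
        Path indexPath = Files.createTempDirectory("tweet-index");
        try (Directory dir = FSDirectory.open(indexPath);
             BufferedReader br = new BufferedReader(new StringReader(sample))) {
            int added = indexTweets(br, dir);
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                System.out.println(added + " added, " + reader.numDocs() + " in index");
            }
        }
    }
}
```

With the real file, the `StringReader` would be replaced by a `FileReader` on `myfile.txt` and the temporary directory by the actual index path. Storing `followers` as a `StringField` only supports exact matches; range queries on the follower count would need a numeric field type instead.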
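What do you mean by "indexing"? What are you trying to achieve with it?
I have a project to build a small search engine for 20000 tweets. Indexing is one of the core features Lucene provides. I have to read the txt file, and each tweet must be a document. Then each document must have the fields username, id, location, etc. I have an idea of how it works, but I am a beginner with Lucene and I cannot find anything similar to this.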
Have you looked at this question for ideas: http://stackoverflow.com/questions/4091441/how-do-i-index-and-search-text-files-in-lucene-3-0-2?rq=1