Documents are not being indexed

According to the Whoosh documentation, giving the StemmingAnalyzer an unbounded cache makes batch indexing faster:
writer = myindex.writer()
# Get the analyzer object from a text field
stem_ana = writer.schema["content"].format.analyzer
# Set the cachesize to -1 to indicate unbounded caching
stem_ana.cachesize = -1
# Reset the analyzer to pick up the changed attribute
stem_ana.clear()
# Use the writer to index documents...
The only problem is that documents are no longer indexed after I do this. Here is my schema:
schema = Schema(
    title=TEXT(stored=True, analyzer=StemmingAnalyzer(), field_boost=2.0),
    content=TEXT(stored=True, analyzer=StemmingAnalyzer()),
    owner=NUMERIC(stored=True),
    id=ID(stored=True, unique=True),
    date=DATETIME(stored=True, sortable=True),
    author=TEXT(stored=True),
    system=TEXT(stored=True),
    url=TEXT(stored=True),
    type=TEXT(stored=True),
    service=TEXT(stored=True),
    last_updated=fields.DATETIME)
How I index (from XML):
docs = xmlObj.findall('document')
for d in docs:
    ...
    writer.update_document(...)
writer.commit()
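The elided loop body above is not shown, so the following is only a minimal sketch of how the fields of each `document` element might be collected into a dict before being handed to `writer.update_document(...)`. The sample XML, the child element names (`id`, `title`, `content`), and the `extract_fields` helper are all assumptions made for illustration; they do not come from the real feed.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample input; the real XML layout is not shown in the question.
SAMPLE_XML = """
<documents>
  <document><id>1</id><title>First</title><content>Indexing documents quickly</content></document>
  <document><id>2</id><title>Second</title><content>Stemming analyzers and caches</content></document>
</documents>
"""


def extract_fields(doc_elem, names=("id", "title", "content")):
    """Return a dict of field name -> text for the child elements that exist.

    Hypothetical helper: the names are assumed to match schema field names.
    """
    fields = {}
    for name in names:
        text = doc_elem.findtext(name)  # None when the child element is missing
        if text is not None:
            fields[name] = text.strip()
    return fields


xmlObj = ET.fromstring(SAMPLE_XML.strip())
records = [extract_fields(d) for d in xmlObj.findall("document")]
print(len(records))
```

Each dict in `records` could then be passed on as keyword arguments, for example `writer.update_document(**fields)`. Values for fields such as `owner` and `date` may still need converting to the types the schema expects.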
After I changed the stemming cache, nothing is printed when I run this:
for doc in ix.reader().iter_docs():
    # doc should be a tuple of (docnum, document)
    print("docnum: {}".format(doc[0]))
Please elaborate: how exactly is it indexed? Is an error shown? Do you get 0 documents? Are you unable to find them with a query?
I edited the question; I get 0 documents – Hakim