2017-04-21

Django: how to speed up Haystack search

I want to build a search engine in my Django environment for a simple data structure:

| id   | company name | 
|:-----------|-----------------:| 
| 12345678 | company A's name | 
| 12345687 | peoples pizza a/s| 
| 87654321 | sub's for pugs | 

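There will be a large number of companies, and I only want to search by name. When a name is found, its ID should be returned to my Django app.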

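I tried Haystack, Whoosh, and so on, but with every setup I tried the search results kept getting very slow as I grew my test data set from 500 to 800,000 entries. A search sometimes takes nearly an hour.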

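I am on Heroku (PaaS), so I thought I would try an integrated paid service (Searchly's Elasticsearch implementation). That helped, but once I reached about 80,000 companies it became very slow again.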

Installed apps

INSTALLED_APPS = [ 
    'django.contrib.admin', 
    'django.contrib.auth', 
    'django.contrib.contenttypes', 
    'django.contrib.sessions', 
    'django.contrib.sites', 

    # Added. 
    'haystack', 

    # Then your usual apps... 
] 

More of settings.py

import os 
from urlparse import urlparse 

es = urlparse(os.environ.get('SEARCHBOX_URL') or 'http://127.0.0.1:9200/') 

port = es.port or 80 

HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine',
        'URL': es.scheme + '://' + es.hostname + ':' + str(port),
        'INDEX_NAME': 'documents',
    },
}

if es.username: 
    HAYSTACK_CONNECTIONS['default']['KWARGS'] = {"http_auth": es.username + ':' + es.password} 
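
For reference, here is a minimal Python 3 sketch of how the URL splitting in these settings behaves. The helper name `build_connection` and the example URL are made up for illustration; only `urllib.parse.urlparse` is real standard-library API (it is the Python 3 home of the Python 2 `urlparse` module imported above).

```python
from urllib.parse import urlparse  # Python 3 location of the Python 2 `urlparse` module


def build_connection(url, index_name='documents'):
    """Hypothetical helper: turn a search URL into a Haystack connection dict."""
    es = urlparse(url)
    port = es.port or 80  # same fallback as in the settings above
    conn = {
        'ENGINE': 'haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine',
        'URL': '{}://{}:{}'.format(es.scheme, es.hostname, port),
        'INDEX_NAME': index_name,
    }
    if es.username:
        # Credentials embedded in the URL are passed separately as HTTP auth.
        conn['KWARGS'] = {'http_auth': '{}:{}'.format(es.username, es.password)}
    return conn


# Made-up URL, for illustration only.
example = build_connection('http://user:secret@search.example.com/')
```

With this logic an `https` URL that has no explicit port would also fall back to port 80, which may be worth checking against the URL the add-on actually provides.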

search_indexes.py

from haystack import indexes 

from hello.models import Article 


class ArticleIndex(indexes.SearchIndex, indexes.Indexable): 
    ''' 
    Defines the model for the search engine.
    ''' 
    text = indexes.CharField(document=True, use_template=True) 
    pub_date = indexes.DateTimeField(model_attr='pub_date') 
    # pub_date line was commented out previously 
    content_auto = indexes.EdgeNgramField(model_attr='title') 

    def get_model(self):
        return Article

    def index_queryset(self, using=None):
        """Used when the entire index for model is updated."""
        return self.get_model().objects.all()
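
To make the `EdgeNgramField` in this index more concrete: an edge n-gram index stores the leading prefixes of each token, so that a partially typed word can match at query time. The sketch below is plain Python, not Haystack or Elasticsearch code. The function name and the `min_gram`/`max_gram` values are assumptions chosen for illustration and may not match what a given backend is configured with.

```python
def edge_ngrams(text, min_gram=2, max_gram=15):
    """Return the leading prefixes of each whitespace-separated token."""
    grams = []
    for token in text.lower().split():
        upper = min(len(token), max_gram)
        # One prefix per length from min_gram up to the token length (capped).
        for size in range(min_gram, upper + 1):
            grams.append(token[:size])
    return grams


# Illustrative input taken from the sample table in the question.
prefixes = edge_ngrams('peoples pizza')
```

Each indexed name expands into several terms this way, so the index grows with both the number of companies and the length of their names.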

article_text.txt

{{ object.title }} 
{{ object.user.get_full_name }} 
{{ object.body }} 

urls.py

url(r'^search/$', views.search_titles, name='search'), 

views.py

from django.shortcuts import render_to_response
from haystack.query import SearchQuerySet


def search_titles(request):
    txt = request.POST.get('search_text', '')
    # Only query the index once at least four characters have been typed.
    if txt and len(txt) >= 4:
        articles = SearchQuerySet().autocomplete(content_auto=txt)
    # If the POST request is empty, return nothing;
    # this prevents an internal server error with jQuery.
    else:
        articles = []
    return render_to_response('scripts/ajax_search.html',
                              {'articles': articles})

search.html

{% if articles.count > 0 %} 
    <!-- simply prints the links to the CVR numbers, at most 15 of them -->
    {% for article in articles|slice:":15" %}
        <li><a href="{{ article.object.get_absolute_url }}">{{ article.object.title }}</a></li>
    {% endfor %} 

{% else %} 

    <li>Try again, or try CVR + &#x23ce;</li> 

{% endif %} 

index.html (where I call the search engine)

{% csrf_token %} 
<input type="text" id="search" name="search" /> 

<!-- This <ul> is where all company names end up -->
<ul id="search-results"></ul>

Answer

I changed the search method in my views.py to:

txt = request.POST.get('search_text', '')
articles = []
suggestedSearchTerm = ""
if txt and len(txt) >= 4:
    sqs = SearchQuerySet()
    # Ask the backend for the first 8 hits only.
    sqs.query.set_limits(low=0, high=8)
    sqs = sqs.filter(content=txt)
    articles = sqs.query.get_results()
    suggestedSearchTerm = SearchQuerySet().spelling_suggestion(txt)
    # The backend may return no suggestion at all, so guard before lowercasing.
    if not suggestedSearchTerm or suggestedSearchTerm == txt:
        suggestedSearchTerm = ''
    else:
        suggestedSearchTerm = suggestedSearchTerm.lower()
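
The suggestion handling at the end of that snippet could be pulled into a small helper so it can be tested without a search backend. This is only a sketch: `clean_suggestion` is a hypothetical name, and it assumes the backend may return `None` when it has no suggestion to offer.

```python
def clean_suggestion(search_text, suggestion):
    """Return a lowercased suggestion, or '' when there is nothing new to suggest."""
    # Assumption: a missing suggestion arrives as None (or an empty string).
    if not suggestion or suggestion == search_text:
        return ''
    return suggestion.lower()
```

In the view this would be called as `clean_suggestion(txt, SearchQuerySet().spelling_suggestion(txt))`.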