2012-01-11 71 views
0

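Ideas for reducing search time in Sphinx

I'm using the Thinking Sphinx gem and my queries take about 45 seconds to complete (13 million records; the folder containing the indexes is 1.1 GB). I assume I have something misconfigured (I'm a first-time Sphinx user). Let me know if you see anything that looks off. Here is my configuration: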

define_index do 
    indexes :name 
    indexes :summary 
    indexes :tag_list 

    indexes categories.name, :as => :category_name 

    has "RADIANS(lat)", :as => :latitude, :type => :float 
    has "RADIANS(lng)", :as => :longitude, :type => :float 

    set_property :field_weights => {
      :name          => 8,
      :summary       => 6,
      :category_name => 5,
      :tag_list      => 3
    }
    set_property :delta => ThinkingSphinx::Deltas::ResqueDelta 
    set_property :ignore_chars => %w(' -) 
end 

Here is an example query:

Location.search('Restaurant', 
       :geo => [0.5837843098436726,-1.9560609568879357], 
       :latitude_attr => "latitude", 
       :longitude_attr => "longitude", 
       :with => {"@geodist" => 0.0..4000.0}, 
       :include => :categories, 
       :page => 1, 
       :per_page => 100) 
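For reference, a minimal sketch of how the :geo values relate to ordinary degrees. The index stores RADIANS(lat) and RADIANS(lng), so the search point is passed in radians as well. The degree values below are hypothetical, and to_radians is a helper name made up for this sketch. Sphinx reports @geodist in meters, so the 0.0..4000.0 range above covers roughly 4 km.

```ruby
# Converts degrees to radians, the unit used by the :geo option and by the
# latitude/longitude attributes in the index definition above.
def to_radians(degrees)
  degrees * Math::PI / 180.0
end

# Hypothetical coordinates in degrees, for illustration only.
lat_deg = 33.45
lng_deg = -112.07

# This pair is what would be passed as :geo => [lat, lng].
geo = [to_radians(lat_deg), to_radians(lng_deg)]
puts geo.inspect
```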

My log shows:

Sphinx Query (43066.3ms) restaurant 
Sphinx Found 467 results 

I'll keep digging through the documentation and trying things!

UPDATE: my development.sphinx.conf

indexer 
{ 
} 

searchd 
{ 
    listen = 127.0.0.1:9312 
    log = /project_path/log/searchd.log 
    query_log = /project_path/log/searchd.query.log 
    pid_file = /project_path/log/searchd.development.pid 
} 

source location_core_0 
{ 
    type = pgsql 
    sql_host = localhost 
    sql_user = user 
    sql_pass = pass 
    sql_db = db_name 
    sql_query_pre = UPDATE "business_entities" SET "delta" = FALSE WHERE "delta" = TRUE 
    sql_query_pre = SET TIME ZONE 'UTC' 
    sql_query = SELECT "business_entities"."id" * 1::INT8 + 0 AS "id" , "business_entities"."name" AS "name", "business_entities"."summary" AS "summary", "business_entities"."tag_list" AS "tag_list", "business_entities"."id" AS "sphinx_internal_id", 0 AS "sphinx_deleted", CASE COALESCE("business_entities"."type", '') WHEN 'Location' THEN 2817059741 WHEN 'Group' THEN 2885774273 WHEN 'BraintreeBusiness' THEN 28779289 WHEN 'InvoicedBusiness' THEN 1440117572 ELSE 2817059741 END AS "class_crc", COALESCE("business_entities"."type", '') AS "sphinx_internal_class", RADIANS(lat) AS "latitude", RADIANS(lng) AS "longitude" FROM "business_entities" WHERE ("business_entities"."type" = 'Location') AND ("business_entities"."id" >= $start AND "business_entities"."id" <= $end AND "business_entities"."delta" = FALSE AND "business_entities"."type" = 'Location') GROUP BY "business_entities"."id", "business_entities"."name", "business_entities"."summary", "business_entities"."tag_list", "business_entities"."id", "business_entities"."type" 
    sql_query_range = SELECT COALESCE(MIN("id"), 1::bigint), COALESCE(MAX("id"), 1::bigint) FROM "business_entities" WHERE "business_entities"."delta" = FALSE 
    sql_attr_uint = sphinx_internal_id 
    sql_attr_uint = sphinx_deleted 
    sql_attr_uint = class_crc 
    sql_attr_float = latitude 
    sql_attr_float = longitude 
    sql_attr_string = sphinx_internal_class 
    sql_query_info = SELECT * FROM "business_entities" WHERE "id" = (($id - 0)/1) 
} 

index location_core 
{ 
    source = location_core_0 
    path = /project_path/db/sphinx/development/location_core 
    morphology = stem_en 
    charset_type = utf-8 
    ignore_chars = ', - 
    enable_star = 1 
} 

source location_delta_0 : location_core_0 
{ 
    type = pgsql 
    sql_host = localhost 
    sql_user = user 
    sql_pass = pass 
    sql_db = db_name 
    sql_query_pre = 
    sql_query_pre = SET TIME ZONE 'UTC' 
    sql_query = SELECT "business_entities"."id" * 1::INT8 + 0 AS "id" , "business_entities"."name" AS "name", "business_entities"."summary" AS "summary", "business_entities"."tag_list" AS "tag_list", "business_entities"."id" AS "sphinx_internal_id", 0 AS "sphinx_deleted", CASE COALESCE("business_entities"."type", '') WHEN 'Location' THEN 2817059741 WHEN 'Group' THEN 2885774273 WHEN 'BraintreeBusiness' THEN 28779289 WHEN 'InvoicedBusiness' THEN 1440117572 ELSE 2817059741 END AS "class_crc", COALESCE("business_entities"."type", '') AS "sphinx_internal_class", RADIANS(lat) AS "latitude", RADIANS(lng) AS "longitude" FROM "business_entities" WHERE ("business_entities"."type" = 'Location') AND ("business_entities"."id" >= $start AND "business_entities"."id" <= $end AND "business_entities"."delta" = TRUE AND "business_entities"."type" = 'Location') GROUP BY "business_entities"."id", "business_entities"."name", "business_entities"."summary", "business_entities"."tag_list", "business_entities"."id", "business_entities"."type" 
    sql_query_range = SELECT COALESCE(MIN("id"), 1::bigint), COALESCE(MAX("id"), 1::bigint) FROM "business_entities" WHERE "business_entities"."delta" = TRUE 
    sql_attr_uint = sphinx_internal_id 
    sql_attr_uint = sphinx_deleted 
    sql_attr_uint = class_crc 
    sql_attr_float = latitude 
    sql_attr_float = longitude 
    sql_attr_string = sphinx_internal_class 
    sql_query_info = SELECT * FROM "business_entities" WHERE "id" = (($id - 0)/1) 
} 

index location_delta : location_core 
{ 
    source = location_delta_0 
    path = /project_path/db/sphinx/development/location_delta 
} 

index location 
{ 
    type = distributed 
    local = location_delta 
    local = location_core 
} 
+0

Could you please post your sphinx.conf here? – 2012-01-12 09:48:59

+0

If you do post the config file, make sure you remove the database credentials (username and password) from it. – pat 2012-01-12 12:32:05

+0

OK, I've posted my development.sphinx.conf – 2012-01-12 16:10:52

Answers

0

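I found my problem: these records happen to live in an STI table, but I only want to index the ones of type Location (Location has no descendants). Of the 13 million records in this table, 99.99984% (seriously) are of type Location. The SELECT DISTINCT type FROM business_entities query takes far too long, even with an index. The trickiest part was noticing this at all, because the log reported the Sphinx query as lasting 84 seconds when it was really the SQL queries eating that time: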

SQL (43647.1ms) SELECT DISTINCT type FROM business_entities 
SQL (39857.7ms) SELECT DISTINCT type FROM business_entities 

Sphinx Query (84173.0ms) restaurant 
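A quick check of those numbers, as a sketch that uses only the figures quoted above. It assumes the "Sphinx Query" timing wraps both SQL calls, and it treats "13 million" as an exact row count, which it probably is not.

```ruby
# Timings copied from the log lines above, in milliseconds.
sql_ms    = [43_647.1, 39_857.7]
sphinx_ms = 84_173.0

sql_total_ms = sql_ms.inject(:+)          # time spent in the two DISTINCT queries
remainder_ms = sphinx_ms - sql_total_ms   # what is left for everything else
sql_share    = sql_total_ms / sphinx_ms   # fraction of the reported time

puts format('SQL total:  %.1f ms', sql_total_ms)
puts format('Remainder:  %.1f ms', remainder_ms)
puts format('SQL share:  %.1f%%', sql_share * 100)

# Rough count of rows that are NOT of type Location, assuming exactly
# 13 million rows and the 99.99984% figure from the text.
total_rows   = 13_000_000
other_rows   = (total_rows * (1 - 0.9999984)).round
puts "Non-Location rows (approx.): #{other_rows}"
```

If that reading is right, well under a second of the reported 84 seconds belongs to anything other than the two SELECT DISTINCT queries.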

So I monkey-patched Thinking Sphinx in an initializer to return the only type I care about:

module ThinkingSphinx
  class Source
    module SQL
      def type_values
        ['Location']
      end
    end
  end
end

https://gist.github.com/1603565

+1

You could also add this as part of the WHERE clause in the Sphinx configuration. The following inside the define_index block should do it: where "business_entities.type = 'Location'" – pat 2012-01-17 04:22:58

+1

Also: I'd suggest putting a database index on that type column. – pat 2012-01-17 04:23:26

0

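I'm not sure why your searches are running so slowly, but I'd start by simplifying the query and then adding the complexity back bit by bit, to see whether anything in particular is the cause. So, first: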

Location.search('Restaurant') 

Then perhaps:

Location.search('Restaurant', :per_page => 100) 

And so on. Don't forget that the :field_weights in your index definition will also have an effect.

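All that said, I don't see anything particularly odd in what you're doing, and a 43-second search (or anything close to it) is something I haven't come across before.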
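The step-by-step approach above could be sketched roughly like this. It is only an illustration: time_variants is a helper name made up here, and fake_search is a stand-in so the snippet runs without Sphinx. In the application each lambda would call Location.search instead, and since the search object may be populated lazily, it is probably worth forcing it (for example with to_a) inside the timed block.

```ruby
require 'benchmark'

# Runs each labelled block once and returns { label => elapsed milliseconds }.
def time_variants(variants)
  variants.each_with_object({}) do |(label, block), timings|
    timings[label] = Benchmark.realtime { block.call } * 1000.0
  end
end

# Stand-in for Location.search (an assumption for this sketch): it just
# sleeps a little longer for each extra option and returns no results.
fake_search = lambda do |query, options = {}|
  sleep(0.01 * (options.size + 1))
  []
end

# Start with the bare query, then add one option at a time.
variants = {
  'query only'     => -> { fake_search.call('Restaurant') },
  'plus :per_page' => -> { fake_search.call('Restaurant', :per_page => 100) },
  'plus :include'  => -> { fake_search.call('Restaurant', :per_page => 100, :include => :categories) }
}

timings = time_variants(variants)
timings.each { |label, ms| puts format('%-16s %8.1f ms', label, ms) }
```

Whichever step makes the time jump is the one to look at more closely.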

+0

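Thanks for the reply, Pat. I just tried the simple query and it takes even longer. I tried removing the field weights from the index and it had no effect. I removed the indexing of the association, which made the index take less time to build. I'll keep trying... – 2012-01-12 16:15:48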