
Elasticsearch JDBC river eats up the entire heap memory

I want to index 16 million documents (47 GB) from a MySQL table into an Elasticsearch index. I am using jprante's elasticsearch-river-jdbc to do this. However, after creating the river and waiting for about 15 minutes, the entire heap memory gets consumed, with no sign of the river running or of any documents being indexed. The river used to run fine when I had around 10-12 million records to index. I have tried 3-4 times, but in vain.

Heap memory pre-allocated to the ES process = 10g
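
For reference, a minimal sketch of how that 10g heap is typically handed to the Elasticsearch process, assuming the stock startup scripts are used (the exact variable depends on the ES version):

# assumption: the standard bin/elasticsearch script reads ES_HEAP_SIZE
export ES_HEAP_SIZE=10g       # sets both -Xms and -Xmx to 10g
# older releases use the pair ES_MIN_MEM / ES_MAX_MEM instead
./bin/elasticsearch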

elasticsearch.yml

cluster.name: test_cluster 

index.cache.field.type: soft 
index.cache.field.max_size: 50000 
index.cache.field.expire: 2h 

cloud.aws.access_key: BBNYJC25Dij8JO7YM23I(fake) 
cloud.aws.secret_key: GqE6y009ZnkO/+D1KKzd6M5Mrl9/tIN2zc/acEzY(fake) 
cloud.aws.region: us-west-1 

discovery.type: ec2 
discovery.ec2.groups: sg-s3s3c2fc(fake) 
discovery.ec2.any_group: false 
discovery.zen.ping.timeout: 3m 

gateway.recover_after_nodes: 1 
gateway.recover_after_time: 1m 

bootstrap.mlockall: true 

network.host: 10.111.222.33(fake) 
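
Since bootstrap.mlockall is enabled above, the process also needs permission to lock that much memory; a common pre-start step, assuming the node is started manually rather than via a service wrapper, looks like this:

# assumption: ES is launched as the current user in this shell
ulimit -l unlimited     # allow the JVM to lock the 10g heap into RAM
./bin/elasticsearch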

river.sh

curl -XPUT 'http://--address--:9200/_river/myriver/_meta' -d '{
    "type" : "jdbc",
    "jdbc" : {
        "driver" : "com.mysql.jdbc.Driver",
        "url" : "jdbc:mysql://--address--:3306/mydatabase",
        "user" : "USER",
        "password" : "PASSWORD",
        "sql" : "select * from mytable order by creation_time desc",
        "poll" : "5d",
        "versioning" : false
    },
    "index" : {
        "index" : "myindex",
        "type" : "mytype",
        "bulk_size" : 500,
        "bulk_timeout" : "240s"
    }
}'
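
For reference, whether the river has actually indexed anything can be checked along these lines (same placeholder address as above):

# count documents that have reached the target index so far
curl 'http://--address--:9200/myindex/_count?pretty'

# the river definition itself is stored as a plain document in the _river index
curl 'http://--address--:9200/_river/myriver/_meta?pretty'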

System properties:

16gb RAM 
200gb disk space 
