Elasticsearch JDBC river eats up entire memory

I am trying to index 16 million docs (47 GB) from a MySQL table into an
Elasticsearch index. I am using jprante's elasticsearch-river-jdbc plugin to
do this. But after creating the river and waiting for about 15 minutes, the
entire heap memory gets consumed without any sign of the river running or of
docs getting indexed. The river used to run fine when I had around 10-12
million records to index. I have tried running the river 3-4 times, to no
avail.

Heap memory pre-allocated to the ES process = 10g

elasticsearch.yml

 cluster.name: test_cluster

 index.cache.field.type: soft
 index.cache.field.max_size: 50000
 index.cache.field.expire: 2h

 cloud.aws.access_key: BBNYJC25Dij8JO7YM23I(fake)
 cloud.aws.secret_key: GqE6y009ZnkO/+D1KKzd6M5Mrl9/tIN2zc/acEzY(fake)
 cloud.aws.region: us-west-1

 discovery.type: ec2
 discovery.ec2.groups: sg-s3s3c2fc(fake)
 discovery.ec2.any_group: false
 discovery.zen.ping.timeout: 3m

 gateway.recover_after_nodes: 1
 gateway.recover_after_time: 1m

 bootstrap.mlockall: true

 network.host: 10.111.222.33(fake)

river.sh

curl -XPUT 'http://--address--:9200/_river/myriver/_meta' -d '{
    "type" : "jdbc",
    "jdbc" : {
        "driver" : "com.mysql.jdbc.Driver",
        "url" : "jdbc:mysql://--address--:3306/mydatabase",
        "user" : "USER",
        "password" : "PASSWORD",
        "sql" : "select * from mytable order by creation_time desc",
        "poll" : "5d",
        "versioning" : false
    },
    "index" : {
        "index" : "myindex",
        "type" : "mytype",
        "bulk_size" : 500,
        "bulk_timeout" : "240s"
    }
}'
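
As far as I understand it, the river just runs the SQL above, walks the
result set, and sends bulk requests of 500 docs. A rough plain-JDBC
equivalent of that loop is below (the class name and the bulkIndex/flushBulk
helpers are placeholders of mine, not the river's actual code; connection
details are reused from the river config):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class RiverLoopSketch {
        public static void main(String[] args) throws Exception {
            Connection conn = DriverManager.getConnection(
                    "jdbc:mysql://--address--:3306/mydatabase", "USER", "PASSWORD");

            Statement stmt = conn.createStatement();
            ResultSet rs = stmt.executeQuery(
                    "select * from mytable order by creation_time desc");

            // how much heap this loop needs depends almost entirely on how the
            // JDBC driver fetches rows, since the bulk batches themselves are
            // only 500 docs at a time (bulk_size from the river config)
            int count = 0;
            while (rs.next()) {
                bulkIndex(rs);              // placeholder: map row -> JSON doc, add to bulk
                if (++count % 500 == 0) {
                    flushBulk();            // placeholder: send the bulk request
                }
            }
            flushBulk();
            rs.close();
            stmt.close();
            conn.close();
        }

        private static void bulkIndex(ResultSet rs) { /* placeholder */ }
        private static void flushBulk() { /* placeholder */ }
    }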

System properties

16 GB RAM
200 GB disk space


Do you use JDBC river version 2.0.3?

There is a fix for MySQL JDBC result streaming.
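
For reference, "result streaming" with MySQL Connector/J means creating the
statement forward-only and read-only and setting the fetch size to
Integer.MIN_VALUE; with any other combination the driver buffers the complete
result set in the client heap. A minimal plain-JDBC sketch of that setup
(connection details are placeholders taken from the river config above, not
the river's own code):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class StreamingFetchSketch {
        public static void main(String[] args) throws Exception {
            Connection conn = DriverManager.getConnection(
                    "jdbc:mysql://--address--:3306/mydatabase", "USER", "PASSWORD");

            // Connector/J only streams rows one at a time when the statement is
            // forward-only, read-only, AND the fetch size is Integer.MIN_VALUE;
            // the fetch size must be set before executeQuery().
            Statement stmt = conn.createStatement(
                    ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
            stmt.setFetchSize(Integer.MIN_VALUE);

            ResultSet rs = stmt.executeQuery(
                    "select * from mytable order by creation_time desc");
            while (rs.next()) {
                // rows arrive from the server as they are consumed, so client
                // heap usage stays flat regardless of the table size
            }
            rs.close();
            stmt.close();
            conn.close();
        }
    }

Without streaming, all 16 million rows have to fit in the client heap before
the first document can be bulk-indexed, which would match the symptoms
described above.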

Best regards,

Jörg

On 27.02.13 10:06, aman singh wrote:

I am trying to index 16 million docs (47 GB) from a MySQL table into an
Elasticsearch index, using jprante's elasticsearch-river-jdbc plugin. [...]
