Elasticsearch jdbc river eats up entire memory

aman_singh · February 27, 2013, 9:06am

I am trying to index 16 million docs (47 GB) from a MySQL table into an
Elasticsearch index, using jprante's Elasticsearch JDBC river. After
creating the river and waiting about 15 minutes, the entire heap is
consumed, with no sign that the river is running or that any docs are
being indexed. The river ran fine when I had around 10-12 million
records to index. I have tried running the river 3-4 times, but in
vain.

Heap memory preallocated to the ES process = 10g

elasticsearch.yml

 cluster.name: test_cluster

 index.cache.field.type: soft
 index.cache.field.max_size: 50000
 index.cache.field.expire: 2h

 cloud.aws.access_key: BBNYJC25Dij8JO7YM23I(fake)
 cloud.aws.secret_key: GqE6y009ZnkO/+D1KKzd6M5Mrl9/tIN2zc/acEzY(fake)
 cloud.aws.region: us-west-1

 discovery.type: ec2
 discovery.ec2.groups: sg-s3s3c2fc(fake)
 discovery.ec2.any_group: false
 discovery.zen.ping.timeout: 3m

 gateway.recover_after_nodes: 1
 gateway.recover_after_time: 1m

 bootstrap.mlockall: true

 network.host: 10.111.222.33(fake)

river.sh

curl -XPUT 'http://--address--:9200/_river/myriver/_meta' -d '{
    "type" : "jdbc",
    "jdbc" : {
        "driver" : "com.mysql.jdbc.Driver",
        "url" : "jdbc:mysql://--address--:3306/mydatabase",
        "user" : "USER",
        "password" : "PASSWORD",
        "sql" : "select * from mytable order by creation_time desc",
        "poll" : "5d",
        "versioning" : false
    },
    "index" : {
        "index" : "myindex",
        "type" : "mytype",
        "bulk_size" : 500,
        "bulk_timeout" : "240s"
    }
}'

System properties

16gb RAM
200gb disk space

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

jprante · February 27, 2013, 12:31pm

Do you use JDBC river version 2.0.3?

There is a fix for MySQL JDBC result streaming; see this issue:

github.com/jprante/elasticsearch-jdbc

java OutOfMemoryError

opened 08:23AM - 24 Jan 13 UTC
closed 08:38AM - 24 Jan 13 UTC
peppy

While attempting to import a table of just under 1gb in size, I encounter the following error:

 java.lang.OutOfMemoryError: Java heap space
 Dumping heap to java_pid6729.hprof ...
 Heap dump file created [1109933017 bytes in 2.819 secs]
 Exception in thread "elasticsearch[Glitch][JDBC river [osu/oneshot)][T#1]" java.lang.OutOfMemoryError: Java heap space
     at com.mysql.jdbc.Buffer.<init>(Buffer.java:59)
     at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1943)
     at com.mysql.jdbc.MysqlIO.readSingleRowSet(MysqlIO.java:3401)
     at com.mysql.jdbc.MysqlIO.getResultSet(MysqlIO.java:483)
     at com.mysql.jdbc.MysqlIO.readResultsForQueryOrUpdate(MysqlIO.java:3096)
     at com.mysql.jdbc.MysqlIO.readAllResults(MysqlIO.java:2266)
     at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2687)
     at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2719)
     at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2155)
     at com.mysql.jdbc.PreparedStatement.executeQuery(PreparedStatement.java:2318)
     at org.elasticsearch.river.jdbc.strategy.simple.SimpleRiverSource.executeQuery(SimpleRiverSource.java:383)
     at org.elasticsearch.river.jdbc.strategy.simple.SimpleRiverSource.fetch(SimpleRiverSource.java:232)
     at org.elasticsearch.river.jdbc.strategy.simple.SimpleRiverFlow.move(SimpleRiverFlow.java:180)
     at org.elasticsearch.river.jdbc.strategy.oneshot.OneShotRiverFlow.run(OneShotRiverFlow.java:38)
     at java.lang.Thread.run(Thread.java:722)

This is on a server with 32gb RAM (64 bit), for what it's worth.
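
For readers wondering what "result streaming" means here: the trace above shows the MySQL driver reading the whole result set (readAllResults) during executeQuery, so a large table is buffered in the heap before a single row is processed. MySQL Connector/J streams rows one at a time only when the statement is forward-only, read-only, and has a fetch size of Integer.MIN_VALUE. The following is a minimal sketch of that technique, not the river's actual code; the class name StreamingQuery and the methods streamingStatement and countRows are hypothetical names chosen for illustration.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical illustration class; not part of the JDBC river.
public class StreamingQuery {

    /** Creates a statement that MySQL Connector/J reads row by row. */
    static Statement streamingStatement(Connection conn) throws SQLException {
        // Connector/J streams only for forward-only, read-only statements
        // whose fetch size is Integer.MIN_VALUE.
        Statement stmt = conn.createStatement(
                ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
        stmt.setFetchSize(Integer.MIN_VALUE);
        return stmt;
    }

    /** Iterates over the query result without buffering all rows in memory. */
    static long countRows(Connection conn, String sql) throws SQLException {
        long rows = 0;
        try (Statement stmt = streamingStatement(conn);
             ResultSet rs = stmt.executeQuery(sql)) {
            while (rs.next()) {
                rows++; // an importer would build and bulk-index a document here
            }
        }
        return rows;
    }

    // Usage: java StreamingQuery <jdbc-url> <user> <password> <sql>
    public static void main(String[] args) throws SQLException {
        if (args.length < 4) {
            System.err.println("usage: StreamingQuery <jdbc-url> <user> <password> <sql>");
            return;
        }
        try (Connection conn = DriverManager.getConnection(args[0], args[1], args[2])) {
            System.out.println(countRows(conn, args[3]) + " rows streamed");
        }
    }
}
```

One trade-off of streaming in Connector/J: the result set must be read to the end or closed before the same connection can run another statement.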

Best regards,

Jörg

On 27.02.13 10:06, aman singh wrote:

I am trying to index 16 million docs (47 GB) from a MySQL table into an
Elasticsearch index, using jprante's Elasticsearch JDBC river. After
creating the river and waiting about 15 minutes, the entire heap is
consumed, with no sign that the river is running or that any docs are
being indexed. The river ran fine when I had around 10-12 million
records to index. I have tried running the river 3-4 times, but in
vain.

