aman_singh
(aman singh)
February 27, 2013, 9:06am
1
I am trying to index 16 million docs (47 GB) from a MySQL table into an
Elasticsearch index, using jprante's Elasticsearch JDBC river. After
creating the river and waiting about 15 minutes, the entire heap is
consumed, with no sign that the river is running or that any docs are
being indexed. The river ran fine when I had around 10-12 million
records to index. I have tried running the river 3-4 times, without
success.
Heap memory preallocated to the ES process = 10g
elasticsearch.yml
cluster.name: test_cluster
index.cache.field.type: soft
index.cache.field.max_size: 50000
index.cache.field.expire: 2h
cloud.aws.access_key: BBNYJC25Dij8JO7YM23I(fake)
cloud.aws.secret_key: GqE6y009ZnkO/+D1KKzd6M5Mrl9/tIN2zc/acEzY(fake)
cloud.aws.region: us-west-1
discovery.type: ec2
discovery.ec2.groups: sg-s3s3c2fc(fake)
discovery.ec2.any_group: false
discovery.zen.ping.timeout: 3m
gateway.recover_after_nodes: 1
gateway.recover_after_time: 1m
bootstrap.mlockall: true
network.host: 10.111.222.33(fake)
river.sh
curl -XPUT 'http://--address--:9200/_river/myriver/_meta' -d '{
    "type" : "jdbc",
    "jdbc" : {
        "driver" : "com.mysql.jdbc.Driver",
        "url" : "jdbc:mysql://--address--:3306/mydatabase",
        "user" : "USER",
        "password" : "PASSWORD",
        "sql" : "select * from mytable order by creation_time desc",
        "poll" : "5d",
        "versioning" : false
    },
    "index" : {
        "index" : "myindex",
        "type" : "mytype",
        "bulk_size" : 500,
        "bulk_timeout" : "240s"
    }
}'
System properties
16 GB RAM
200 GB disk space
--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com .
For more options, visit https://groups.google.com/groups/opt_out .
jprante
(Jörg Prante)
February 27, 2013, 12:31pm
2
Are you using JDBC river version 2.0.3?
There is a fix for MySQL JDBC result streaming; see this issue
(opened 08:23 UTC and closed 08:38 UTC on 24 Jan 2013):
While attempting to import a table of just under 1 GB in size, I encounter the following error:
java.lang.OutOfMemoryError: Java heap space
Dumping heap to java_pid6729.hprof ...
Heap dump file created [1109933017 bytes in 2.819 secs]
Exception in thread "elasticsearch[Glitch][JDBC river [osu/oneshot)][T#1]" java.lang.OutOfMemoryError: Java heap space
at com.mysql.jdbc.Buffer.<init>(Buffer.java:59)
at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1943)
at com.mysql.jdbc.MysqlIO.readSingleRowSet(MysqlIO.java:3401)
at com.mysql.jdbc.MysqlIO.getResultSet(MysqlIO.java:483)
at com.mysql.jdbc.MysqlIO.readResultsForQueryOrUpdate(MysqlIO.java:3096)
at com.mysql.jdbc.MysqlIO.readAllResults(MysqlIO.java:2266)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2687)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2719)
at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2155)
at com.mysql.jdbc.PreparedStatement.executeQuery(PreparedStatement.java:2318)
at org.elasticsearch.river.jdbc.strategy.simple.SimpleRiverSource.executeQuery(SimpleRiverSource.java:383)
at org.elasticsearch.river.jdbc.strategy.simple.SimpleRiverSource.fetch(SimpleRiverSource.java:232)
at org.elasticsearch.river.jdbc.strategy.simple.SimpleRiverFlow.move(SimpleRiverFlow.java:180)
at org.elasticsearch.river.jdbc.strategy.oneshot.OneShotRiverFlow.run(OneShotRiverFlow.java:38)
at java.lang.Thread.run(Thread.java:722)
This is on a server with 32gb RAM (64 bit), for what it's worth.
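The readAllResults frame in the trace above suggests the driver was buffering the whole result set on the heap before handing it over. MySQL Connector/J documents a way to stream rows one at a time instead: a forward-only, read-only statement with the fetch size set to Integer.MIN_VALUE. The following is only a minimal sketch of that technique, not the river's actual code; the class and method names (StreamingQuery, openStreaming, countRows) are hypothetical and chosen for illustration.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical helper for illustration; not part of the JDBC river.
final class StreamingQuery {

    private StreamingQuery() {}

    // Prepares a statement that MySQL Connector/J streams row by row
    // instead of buffering the complete result set in memory.
    static PreparedStatement openStreaming(Connection conn, String sql) throws SQLException {
        PreparedStatement stmt = conn.prepareStatement(
                sql, ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
        // Integer.MIN_VALUE is the Connector/J signal for row-by-row streaming.
        stmt.setFetchSize(Integer.MIN_VALUE);
        return stmt;
    }

    // Walks the result one row at a time and returns the number of rows seen.
    static long countRows(Connection conn, String sql) throws SQLException {
        long rows = 0;
        try (PreparedStatement stmt = openStreaming(conn, sql);
             ResultSet rs = stmt.executeQuery()) {
            while (rs.next()) {
                rows++; // an importer would build and bulk-index a document here
            }
        }
        return rows;
    }
}
```

With a connection obtained from DriverManager.getConnection, calling StreamingQuery.countRows(conn, "select * from mytable") would iterate the table without loading it all at once. While a streaming result set is open, the connection cannot run other statements, so the result has to be read to the end or closed first.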
Best regards,
Jörg