CPU usage more than 100% when rivering large data (around 100K records) (JDBC river plugin with ES)


(Anup Sakhare) #1

Hello,
I can see that while rivering records from the MySQL server, CPU usage
reaches almost 100%. We have around 100K records in a MySQL table, and the
river is created with the following command:
curl -XPUT 'localhost:9200/_river/myjdbc_river/_meta' -d '{
  "type" : "jdbc",
  "jdbc" : {
    "driver" : "com.mysql.jdbc.Driver",
    "url" : "jdbc:mysql://107.108.209.189:3306/CSPDB",
    "user" : "root",
    "password" : "ssf",
    "strategy" : "simple",
    "sql" : "select name AS user_name, OSPGuid AS user_ospguid, userId AS user_id, userId AS id, phone AS user_phone, email AS user_email, extProperty AS user_profile, createTime as user_createtime, modTime as user_modtime from CSPDB.User",
    "index" : "user",
    "type" : "user_type",
    "bulk_size" : 1000,
    "max_bulk_requests" : 50000,
    "poll" : "60s"
  }
}'
Hot threads output:
::: [Anup][OSRMJuOoTkaVWy-DgihUwA][inet[/107.108.209.27:9300]]

74.7% (373.4ms out of 500ms) cpu usage by thread 'elasticsearch[Anup][JDBC river [user_river/simple]][T#1]'
9/10 snapshots sharing following 8 elements
org.xbib.elasticsearch.river.jdbc.support.SimpleValueListener.values(SimpleValueListener.java:22)
org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverSource.processRow(SimpleRiverSource.java:519)
org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverSource.nextRow(SimpleRiverSource.java:499)
org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverSource.merge(SimpleRiverSource.java:312)
org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverSource.fetch(SimpleRiverSource.java:243)
org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverFlow.move(SimpleRiverFlow.java:183)
org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverFlow.run(SimpleRiverFlow.java:120)
java.lang.Thread.run(Thread.java:722)
unique snapshot
sun.nio.cs.UTF_8$Decoder.decodeArrayLoop(UTF_8.java:201)
sun.nio.cs.UTF_8$Decoder.decodeLoop(UTF_8.java:354)
java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:561)
java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:783)
java.nio.charset.Charset.decode(Charset.java:810)
com.mysql.jdbc.StringUtils.toString(StringUtils.java:1871)
com.mysql.jdbc.ResultSetRow.getString(ResultSetRow.java:821)
com.mysql.jdbc.BufferRow.getString(BufferRow.java:542)
com.mysql.jdbc.ResultSetImpl.getStringInternal(ResultSetImpl.java:5816)
com.mysql.jdbc.ResultSetImpl.getString(ResultSetImpl.java:5693)
org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverSource.parseType(SimpleRiverSource.java:750)
org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverSource.processRow(SimpleRiverSource.java:512)
org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverSource.nextRow(SimpleRiverSource.java:499)
org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverSource.merge(SimpleRiverSource.java:312)
org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverSource.fetch(SimpleRiverSource.java:243)
org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverFlow.move(SimpleRiverFlow.java:183)
org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverFlow.run(SimpleRiverFlow.java:120)
java.lang.Thread.run(Thread.java:722)

46.9% (234.5ms out of 500ms) cpu usage by thread 'elasticsearch[Anup][bulk][T#1]'
8/10 snapshots sharing following 13 elements
org.apache.lucene.index.DocumentsWriter.postUpdate(DocumentsWriter.java:375)
org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:463)
org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1551)
org.elasticsearch.index.engine.robin.RobinEngine.innerIndex(RobinEngine.java:590)
org.elasticsearch.index.engine.robin.RobinEngine.index(RobinEngine.java:495)
org.elasticsearch.index.shard.service.InternalIndexShard.index(InternalIndexShard.java:375)
org.elasticsearch.action.bulk.TransportShardBulkAction.shardIndexOperation(TransportShardBulkAction.java:397)
org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:155)
org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:556)
org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:426)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
java.lang.Thread.run(Thread.java:722)
2/10 snapshots sharing following 19 elements
org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:85)
org.apache.lucene.index.TermsHash.flush(TermsHash.java:116)
org.apache.lucene.index.DocInverter.flush(DocInverter.java:53)
org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:81)
org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:466)
org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:499)
org.apache.lucene.index.DocumentsWriter.preUpdate(DocumentsWriter.java:349)
org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:431)
org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1551)
org.elasticsearch.index.engine.robin.RobinEngine.innerIndex(RobinEngine.java:590)
org.elasticsearch.index.engine.robin.RobinEngine.index(RobinEngine.java:495)
org.elasticsearch.index.shard.service.InternalIndexShard.index(InternalIndexShard.java:375)
org.elasticsearch.action.bulk.TransportShardBulkAction.shardIndexOperation(TransportShardBulkAction.java:397)
org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:155)
org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:556)
org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:426)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
java.lang.Thread.run(Thread.java:722)

45.6% (228ms out of 500ms) cpu usage by thread 'elasticsearch[Anup][bulk][T#3]'
10/10 snapshots sharing following 11 elements
org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1551)
org.elasticsearch.index.engine.robin.RobinEngine.innerIndex(RobinEngine.java:590)
org.elasticsearch.index.engine.robin.RobinEngine.index(RobinEngine.java:495)
org.elasticsearch.index.shard.service.InternalIndexShard.index(InternalIndexShard.java:375)
org.elasticsearch.action.bulk.TransportShardBulkAction.shardIndexOperation(TransportShardBulkAction.java:397)
org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:155)
org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:556)
org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:426)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
java.lang.Thread.run(Thread.java:722)
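The "more than 100%" in the title fits this output: each figure is one thread's CPU time within the 500ms sampling interval (373.4ms out of 500ms, and so on), so several busy threads on a multi-core node add up past 100%. A minimal shell sketch for totalling them; sum_hot_thread_cpu is an illustrative name, not an Elasticsearch tool, and it assumes the text layout shown above:

```shell
#!/bin/sh
# Hypothetical helper: total the per-thread CPU percentages in saved
# hot_threads output read from stdin. Only lines that begin with
# "<number>% (" and mention "cpu usage by thread" are counted, so a thread
# name wrapped onto the next line is simply ignored.
sum_hot_thread_cpu() {
  awk '
    /^[0-9.]+% [(].*cpu usage by thread/ {
      pct = $1              # first field, e.g. 74.7%
      sub(/%$/, "", pct)    # drop the trailing percent sign
      total += pct          # awk converts the remaining text to a number
    }
    END { printf "%.1f\n", total + 0 }
  '
}
```

Saving the output above to a file and running sum_hot_thread_cpu < hot_threads.txt should give 74.7 + 46.9 + 45.6 = 167.2, i.e. roughly 1.7 cores' worth of CPU spread across the JDBC river thread and two bulk indexing threads.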

Can you please tell me what is wrong here? Is there some problem with using the
"bulk_size" : 1000,
"max_bulk_requests" : 50000
settings? Any suggestions for the configuration to use when creating the river,
and for using it to index more records (around 1 million) with a shorter poll
time?

I am using the latest versions of the Elasticsearch server and the matching
JDBC river plugin (). Can anybody help with this?
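For experimenting with the settings asked about here, it can help to generate the river definition with the bulk parameters and poll interval as arguments instead of hand-editing the JSON each time. The sketch below is a hypothetical helper (river_body is not part of the plugin); the field names and SQL are copied from the post, and whether a given plugin version accepts exactly these fields in this position is an assumption. The numbers in the usage line are placeholders, not tuned recommendations.

```shell
#!/bin/sh
# Hypothetical helper: print the river definition from the post with the
# bulk settings and poll interval supplied as arguments.
# Usage: river_body BULK_SIZE MAX_BULK_REQUESTS POLL
river_body() {
  bulk_size="$1"
  max_bulk_requests="$2"
  poll="$3"
  # Reject non-numeric bulk settings so the JSON stays valid.
  for n in "$bulk_size" "$max_bulk_requests"; do
    case "$n" in
      ''|*[!0-9]*) echo "river_body: expected an integer, got '$n'" >&2; return 1 ;;
    esac
  done
  cat <<EOF
{
  "type" : "jdbc",
  "jdbc" : {
    "driver" : "com.mysql.jdbc.Driver",
    "url" : "jdbc:mysql://107.108.209.189:3306/CSPDB",
    "user" : "root",
    "password" : "ssf",
    "strategy" : "simple",
    "sql" : "select name AS user_name, OSPGuid AS user_ospguid, userId AS user_id, userId AS id, phone AS user_phone, email AS user_email, extProperty AS user_profile, createTime as user_createtime, modTime as user_modtime from CSPDB.User",
    "index" : "user",
    "type" : "user_type",
    "bulk_size" : ${bulk_size},
    "max_bulk_requests" : ${max_bulk_requests},
    "poll" : "${poll}"
  }
}
EOF
}
```

The output can be piped straight into curl, which reads the request body from stdin with -d @-, for example: river_body 100 5 60s | curl -XPUT 'localhost:9200/_river/myjdbc_river/_meta' -d @-. An existing river with the same name may need to be deleted first (curl -XDELETE 'localhost:9200/_river/myjdbc_river/'); treat that as an assumption about the river API in use.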

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/fa3213ed-351f-4e02-b6f1-de12be74ec58%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Justin Doles) #2

You have max_bulk_requests set to bulk_size set to 1000. If I read the
JDBC docs correctly, that would potentially result in trying to bulk 500,000
docs at a time.
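This estimate can be made concrete. Assuming, as Justin reads the docs, that max_bulk_requests limits how many bulk requests are outstanding at once and bulk_size limits the documents per request, the theoretical ceiling is the product 1000 × 50000 = 50,000,000 documents, far above the ~100K rows in the table. A small shell sketch of that bound plus the number of requests one full pass needs; the helper names are hypothetical:

```shell
#!/bin/sh
# Hypothetical helpers for the bulk arithmetic. They assume max_bulk_requests
# limits how many bulk requests are outstanding at once and bulk_size limits
# how many documents each request carries.

# Largest number of documents that could be queued in bulk requests at once:
# bulk_size * max_bulk_requests, but never more than the rows the SQL returns.
max_docs_in_flight() {
  bulk_size="$1"; max_bulk_requests="$2"; rows="$3"
  ceiling=$((bulk_size * max_bulk_requests))
  if [ "$ceiling" -gt "$rows" ]; then
    echo "$rows"
  else
    echo "$ceiling"
  fi
}

# Bulk requests needed for one full pass over the table (ceiling division).
bulk_requests_needed() {
  rows="$1"; bulk_size="$2"
  if [ "$bulk_size" -le 0 ]; then
    echo "bulk_size must be positive" >&2
    return 1
  fi
  echo $(((rows + bulk_size - 1) / bulk_size))
}
```

For the posted settings, max_docs_in_flight 1000 50000 100000 prints 100000 and bulk_requests_needed 100000 1000 prints 100. Under this reading, a full pass needs only about 100 requests against a cap of 50000, so the cap never limits the river for this table.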

Which CPU is high? MySQL or the nodes in ES?

On Monday, December 23, 2013 2:19:02 AM UTC-5, Anup Sakhare wrote:



(Justin Doles) #3

Meant *max_bulk_requests = 50000.

On Monday, December 23, 2013 10:35:59 AM UTC-5, Justin Doles wrote:



(Anup Sakhare) #4

Hello Justin,
CPU is high on the Elasticsearch server node; the MySQL server runs on a
different node. Even when I create the river with the default values for
max_bulk_requests and bulk_size, I get the same result: high CPU on the
Elasticsearch node. I have attached a few snapshots from BigDesk.

On Mon, Dec 23, 2013 at 9:07 PM, Justin Doles jmdoles@gmail.com wrote:


--
Thanks & Regards,
Anup Sakhare
Mob No: 9535520508


