My requirement is to index a table from Oracle that has around 950
million rows of data.
Our infra is 4 nodes.
Each node has: 4 CPUs, 16 GB memory, and 122 GB storage.
My configuration is ES_HEAP 8 GB,
8 shards,
refresh rate set to -1,
bulk inserts of 10,000 rows,
spanning 4 machines with 20 parallel threads each. Initially it loads
around 2.5 million rows in 2 minutes, but after it gets close to 100 million,
indexing stops.
Could anybody please suggest the best way to deal with this huge
data set? Any pointers on the best infra setup?
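For context, the batching described above (10,000-row bulks keyed by UUID) can be sketched as a generator that builds NDJSON bulk bodies. This is a hypothetical helper, not the poster's actual loader; the index name and row shape are assumptions, and the network call itself is left out:

```python
import json

def to_bulk_batches(rows, batch_size=10_000, index="oracle_table"):
    """Group rows into bulk-sized NDJSON bodies for the ES bulk API.

    Each document contributes two lines: an action/metadata line and the
    document source. The index name "oracle_table" is a placeholder.
    """
    batch = []
    for row in rows:
        batch.append(json.dumps({"index": {"_index": index, "_id": row["uuid"]}}))
        batch.append(json.dumps(row))
        if len(batch) >= 2 * batch_size:  # two lines per document
            yield "\n".join(batch) + "\n"
            batch = []
    if batch:
        yield "\n".join(batch) + "\n"
```

Each yielded body would be POSTed to `_bulk`; with `refresh_interval` at -1 as described, remember to restore it (and issue a refresh) once the load finishes.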
Can you tell more details about "the indexing is stopped"? Do you stop it
intentionally?
How long does it take for 100 million docs? Do you use Java API or HTTP
API? How many concurrent bulk requests do you send? Do you evaluate the
BulkResponses? Do you use monitoring tools? Is the heap full, did you
starve the indexing? Are there GC pauses? Is something in the (debug level)
logs? How large are the segments that you are merging?
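Evaluating the bulk responses Jörg asks about amounts to checking every item in each reply for an error (e.g. a rejected execution when a queue fills up). A sketch against the HTTP bulk response shape; the helper itself is hypothetical:

```python
def failed_items(bulk_response):
    """Return (_id, error) pairs for bulk items that were not indexed.

    Works on the parsed JSON of a bulk reply: each entry in "items" is
    keyed by its action type ("index", "create", ...), and a failed
    action carries an "error" field.
    """
    failures = []
    for item in bulk_response.get("items", []):
        action = next(iter(item.values()))
        if "error" in action:
            failures.append((action.get("_id"), action["error"]))
    return failures
```

Silently dropping these is one way a load can "succeed" for 100 million docs and then quietly lose everything after, so logging the failures (and retrying them with backoff) is worth the extra code.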
I had not stopped it intentionally. It stopped but didn't log any
exceptions. I am doing 80 concurrent bulk requests, each with 10k docs. Bulk
responses are fine until 100 million; I am able to insert all the records with a
UUID key.
I am using the default segment merging, nothing configured beyond out of the
box. The heap is running hot, around 80% usage, but not full. I am not using any
monitoring tools.
On Friday, December 20, 2013 12:04:35 PM UTC-8, Jörg Prante wrote:
So if I understand correctly, your bulk indexing runs, but suddenly no more
docs reach ES, and the indexer just hangs?
There must be a reason why the hang happened. This effect is known as
"starving".
The default heap GC jumps in at 80% to compensate for overly long GC full stops.
Maybe you should try to find out whether increased GC activity is causing the
starving or not. Sometimes the JVM aborts with a diagnostic message like "GC
overhead limit exceeded". This may happen on the client side, or on the server side too.
For the server, there is the bigdesk plugin, or jvisualvm (in the JDK), for this
task, and even more monitoring tools for ES are available.
Maybe it is possible to collect more info about what is going on, so the
cause of the trouble can be better identified.
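One common cause of the starving Jörg describes is firing more concurrent bulks than the cluster can absorb (80 at once here). A minimal sketch of capping in-flight requests with a semaphore; the `send` function is a stand-in for the real bulk call, and the cap of 8 is an assumption to illustrate, not a recommendation:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def throttled_bulk_load(batches, send, max_in_flight=8):
    """Submit bulk bodies while capping how many are in flight at once.

    `send` is whatever actually performs the bulk request; here it is
    just a callable taking one bulk body and returning its response.
    """
    gate = threading.Semaphore(max_in_flight)

    def submit(body):
        try:
            return send(body)  # stand-in for the actual bulk request
        finally:
            gate.release()  # free a slot for the next batch

    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        futures = []
        for body in batches:
            gate.acquire()  # block until a slot frees up
            futures.append(pool.submit(submit, body))
        return [f.result() for f in futures]
```

If the cluster is healthy at a low cap and stalls again as the cap rises, that points at server-side backpressure (queues, merges, GC) rather than a client bug.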
Thanks Jörg.
I will install the monitors/profilers and post the memory details.
On Friday, December 20, 2013 1:42:35 PM UTC-8, Jörg Prante wrote: