Elasticsearch: indexing a huge table (around 950 million rows) from Oracle


(kondapallinaresh) #1

Hi All,

I need to index an Oracle table that has around 950 million rows.

Our infrastructure is 4 nodes, each with 4 CPUs, 16 GB of memory and 122 GB of storage.

My configuration:

ES_HEAP_SIZE of 8 GB

8 shards

refresh interval set to -1

bulk inserts of 10,000 rows each, spawned from 4 machines with 20 parallel threads each

Initially it loads around 2.5 million rows in 2 minutes, but once it gets
close to 100 million the indexing stops.

Could anybody please suggest the best way to deal with this much data?
Any pointers on the best infrastructure setup would be welcome.

Thanks in advance

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/b17f9494-6f33-4e50-8137-e9f4d286c06a%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Jörg Prante) #2

Can you give more details about "the indexing is stopped"? Did you stop it
intentionally?

How long does it take for 100 million docs? Do you use the Java API or the
HTTP API? How many concurrent bulk requests do you send? Do you evaluate the
BulkResponses? Do you use monitoring tools? Is the heap full, did you
starve the indexing? Are there GC pauses? Is there anything in the (debug
level) logs? How large are the segments that you are merging?

Jörg
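
To illustrate what evaluating the bulk responses can look like over the HTTP API, here is a minimal sketch in Python. It assumes the response body has already been parsed from JSON and that each entry in `items` is a one-key object (`index`, `create`, ...) that carries an `error` field when that document failed. The function name `failed_bulk_items`, the sample response and the error text are hypothetical and only there for illustration.

```python
def failed_bulk_items(bulk_response):
    """Return (action, doc_id, error) for every item that reports an error."""
    failures = []
    for item in bulk_response.get("items", []):
        # Each item is a one-key dict such as {"index": {...}} or {"create": {...}}.
        for action, result in item.items():
            if "error" in result:
                failures.append((action, result.get("_id"), result["error"]))
    return failures


# Hypothetical response for illustration only; real responses carry more fields.
sample = {
    "took": 30,
    "items": [
        {"index": {"_index": "rows", "_id": "a1"}},
        {"index": {"_index": "rows", "_id": "a2",
                   "error": "EsRejectedExecutionException[rejected execution]"}},
    ],
}

for action, doc_id, error in failed_bulk_items(sample):
    print(action, doc_id, error)
```

A bulk request can return HTTP 200 even when individual items failed, so a loader that only checks the status code may silently drop documents. Failed items could be logged and retried instead.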



(kondapallinaresh) #3

Hi Jörg,

I did not stop it intentionally. It stopped on its own but did not log any
exceptions. I am doing 80 concurrent bulk requests, each with 10k docs. Bulk
responses are fine until 100 million; up to that point all the records are
inserted with a UUID key.
I am using the default segment merging, with nothing configured beyond the
out-of-the-box settings. The heap is running hot at around 80% usage but is
not full. I am not using any monitoring tools.
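
As a rough sanity check on the numbers mentioned in this thread, the load can be worked out with a few lines of Python. This is only back-of-the-envelope arithmetic; it assumes the initial rate of 2.5 million rows in 2 minutes could be sustained for the whole run, which is not guaranteed. All variable names are made up for the example.

```python
# Figures taken from the thread.
TOTAL_ROWS = 950_000_000
ROWS_PER_BULK = 10_000
CONCURRENT_BULKS = 80          # 4 machines x 20 threads
SHARDS = 8
INITIAL_ROWS = 2_500_000       # rows loaded in the first measurement
INITIAL_SECONDS = 2 * 60

# Documents that can be in flight at once across all client threads.
docs_in_flight = ROWS_PER_BULK * CONCURRENT_BULKS

# Observed initial throughput, and the time the full load would take
# if that throughput held (an optimistic assumption).
rows_per_second = INITIAL_ROWS / INITIAL_SECONDS
hours_for_all = TOTAL_ROWS / rows_per_second / 3600

total_bulks = TOTAL_ROWS // ROWS_PER_BULK
rows_per_shard = TOTAL_ROWS // SHARDS

print(f"docs in flight:   {docs_in_flight:,}")
print(f"rows per second:  {rows_per_second:,.0f}")
print(f"hours for all:    {hours_for_all:.1f}")
print(f"bulk requests:    {total_bulks:,}")
print(f"rows per shard:   {rows_per_shard:,}")
```

With 80 concurrent requests of 10k docs, up to 800,000 documents can be queued against the cluster at the same time, which is worth keeping in mind when looking at heap usage on 8 GB nodes.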

(In reply to Jörg Prante's message of Friday, December 20, 2013 12:04:35 PM UTC-8.)



(Jörg Prante) #4

So, if I understand correctly, your bulk indexing runs but suddenly no more
docs reach ES, and the indexer just hangs?

There must be a reason why the hang happened. This effect is known as
"starving".

With the default settings, GC kicks in at 80% heap usage to avoid overly
long full GC pauses. Maybe you should try to find out whether the increased
GC activity is causing the starving or not. Sometimes the JVM aborts with a
diagnostic message like "GC overhead limit exceeded". This may happen on the
client side or on the server side.

For the server, there is the bigdesk plugin or jvisualvm (in the JDK) for
this task, and even more monitoring tools for ES are available.

Maybe it is possible to collect more info about what is going on, so the
cause of the trouble can be better identified.

Jörg
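
Without a full monitoring tool, heap pressure can also be watched by polling the node stats and computing the used heap percentage per node. The sketch below is in Python and works on an already parsed stats document. The field names (`nodes`, `jvm.mem.heap_used_in_bytes`, `heap_committed_in_bytes`, `heap_max_in_bytes`) are assumptions about the stats layout and may differ between ES versions; the function name and the sample data are hypothetical.

```python
def heap_usage_percent(node_stats):
    """Map node name to heap used as a percentage of the heap ceiling."""
    usage = {}
    for node_id, node in node_stats.get("nodes", {}).items():
        mem = node["jvm"]["mem"]
        # Prefer the max heap if it is reported, otherwise fall back to
        # the committed heap (assumed field names, see the note above).
        ceiling = mem.get("heap_max_in_bytes") or mem["heap_committed_in_bytes"]
        usage[node.get("name", node_id)] = 100.0 * mem["heap_used_in_bytes"] / ceiling
    return usage


GIB = 1024 ** 3

# Hypothetical stats for two nodes with an 8 GiB heap each.
sample_stats = {
    "nodes": {
        "n1": {"name": "node-1", "jvm": {"mem": {
            "heap_used_in_bytes": 6 * GIB,
            "heap_committed_in_bytes": 8 * GIB}}},
        "n2": {"name": "node-2", "jvm": {"mem": {
            "heap_used_in_bytes": 2 * GIB,
            "heap_committed_in_bytes": 4 * GIB,
            "heap_max_in_bytes": 8 * GIB}}},
    }
}

for name, percent in sorted(heap_usage_percent(sample_stats).items()):
    print(f"{name}: {percent:.1f}% heap used")
```

Sampling this every few seconds during the bulk load would show whether the heap stays pinned near the GC threshold just before the indexer hangs.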



(kondapallinaresh) #5

Thanks Jörg,
I will install the monitors/profilers and post the memory details.

(In reply to Jörg Prante's message of Friday, December 20, 2013 1:42:35 PM UTC-8.)


