Sweet spot for bulk indexing

Hi there,

For my use case I will be bulk indexing 24/7. The average size of a document is
somewhere between 500 and 1000 bytes. Is there a certain sweet spot for bulk
indexing? Currently I use a bulk size of 10 events.

I imagine I should run a benchmark, but maybe there's someone out here with a
better-supported opinion.

Best regards,

Robin Verlangen
Software engineer
W http://www.robinverlangen.nl
E robin@us2.nl

http://goo.gl/Lt7BC


Do you use Java bulk indexing or HTTP bulk indexing?

It's hard to give advice based on document size alone. It depends very much on
your hardware, and even more on your software, most importantly on multithreading.

Do you have complex analyzers? How fast can your indexer build the
documents? These are CPU-intensive tasks.

I now have some hardware available for testing and recently started some
experiments with bulk indexing. An average desktop PC can index ~10,000 docs
of the stated size using the Java API, standard settings, single node, with an
indexer that can use all CPU cores. But a typical out-of-the-box bulk indexing
setup handles around 1000 docs for a single thread on a current CPU/disk
system.

You can find the sweet spot of your system by using bulk index throttling:
define a bulk action size and a maximum number of concurrent bulk requests
and, when either is exceeded, wait for the open bulk requests to complete. If
the bulk indexing runs smoothly without delays caused by CPU or I/O, you're
fine.
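
As a rough sketch of that throttling with the Java API (this assumes a client
version that ships org.elasticsearch.action.bulk.BulkProcessor; the concrete
numbers are only placeholders you would tune against your own measurements):

import org.elasticsearch.action.bulk.BulkProcessor;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.common.unit.ByteSizeUnit;
import org.elasticsearch.common.unit.ByteSizeValue;
import org.elasticsearch.common.unit.TimeValue;

public class ThrottledBulkIndexer {

    public static BulkProcessor build(Client client) {
        return BulkProcessor.builder(client, new BulkProcessor.Listener() {
            public void beforeBulk(long id, BulkRequest request) {
                // called just before a bulk request is executed
            }
            public void afterBulk(long id, BulkRequest request, BulkResponse response) {
                if (response.hasFailures()) {
                    System.err.println(response.buildFailureMessage());
                }
            }
            public void afterBulk(long id, BulkRequest request, Throwable failure) {
                failure.printStackTrace();
            }
        })
        .setBulkActions(1000)                               // bulk action size
        .setBulkSize(new ByteSizeValue(5, ByteSizeUnit.MB)) // or flush by payload volume
        .setConcurrentRequests(4)                           // max bulk requests in flight
        .setFlushInterval(TimeValue.timeValueSeconds(5))    // flush partial bulks periodically
        .build();
    }
}

Documents are then fed with processor.add(indexRequest); once the configured
number of bulk requests is in flight, further adds block until an open request
completes, which gives the waiting behavior described above.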

Best regards,

Jörg

Hi Jörg,

> Do you use Java bulk indexing or HTTP bulk indexing?

Java bulk.

> It's hard to give advice based on document size alone. It depends very much
> on your hardware, and even more on your software, most importantly on
> multithreading.

Currently we use "CPU_CORES * 1" indexing threads, each with a bulk size of
(currently) 10.
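
For concreteness, that is roughly the layout below (a sketch with placeholder
index/type names and a hypothetical nextEvent() source, not our actual code):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.elasticsearch.action.bulk.BulkRequestBuilder;
import org.elasticsearch.client.Client;

public class ThreadedIndexer {

    // One indexing thread per CPU core, each flushing bulk requests of 10 events.
    static void start(final Client client) {
        int threads = Runtime.getRuntime().availableProcessors(); // CPU_CORES * 1
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(new Runnable() {
                public void run() {
                    BulkRequestBuilder bulk = client.prepareBulk();
                    while (true) {
                        // nextEvent() stands in for the real event source (~500-1000 bytes of JSON)
                        bulk.add(client.prepareIndex("events", "event").setSource(nextEvent()));
                        if (bulk.numberOfActions() >= 10) { // current bulk size
                            bulk.execute().actionGet();
                            bulk = client.prepareBulk();
                        }
                    }
                }
            });
        }
    }

    static String nextEvent() {
        return "{\"message\":\"...\"}"; // placeholder document
    }
}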

> Do you have complex analyzers? How fast can your indexer build the
> documents? These are CPU-intensive tasks.

We use the default tokenizer. The indexers don't seem to be the bottleneck;
they are capable of pushing through more than 1 million events per minute.

The rest makes sense. I'll just go with the benchmarking; that's probably the
best way. I can then use the results to automatically throttle the indexing.
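
The benchmark could look something like the sketch below (hypothetical: it
sweeps a few bulk sizes against the cluster synchronously and reports the
sustained rate; the index name and document payload are placeholders):

import org.elasticsearch.action.bulk.BulkRequestBuilder;
import org.elasticsearch.client.Client;

public class BulkSizeBenchmark {

    // Indexes `total` synthetic events synchronously in bulks of `bulkSize`; returns docs/sec.
    static double measure(Client client, int bulkSize, int total) {
        long start = System.nanoTime();
        BulkRequestBuilder bulk = client.prepareBulk();
        for (int i = 0; i < total; i++) {
            bulk.add(client.prepareIndex("bench", "event")
                    .setSource("{\"n\":" + i + ",\"payload\":\"~500-1000 bytes in real events\"}"));
            if (bulk.numberOfActions() >= bulkSize) {
                bulk.execute().actionGet(); // one bulk at a time isolates the effect of the bulk size
                bulk = client.prepareBulk();
            }
        }
        if (bulk.numberOfActions() > 0) {
            bulk.execute().actionGet(); // flush the last partial bulk
        }
        return total / ((System.nanoTime() - start) / 1e9);
    }

    static void sweep(Client client) {
        for (int bulkSize : new int[] {10, 100, 500, 1000, 5000}) {
            System.out.printf("bulk size %5d -> %.0f docs/sec%n",
                    bulkSize, measure(client, bulkSize, 100000));
        }
    }
}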

Best regards,

Robin Verlangen
Software engineer
W http://www.robinverlangen.nl
E robin@us2.nl

http://goo.gl/Lt7BC

