Slow large document insertion

Hello,

I'm using Elasticsearch to index application logs that can grow to be
rather large. I currently roll the indexes every 24 hours and keep 14
indexes (days) worth of data around.

I wanted to see if anyone could offer any suggestions on how to speed up
bulk insertion.

Is it better to do many small bulk insert requests? (I know there is a
network overhead with many small operations, but considering I can fire
these requests off in parallel I don't think it's that big a deal)
or
Is it better to do a few very large bulk insert requests?

Right now I batch out to ES at 500 documents, which I have seen to get as
large as 150MB/batch and take ~30s to process.

What knobs can I tweak to help speed up the document insertion? I assume
things slow down because I'm sending large documents...unfortunately there
isn't really a clean way to break these up (there is really no clean way to
nest the document)

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Hi Robert,

On Mon, Jun 17, 2013 at 7:32 PM, Robert Navarro crshman@gmail.com wrote:

> Is it better to do many small bulk insert requests? (I know there is a
> network overhead with many small operations, but considering I can fire
> these requests off in parallel I don't think it's that big a deal)
> or
> Is it better to do a few very large bulk insert requests?

There is no general answer, as this depends on many factors. You should
run tests, starting with small batches (a couple of documents) and
progressively increasing the batch size. Indexing speed should increase a
little with each step. As soon as increasing the batch size no longer
improves indexing speed, or even makes it worse, you have found the optimal
batch size.
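
As a rough sketch of that procedure (not a definitive implementation), the
Python below doubles the batch size until throughput stops improving. All
names here (chunks, docs_per_second, find_batch_size, send_batch) are
hypothetical; send_batch stands for whatever function you already use to
send one bulk request, and the doubling factor and size limit are arbitrary
assumptions.

```python
import time


def chunks(docs, size):
    """Yield consecutive slices of at most `size` documents."""
    for start in range(0, len(docs), size):
        yield docs[start:start + size]


def docs_per_second(docs, size, send_batch):
    """Send all docs in batches of `size` and return the observed rate."""
    started = time.perf_counter()
    for batch in chunks(docs, size):
        send_batch(batch)  # hypothetical: your existing bulk-request call
    elapsed = time.perf_counter() - started
    return len(docs) / elapsed if elapsed > 0 else float("inf")


def find_batch_size(docs, send_batch, start=2, factor=2, max_size=4096,
                    measure=docs_per_second):
    """Grow the batch size until the measured rate stops improving."""
    best_size = start
    best_rate = measure(docs, start, send_batch)
    size = start * factor
    while size <= max_size:
        rate = measure(docs, size, send_batch)
        if rate <= best_rate:
            break  # no improvement (or worse): keep the previous size
        best_size, best_rate = size, rate
        size *= factor
    return best_size
```

Single measurements are noisy, so it is probably worth repeating each step
a few times with a representative sample of real log documents before
trusting the result.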

> Right now I batch out to ES at 500 documents, which I have seen to get as
> large as 150MB/batch and take ~30s to process.
>
> What knobs can I tweak to help speed up the document insertion? I assume
> things slow down because I'm sending large documents...unfortunately there
> isn't really a clean way to break these up (there is really no clean way to
> nest the document)

150MB per batch looks like a lot to me; you should maybe experiment with
smaller batches. If indexing speed matters more to you than search speed,
you could try increasing index.refresh_interval (so that refreshes happen
less often) and/or disabling merge throttling
(indices.store.throttle.type = "none").
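
For illustration, here is a minimal Python sketch of how those two settings
could be applied over the REST API. It assumes a node at localhost:9200, a
hypothetical daily index named logs-2013.06.17, an example interval of 30s,
and a release that still supports indices.store.throttle.type (it was
removed in later versions). The helper names are made up for this example.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: a local node
INDEX = "logs-2013.06.17"         # hypothetical daily index name


def refresh_interval_request(index, interval):
    """Build (method, path, body) to change one index's refresh interval."""
    body = {"index": {"refresh_interval": interval}}
    return "PUT", "/%s/_settings" % index, json.dumps(body)


def store_throttle_request(throttle_type):
    """Build (method, path, body) for the cluster-wide throttle type."""
    body = {"transient": {"indices.store.throttle.type": throttle_type}}
    return "PUT", "/_cluster/settings", json.dumps(body)


def send(method, path, body, base_url=ES_URL):
    """Send one request to the node and return the decoded JSON reply."""
    request = urllib.request.Request(
        base_url + path,
        data=body.encode("utf-8"),
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # "30s" is only an example value; tune it for your own workload.
    print(send(*refresh_interval_request(INDEX, "30s")))
    print(send(*store_throttle_request("none")))
```

Both changes trade search freshness and I/O headroom for indexing
throughput, so they are best measured one at a time.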

--
Adrien Grand
