Timeout during bulk indexing when doc size increased


(dragan) #1

Hi,

I have the following setup:

Two ES nodes on EC2 (Fedora) with 5 shards and 1 replica each and the default
elasticsearch.conf, plus a separate EC2 instance (Ubuntu 10.10) that
indexes documents into ES via pyes 0.16.

I'm indexing documents in batches of fewer than 5 thousand docs.

If each document is less than 2 KB, everything runs smoothly.
If I populate a string field with a paragraph of text,
bringing the size of a doc to ~4 KB, the insertion process times out.
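
For a rough sense of scale, the payload of one batch can be estimated from the figures above. This is only a back-of-the-envelope sketch: it takes the batch size and document sizes as upper bounds, ignores the extra action line that the bulk format adds per document, and the helper name `batch_payload_mb` is made up for illustration.

```python
# Rough upper-bound estimate of one bulk request's payload.
# Assumes batches of up to 5000 docs and docs of about 2 KB or 4 KB;
# the per-document bulk action line is ignored.
KB = 1024


def batch_payload_mb(docs_per_batch, doc_size_kb):
    """Return the approximate payload size of one batch in MB."""
    return docs_per_batch * doc_size_kb * KB / (1024 * 1024)


small = batch_payload_mb(5000, 2)   # about 9.8 MB
large = batch_payload_mb(5000, 4)   # about 19.5 MB
print(f"2 KB docs: {small:.1f} MB per batch")
print(f"4 KB docs: {large:.1f} MB per batch")
```

So the same batch count carries roughly twice the bytes per request once the text field is populated.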

The timeout I set in pyes when I create the connection object is 2
minutes. My input files are getting larger, and I will need to be able
to push more data per document into ES. I'm wondering what is causing
the timeout and what I need to change in the config in order to push
data into ES reliably.

If you can share some rules of thumb, I'd greatly appreciate it.
Thank you


(Shay Banon) #2

First, depending on your instance type, I would suggest explicitly setting the memory allocated to the ES (Java) process rather than relying on the default.

I don't know what times out or why. Did you see any failures in the elasticsearch logs? If you increase the timeout value, does it work?
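
One experiment that might help narrow this down, as a sketch only: cap each bulk request by serialized size rather than by document count, so that larger documents do not silently double the size of every request. The helper name `batches_by_bytes` and the 5 MB cap are assumptions for illustration, not pyes or elasticsearch settings, and the sample documents are invented. Each yielded batch would then be sent with whatever bulk call the client provides.

```python
import json


def batches_by_bytes(docs, max_bytes=5 * 1024 * 1024):
    """Yield lists of docs whose combined JSON size does not exceed max_bytes.

    A single doc larger than max_bytes is yielded alone in its own batch.
    """
    batch, batch_bytes = [], 0
    for doc in docs:
        size = len(json.dumps(doc).encode("utf-8"))
        # Close the current batch before it would grow past the cap.
        if batch and batch_bytes + size > max_bytes:
            yield batch
            batch, batch_bytes = [], 0
        batch.append(doc)
        batch_bytes += size
    if batch:
        yield batch


# Hypothetical documents of roughly 4 KB each.
docs = [{"id": i, "body": "x" * 4000} for i in range(5000)]
batches = list(batches_by_bytes(docs))
print(len(batches), "batches; largest has", max(len(b) for b in batches), "docs")
```

If smaller requests go through where the original ones timed out, that points at request size rather than at a failure on the nodes.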

(In reply to Dragan's post #1 of Thursday, February 16, 2012, 8:33 PM.)