Elasticsearch/Elastica timeout while indexing

Hello,

I'm using Elasticsearch 0.90.2 with Elastica (the PHP client), and I'm having
trouble indexing 10,000,000 documents.
I'm indexing with bulk requests only, in batches of 500.
The problem is that Elasticsearch stops answering Elastica, every time at
the same stage (Thu, 15 Aug 2013 14:55:15 BST Exporting results 29000 to
29500 to index), which of course raises an exception:
[Elastica\Exception\ClientException] -> No enabled connection. The message
is accurate, since as far as I know Elastica uses the HTTP API to transfer
the documents (i.e. no socket or anything).
The odd thing is that on a slower machine I get to roughly 600,000 docs
before hitting the same exception.
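For reference, on the wire each bulk request of 500 documents is newline-delimited JSON: an action line followed by a source line for every document, with a trailing newline. The Python sketch below only builds such bodies; it is not Elastica's API, and the index name `myindex` and type `mytype` are hypothetical placeholders (0.90 still uses mapping types).

```python
import json
from itertools import islice

BATCH_SIZE = 500  # batch size used in the thread

def batches(docs, size=BATCH_SIZE):
    """Yield lists of at most `size` items from any iterable of (id, source) pairs."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

def bulk_body(chunk, index="myindex", doc_type="mytype"):
    """Build a _bulk body; `myindex` and `mytype` are hypothetical names."""
    lines = []
    for doc_id, source in chunk:
        # One action line, then the document source, per document.
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type, "_id": doc_id}}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"  # the bulk API expects a final newline
```

Each body would then be sent as a single POST to the `_bulk` endpoint; 10,000,000 documents at 500 per request means 20,000 requests.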

I've tried adding a sleep(1); between two bulk requests, which only bought me
about 4,000 more indexed documents before the crash.
I've tried with various numbers of nodes and changed the thread pool settings
(both in the index settings and in the global config file, since the docs are
not very clear about where they go); neither changed anything.
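Since a fixed sleep(1) only postponed the failure, one common client-side alternative is to retry a failed bulk request with an exponentially growing delay, so that a node that is temporarily busy gets time to recover. A minimal sketch in Python, not tied to Elastica: `send` is a hypothetical callable that posts one bulk body and raises ConnectionError when the node does not answer.

```python
import time

def send_with_backoff(send, body, retries=5, base_delay=1.0, sleep=time.sleep):
    """Call send(body); on ConnectionError, wait base_delay * 2**attempt seconds and retry.

    `send` is a hypothetical callable. Re-raises the last ConnectionError
    once `retries` attempts have all failed.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    for attempt in range(retries):
        try:
            return send(body)
        except ConnectionError:
            if attempt == retries - 1:
                raise
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ... with the defaults
```

In the PHP client, the equivalent would be catching the `Elastica\Exception\ClientException` seen above around each bulk call.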

Nothing appears in the Elasticsearch logs (it bounces back without logging
anything), and Elasticsearch itself never crashes: it is available again once
my indexing script has died, without restarting or printing anything in the
logs.

So right now I can't index the whole dataset at all, however long I let it
run.

What can I do?

Oh, and disabling auto-refresh didn't change much either.
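For completeness, auto-refresh is usually turned off for a bulk load through the index update-settings API by setting `index.refresh_interval` to `-1`, then restored afterwards (the default is `1s`). A minimal sketch using only the Python standard library, assuming a node on `localhost:9200` and a hypothetical index name `myindex`; the request is only built here and would be sent with `urlopen`.

```python
import json
from urllib.request import Request

def refresh_settings(disabled):
    """Settings body that turns periodic refresh off (-1) or back to the 1s default."""
    return {"index": {"refresh_interval": "-1" if disabled else "1s"}}

def settings_request(index, body, host="http://localhost:9200"):
    """Build a PUT /<index>/_settings request (host and index name are assumptions)."""
    return Request(
        f"{host}/{index}/_settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )

# Before the bulk load: urlopen(settings_request("myindex", refresh_settings(True)))
# After the bulk load:  urlopen(settings_request("myindex", refresh_settings(False)))
```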

Thanks,
Lucas

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Hey,

I am just guessing here, as there is not a lot of information provided.
First, are you monitoring your system? Can you see what actually happens on
the Elasticsearch side? From the outside, I suspect a huge merge is going on,
which eats up lots of I/O and CPU and might be causing your system to hang.
Merges are background operations, which can be triggered by indexing lots of
data. However, throttling was introduced in 0.90, which should prevent this
kind of situation (it throttles to 20MB per second by default, something your
hard disks should handle just fine; what do you think?).
Also, the fact that the system comes back up again without you doing
anything sounds like this mechanism kicking in.
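To put the 20MB per second figure in perspective: a merge whose output is S bytes needs at least roughly S / (20 × 1024²) seconds under that throttle (assumed here to be the store-level `indices.store.throttle.max_bytes_per_sec` setting). The helper is plain arithmetic; the segment sizes are illustrative, not measured on this cluster, and reading "MB" as 1024² bytes is an assumption.

```python
MB = 1024 ** 2  # assuming "20MB" means 20 mebibytes

def min_merge_seconds(merge_bytes, rate_bytes_per_sec=20 * MB):
    """Rough lower bound on merge time when writes are throttled to a fixed rate."""
    return merge_bytes / rate_bytes_per_sec

# Illustrative sizes only
for gigabytes in (1, 5):
    print(f"{gigabytes} GB: {min_merge_seconds(gigabytes * 1024 * MB):.1f} s")
```

With these assumptions a 1 GB merge takes about 51 seconds and a 5 GB merge about four minutes, long enough for a client with a short timeout to give up on the node.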

Are you running the indexing script and Elasticsearch on two different
systems, to make sure which one is going down?
Can you check the nodes stats API while you are indexing, to find out more
about the problem? I'd like to dig into this, but I need some more
information.
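One way to watch the nodes stats API while the import runs is a small polling script. A minimal sketch with the Python standard library; the endpoint `http://localhost:9200/_nodes/stats` and the field names `indices.merges.current` and `current_size_in_bytes` are assumptions about the 0.90 response and may differ between versions.

```python
import json
import time
from urllib.request import urlopen

# Assumed endpoint: a node on localhost:9200 exposing the nodes stats API.
STATS_URL = "http://localhost:9200/_nodes/stats"

def merge_activity(stats):
    """Map node name -> (merges currently running, bytes being merged).

    Field names are assumptions; missing fields default to 0.
    """
    activity = {}
    for node_id, node in stats.get("nodes", {}).items():
        merges = node.get("indices", {}).get("merges", {})
        activity[node.get("name", node_id)] = (
            merges.get("current", 0),
            merges.get("current_size_in_bytes", 0),
        )
    return activity

def poll(url=STATS_URL, interval=5.0, rounds=12):
    """Print merge activity every `interval` seconds while the indexing script runs."""
    for _ in range(rounds):
        with urlopen(url, timeout=10) as resp:
            stats = json.load(resp)
        for name, (running, size) in sorted(merge_activity(stats).items()):
            print(f"{name}: {running} merge(s) running, {size / 1024 ** 2:.1f} MB")
        time.sleep(interval)
```

Running `poll()` in a second terminal during the import would show whether merge activity spikes around the point where the client reports "No enabled connection".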

Hope this gives you a starting point for tracking down the problem; otherwise, just ask!

---Alex

On Thu, Aug 15, 2013 at 5:11 PM, Lucas Vanryb lucas@shopcade.com wrote:
