Curl timeout during bulk insert

Hello everybody,

We have an ES index with around 1 billion documents. The index is
constantly updated and expanded by 16 spiders. The spiders send bulk
inserts with curl (HTTP API), varying between 1 and 10,000 documents,
depending on the data found by the spiders. Everything is generally
running fine; the inserts take a few seconds at most. However, once in a
while a curl request will time out after 2 minutes.

I'm trying to find out why this is happening, but I cannot find anything
useful in the logs. I only see some warnings from the garbage collector,
but not at the times the timeouts happen. So I have two questions:

  • How can I find out why the timeouts are happening (log settings)?
  • Is this a bad thing, or should I just accept that this will happen and
    increase the curl timeout settings? (A sketch of our curl call follows
    below.)
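
For reference, this is roughly what one bulk insert from a spider looks
like, with curl's timeouts made explicit. It's only a sketch: the index
name "pages", the type "page", the file bulk.json, the localhost URL and
the timeout values are placeholders, not our actual settings.

    # bulk.json holds newline-delimited action/source pairs, e.g.:
    #   {"index":{"_index":"pages","_type":"page"}}
    #   {"url":"http://example.com/","title":"..."}
    # The file must end with a newline.
    curl -s -XPOST 'http://localhost:9200/_bulk' \
         --connect-timeout 5 \
         --max-time 120 \
         --data-binary @bulk.json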

Our setup:

  • 4 nodes
  • 4 cores per node
  • 24 GB RAM, 16 GB heap size per node
  • Java 1.7
  • ES 0.20.5

ES settings:

  • 12 shards
  • 1 replica

Thanks in advance.

Regards,

Bastiaan


Bastiaan Zijlema wrote:

We have an ES index with around 1 billion documents. The index
is constantly updated and expanded by 16 spiders. The spiders
send bulk inserts with curl (HTTP API), varying between 1 and
10,000 documents, depending on the data found by the
spiders. Everything is generally running fine; the inserts take
a few seconds at most. However, once in a while a curl request
will time out after 2 minutes.

Which bulk requests are timing out? Is it just large ones? How
many bytes are the requests that time out?
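
If the spiders don't record this already, curl's --write-out option could
capture it; a minimal sketch, where the log path, URL and bulk.json are
just examples:

    # Append size, status and timing of every bulk request to a log so the
    # requests that hit the 2-minute timeout can be identified afterwards.
    curl -s -o /dev/null -XPOST 'http://localhost:9200/_bulk' \
         --max-time 120 \
         --data-binary @bulk.json \
         -w "$(date +%FT%T) bytes=%{size_upload} status=%{http_code} time=%{time_total}s\n" \
         >> /var/log/spider-bulk.log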

-Drew


It really varies. Sometimes they are small ones, sometimes large ones up
to 700 KB in size. But when I try to reproduce the timeouts by sending
large bulks (around 24 MB), everything seems to run fine.

Bastiaan


Take care that your spiders don't hit the 100 MB HTTP limit, which is set
via http.max_content_length and defaults to 100mb.
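
For reference, a rough sketch of a client-side guard; the file name
bulk.json and the hard-coded limit are assumptions (the real limit is
whatever http.max_content_length is set to in elasticsearch.yml on the
nodes):

    # http.max_content_length (elasticsearch.yml, default 100mb) caps the
    # HTTP request body. Warn before POSTing a bulk file that exceeds it.
    limit=$((100 * 1024 * 1024))
    size=$(stat -c%s bulk.json)   # GNU stat; bulk.json is a placeholder
    if [ "$size" -ge "$limit" ]; then
        echo "bulk.json is $size bytes, over the $limit byte limit" >&2
    fi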

Jörg

On 29.03.13 09:43, Bastiaan Zijlema wrote:

It really varies. Sometimes they are small ones, sometimes large ones
up to 700 KB in size. But when I try to reproduce the timeouts by
sending large bulks (around 24 MB), everything seems to run fine.


Thanks for pointing this out; I didn't know about it. But when I try to
send more than 100 MB I get the error "Recv failure: Connection reset by
peer when sending data", not a timeout. So this doesn't seem to be what is
causing the timeouts.
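
For what it's worth, the two failure modes can be told apart by curl's
exit code (28 for a --max-time timeout, 56 for the "Recv failure"). A
rough sketch, with the URL and bulk.json as placeholders:

    # Distinguish a timeout (exit 28) from a reset connection (exit 56).
    curl -s -o /dev/null -XPOST 'http://localhost:9200/_bulk' \
         --max-time 120 --data-binary @bulk.json
    rc=$?
    case $rc in
      0)  echo "request completed (check HTTP status separately)" ;;
      28) echo "timed out after --max-time" ;;
      56) echo "connection reset (e.g. body over http.max_content_length)" ;;
      *)  echo "curl failed with exit code $rc" ;;
    esac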

Bastiaan

On Friday, March 29, 2013 12:50:40 PM UTC+1, Jörg Prante wrote:

Take care that your spiders don't hit the 100 MB HTTP limit, which is
set via http.max_content_length and defaults to 100mb.

Jörg

