We have an ES index with around 1 billion documents. The index is
constantly updated and expanded by 16 spiders. The spiders send bulk
inserts with curl (HTTP API), varying between 1 and 10,000 documents
depending on the data found by the spiders. Everything runs fine and the
inserts take a few seconds at most. However, once in a while a curl
request will time out after 2 minutes.
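For reference, a bulk insert of the kind described here can be sent like
this. This is only a sketch: the index name "pages", the type "page", the
host "localhost:9200" and the file requests.json are assumptions, not
details from this thread.

    # requests.json holds newline-delimited action/source pairs, one pair per document:
    #   {"index":{"_index":"pages","_type":"page"}}
    #   {"url":"http://example.com/","title":"Example"}
    # --data-binary (not -d) preserves the newlines the bulk API requires
    curl -s -XPOST 'http://localhost:9200/_bulk' --data-binary @requests.json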
I'm trying to find out why this is happening, but I cannot find anything
useful in the logs. I only see some warnings from the garbage collector,
and not at the times the timeouts happen. So I have two questions:
1. How can I find out why the timeouts are happening (log settings)? (See the sketch below.)
2. Is this a bad thing, or should I just accept that this will happen and
increase the curl timeout settings?
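On question 1, one setting worth knowing about (a sketch only, and
availability depends on your ES version) is the indexing slow log, which
writes a log entry whenever indexing a single document exceeds a
threshold; the index name "pages" is again an assumption. On question 2,
if the 2-minute limit really comes from the curl invocation itself,
curl's -m/--max-time flag raises it.

    # enable the indexing slow log on an existing index (dynamic settings)
    curl -XPUT 'http://localhost:9200/pages/_settings' -d '{
      "index.indexing.slowlog.threshold.index.warn": "10s",
      "index.indexing.slowlog.threshold.index.info": "5s"
    }'

    # raise curl's overall per-request timeout to 10 minutes on the client side
    curl -s -m 600 -XPOST 'http://localhost:9200/_bulk' --data-binary @requests.json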
Which bulk requests are timing out? Is it just large ones? How
many bytes are the requests that time out?
It really differs. Sometimes they are small ones, sometimes large ones up to
700 KB in size. But when I try to reproduce the timeouts by sending large
bulks (around 24 MB), everything seems to run fine.
Bastiaan
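To pin down which requests time out and how large they are, one option
would be to let each spider record curl's own measurements per bulk
request; a sketch, with the same assumed URL and bulk file as above:

    # append one line per bulk request: HTTP status, bytes uploaded, seconds taken
    curl -s -o /dev/null \
      -w '%{http_code} %{size_upload} %{time_total}\n' \
      -XPOST 'http://localhost:9200/_bulk' --data-binary @requests.json \
      >> bulk-timing.log
    # (use -o bulk-response.json instead of /dev/null to keep the per-item bulk results)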
Take care that your spiders don't hit the 100 MB HTTP limit, which is set by
http.max_content_length and defaults to 100mb.
Jörg
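For reference, that limit is a node-level setting in elasticsearch.yml;
raising it is only a sketch of what you could do if the limit ever became
the bottleneck, and 200mb is just an example value (the node has to be
restarted for the change to take effect):

    # elasticsearch.yml
    http.max_content_length: 200mb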
Thanks for pointing this out, I didn't know about it. But when I try to
send more than 100 MB, I get the error "Recv failure: Connection reset by
peer when sending data", not a timeout. So this doesn't seem to be what is
causing the timeouts.
Bastiaan
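One small thing that could help correlate the spiders' failures with the
server logs: curl's exit code distinguishes a timeout from a reset, so
logging it per request tells you which failure mode actually occurred. A
sketch, with the same assumed URL and bulk file as above (exit code 28
means the operation timed out, 56 is a receive failure such as a
connection reset):

    curl -s -m 600 -XPOST 'http://localhost:9200/_bulk' --data-binary @requests.json
    rc=$?
    # 0 = success, 28 = timed out, 56 = recv failure (e.g. connection reset by peer)
    echo "$(date -u '+%Y-%m-%dT%H:%M:%SZ') exit=$rc" >> bulk-errors.log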