ES bulk insert timeout

Hey guys,
I am regularly loading a Hive table of around 10 million records into ES.
Each document is small, with 5-6 attributes. My ES cluster has 7 nodes, each
with 4 cores and 128G of RAM. ES was allocated 60% of the memory, and I am
bulk inserting (using the Python client) every 200 entries. My cluster is in
green status, running version 1.2.1. The index has "number_of_shards": 7,
"number_of_replicas": 1.
But I keep getting a read timeout exception:

Traceback (most recent call last):
File "reduce_dotcom_browse.test.py", line 95, in
helpers.bulk(es, actions)
File "/usr/lib/python2.6/site-packages/elasticsearch/helpers.py", line
148, in bulk
for ok, item in streaming_bulk(client, actions, **kwargs):
File "/usr/lib/python2.6/site-packages/elasticsearch/helpers.py", line
107, in streaming_bulk
resp = client.bulk(bulk_actions, **kwargs)
File "/usr/lib/python2.6/site-packages/elasticsearch/client/utils.py",
line 70, in _wrapped
return func(*args, params=params, **kwargs)
File "/usr/lib/python2.6/site-packages/elasticsearch/client/init.py",
line 568, in bulk
params=params, body=self._bulk_body(body))
File "/usr/lib/python2.6/site-packages/elasticsearch/transport.py", line
274, in perform_request
status, headers, data = connection.perform_request(method, url, params,
body, ignore=ignore)
File
"/usr/lib/python2.6/site-packages/elasticsearch/connection/http_urllib3.py",
line 51, in perform_request
raise ConnectionError('N/A', str(e), e)
elasticsearch.exceptions.ConnectionError:
ConnectionError(HTTPConnectionPool(host=u'10.93.80.216', port=9200): Read
timed out. (read timeout=10)) caused by:
ReadTimeoutError(HTTPConnectionPool(host=u'10.93.80.216', port=9200): Read
timed out. (read timeout=10))

How can I troubleshoot this? In my opinion, bulk inserting 200 entries should
be fairly easy.
Thanks for any pointers.
Chen


There are a few things going on here.

When you say 200 entries, is this per second? It might be chunking them into
batches of 200 docs, but you could really be smashing it with more than you
think.

  • The thread pool docs (and the _cat/thread_pool API) can show you what the
    different thread pools are doing. If you notice that they're rejecting
    large numbers of documents, you might find your bulk queue is too small.
    Increasing it might help, but be a little careful: if you increase it to
    something huge you can easily break ES (see the sketch below).
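For example, a rough sketch with the Python client, pointing at the node from
your traceback ("threadpool.bulk.queue_size" is the 1.x setting name and was
dynamically updatable in that series, so double-check it against your version):

from elasticsearch import Elasticsearch

es = Elasticsearch(["10.93.80.216:9200"])

# Per-node bulk thread pool stats -- same data as the _cat/thread_pool API.
# If your client version doesn't expose the cat helpers, just curl the URL.
print(es.cat.thread_pool(h="host,bulk.active,bulk.queue,bulk.rejected", v=True))

# Bump the bulk queue size dynamically (transient = lost on full cluster restart).
# The 1.x default is 50; a huge queue just hides back-pressure, so go easy.
es.cluster.put_settings(body={
    "transient": {"threadpool.bulk.queue_size": 200}
})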

First thing: it's always a good idea not to go above 32gb of heap. Going above
that disables compressed object pointers and memory usage can run away. The
file system cache will happily consume the rest of your memory.

You could also drop the replicas while doing the bulk import and then add them
back after the import has completed. That way you're not writing out replicas
while trying to bulk import. Replicas only help reads, not indexing.
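Something like this, reusing the es client from the snippet above (the index
name "my_index" is made up):

# Before the bulk load: stop writing replicas.
es.indices.put_settings(index="my_index",
                        body={"index": {"number_of_replicas": 0}})

# ... run the bulk import ...

# Afterwards: bring the replica back; ES copies the finished shards over.
es.indices.put_settings(index="my_index",
                        body={"index": {"number_of_replicas": 1}})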

Another important thing is working out the mappings for your index. Are you
analyzing every field, or are there fields that don't require full-text
searching, etc.?
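A rough mapping sketch for a 1.x index (field and type names made up;
"not_analyzed" skips the analysis chain for fields you only filter or
aggregate on, and _all can be disabled if you never search it):

mapping = {
    "mappings": {
        "my_type": {
            "_all": {"enabled": False},
            "properties": {
                "user_id":  {"type": "string", "index": "not_analyzed"},
                "category": {"type": "string", "index": "not_analyzed"},
                "title":    {"type": "string"}  # full-text searched, stays analyzed
            }
        }
    }
}
es.indices.create(index="my_index", body=mapping)   # same client as above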

What is the refresh interval on this index? It could be busy trying to refresh
the index. Although I wouldn't expect a cluster of this size to have trouble
indexing 200 docs (if it is 200 per second).
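You can relax it while loading and restore it afterwards (again reusing the
client from above; the default is "1s"):

# Disable refresh entirely during the load ...
es.indices.put_settings(index="my_index",
                        body={"index": {"refresh_interval": "-1"}})
# ... bulk import ...
# ... then put it back so new docs become searchable again.
es.indices.put_settings(index="my_index",
                        body={"index": {"refresh_interval": "1s"}})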

I've also found that running the same number of shards as nodes can have a
bad impact on the cluster, as all nodes are busy trying to index and then
they can't perform other cluster functions. To give you an idea, each
shard should be around 20-30 GB in size.
Try reducing your shard count to 3, or maybe even 2, and then increase
replicas.

I hope this helps.

Cheers,
Rob


Hi Robert,
Regarding the 32g heap comment, I do notice the same recommendation in the
official guide. But it doesn't seem to make sense to give only 25% of the RAM
to ES when there is 128g in total. One alternative could be 2 ES processes per
node, but that would be a bit difficult to manage. Does anyone have a similar
setup and would like to share your indexing performance numbers?

Thanks!
Jaguar


Hi Jaguar,
This is indeed the case. The OS has its own file system caches that will
consume the additional memory, and ES can be IO intensive when indexing. You
should not let the heap exceed 32gb: garbage collection becomes very slow and
causes pauses in your application. Which version of Java are you using?
1.8 seems to be stable now and I have had some success using G1GC instead
of the default GC mechanism. This wasn't really stable in the old 1.7
releases of Java, so be a little careful with it.

I have gone through a similar process to what you're going through now. We
had 10 nodes, each with 128gb of memory. I had huge heaps, and things crashed
and timed out. I then reduced the heap to 32gb and put two ES nodes on each
server. Still had some issues with timeouts. I then reduced to one node per
server. It is now stable. There were a few other changes that I made, but
these made some big differences. One thing to note is that every node in
the cluster means that every other node needs to be aware of all its
actions. When you start having 20 nodes in a cluster you get a very
talkative network and the nodes are busy just trying to talk to each other.

In the configuration I have just described we bulk index 10k+
docs every second and store over 20TB of searchable data. It's a log
processing system, so it's designed for high index volumes. I don't think
it's 100% tuned yet, but it's getting very close.

Let me know if you have any further questions.


Hi Robert,
Thanks for the detailed info. Here is what we have: 10 data-only nodes,
192g RAM each, 64g given to ES, Java 6. We have more than 40 indices
actively receiving new data. The max indexing rate for a single index is 3k/s;
the total rate is around 50k. We still see timeouts occasionally, but they
don't seem GC-related (according to the GC log info). It seems we have a
similar amount of data :). You do raise an interesting point. I could test
a smaller heap to see if I get better write performance.
One more question: have you had any OOM issues with huge queries?

Thanks!
Jaguar


Hmm... Java 6 with more than 8GB of heap... are you saying you have 64GB
for the ES instance?! Java 6, IIRC, still uses CMS as the GC; when the heap
exceeded 12GB, I started to see strange phenomena.

Also, what's your segment size?

Check your logs for thread pool rejection exceptions; if you see them,
just increase the index/bulk queue size.

HTH

Jason


Java 6 is not good, can you change that to 7 (or even 8)?


Java 6 is pretty ancient these days. Definitely move to 7 or 8.
There have also been several improvements in later releases of
Elasticsearch.

If you're hitting OOM problems, it's possible that one of your caches is
unbounded. One that can cause issues is the field data cache, but normally
the circuit breaker should kick in before OOM does.
Have a look at this doc.
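If you want a belt-and-braces cap, something along these lines (untested
sketch; the dynamic setting below uses the 1.4+ name, older 1.x releases
called it "indices.fielddata.breaker.limit", and there is also a static
"indices.fielddata.cache.size" you can set in elasticsearch.yml):

from elasticsearch import Elasticsearch

es = Elasticsearch(["localhost:9200"])

# Trip the fielddata breaker before a huge sort/aggregation can OOM a node.
# 40% of heap is just an example value.
es.cluster.put_settings(body={
    "persistent": {"indices.breaker.fielddata.limit": "40%"}
})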

It sounds like you have a fairly busy cluster. I would definitely be
looking at refresh interval.
This doc gives you an idea of the impact.

I know how unfun it can be to have an unstable cluster. I hope you can
resolve the issues.

Cheers,
Rob


Rob,
Even with 7 shards, each shard has around 100G of data. I don't think I can
achieve "each shard should be around 20 - 30 gb in size".
I am using a file for testing, so it's actually indexing sequentially,
200 entries at a time. When I query the cat thread pool API:
curl 'localhost:9200/_cat/thread_pool?v&h=id,host,bulk.active,bulk.rejected,bulk.completed,bulk.queue,bulk.queueSize'

id   host      bulk.active bulk.rejected bulk.completed bulk.queue bulk.queueSize
-fmG es-trgt01           0         15901       13024036          0             50
Bp9R es-trgt04           0            41       10806286          0             50
lB0j es-trgt02           0             0           6412          0             50
tW2Z es-trgt05           0             4       11000638          0             50
_qPw es-trgt06           4             0           8286         25             50
csxB es-trgt03           0             0           8314          0             50
ah7F es-trgt00           0          2200        9978972          0             50

It does show a large number of rejections, but none of the queues reach the
queue size (50). Why would indexing fail in that case?

Another thing worth mentioning is that the documents I am indexing are
child documents. Does this affect the bulk behavior at all?
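For what it's worth, my bulk actions look roughly like this (simplified, with
made-up index/type/field names; the parent id goes in via the "_parent" meta
field of the helper's action dict, which routes each child to its parent's
shard):

rows = [("d1", "c42", {"attr1": 1}), ("d2", "c42", {"attr2": 2})]  # stand-in for the Hive extract

actions = []
for doc_id, parent_id, attrs in rows:
    actions.append({
        "_index": "my_index",
        "_type": "browse_event",   # child type, mapped with a _parent field
        "_id": doc_id,
        "_parent": parent_id,
        "_source": attrs,          # the 5-6 attributes per document
    })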

I am going to lower the heap size to see whether it helps.
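I will probably also bump the client-side timeout above the default 10s and
collect the per-item errors rather than dying on the first failure, something
like this (untested; "actions" is the same list I already pass to
helpers.bulk):

from elasticsearch import Elasticsearch, helpers

es = Elasticsearch(["10.93.80.216:9200"], timeout=60)   # default is 10s

success, errors = helpers.bulk(es, actions, chunk_size=200,
                               raise_on_error=False)
for err in errors:
    print(err)   # rejected-execution entries here mean the bulk queue was full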

Thanks,
Chen