EsRejectedExecutionException when searching date based indices


(Alex Clark-2) #1

Hello all, I’m getting failed nodes when running searches and I’m hoping
someone can point me in the right direction. I have indices created per
day to store messages. The pattern is pretty straight forward: the index
for January 1 is "messages_20140101", for January 2 is "messages_20140102"
and so on. Each index is created against a template that specifies 20
shards. A full year will give 365 indices * 20 shards = 7300 nodes. I have
recently upgraded to ES 1.0.

When I search for all messages in a year (either using an alias or
specifying “messages_2013*”), I get many failed nodes. The reason given
is: “EsRejectedExecutionException[rejected execution (queue capacity 1000)
on
org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$4@651b8924]”).
The more often I search, the fewer failed nodes I get (probably caching in
ES) but I can’t get down to 0 failed nodes. I’m using ES for analytics so
the document counts coming back have to be accurate. The aggregate counts
will change depending on the number of node failures. We use the Java API
to create a local node to index and search the documents. However, we also
see the issue if we use the URL search API on port 9200.

If I restrict the search for 30 days then I do not see any failures (it’s
under 1000 nodes so as expected). However, it is a pretty common use case
for our customers to search messages spanning an entire year. Any
suggestions on how I can prevent these failures?

Thank you for your help!

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/9bf6d3bb-34e5-44c4-8d76-24f868d283a0%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(David Pilato) #2

You are mixing nodes and shards, right?
How many elasticsearch nodes do you have to manage your 7300 shards?
Why did you set 20 shards per index?

You can increase the queue size in elasticsearch.yml but I'm not sure it's the right thing to do here.

My 2 cents

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

Le 26 févr. 2014 à 01:36, Alex Clark alex@bitstew.com a écrit :

Hello all, I’m getting failed nodes when running searches and I’m hoping someone can point me in the right direction. I have indices created per day to store messages. The pattern is pretty straight forward: the index for January 1 is "messages_20140101", for January 2 is "messages_20140102" and so on. Each index is created against a template that specifies 20 shards. A full year will give 365 indices * 20 shards = 7300 nodes. I have recently upgraded to ES 1.0.

When I search for all messages in a year (either using an alias or specifying “messages_2013*”), I get many failed nodes. The reason given is: “EsRejectedExecutionException[rejected execution (queue capacity 1000) on org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$4@651b8924]”). The more often I search, the fewer failed nodes I get (probably caching in ES) but I can’t get down to 0 failed nodes. I’m using ES for analytics so the document counts coming back have to be accurate. The aggregate counts will change depending on the number of node failures. We use the Java API to create a local node to index and search the documents. However, we also see the issue if we use the URL search API on port 9200.

If I restrict the search for 30 days then I do not see any failures (it’s under 1000 nodes so as expected). However, it is a pretty common use case for our customers to search messages spanning an entire year. Any suggestions on how I can prevent these failures?

Thank you for your help!

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/9bf6d3bb-34e5-44c4-8d76-24f868d283a0%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/AD2469D8-4910-4166-91BA-D98D67860BAC%40pilato.fr.
For more options, visit https://groups.google.com/groups/opt_out.


(Alex Clark-2) #3

That is correct, I was mixing the terms "nodes" and "shards" (sorry about
that). I'm running the test on a single node (machine). I've chosen 20
shards so we could eventually go to a 20 server cluster without
re-indexing. It's unlikely we'll ever need to go that high but we never
know and given we receive 750 million messages a day, the thought of
reindexing after collecting a years worth of data makes me nervous. If I
can "over shard" and avoid a massive reindex then I'll be a happy guy.

I thought about reducing the 20 shards but even if I go to say 5 shards on
5 machines (1 shard per machine?) then I'll still run into the issue if a
user searches several years back. Any other thoughts on a possible
solution? Would increasing the queue size be a good option. Is there a
down side (performance hit, running out of resources, etc)?

Thanks again!

On Tuesday, February 25, 2014 11:32:26 PM UTC-8, David Pilato wrote:

You are mixing nodes and shards, right?
How many elasticsearch nodes do you have to manage your 7300 shards?
Why did you set 20 shards per index?

You can increase the queue size in elasticsearch.yml but I'm not sure it's
the right thing to do here.

My 2 cents

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

Le 26 févr. 2014 à 01:36, Alex Clark <al...@bitstew.com <javascript:>> a
écrit :

Hello all, I’m getting failed nodes when running searches and I’m hoping
someone can point me in the right direction. I have indices created per
day to store messages. The pattern is pretty straight forward: the index
for January 1 is "messages_20140101", for January 2 is "messages_20140102"
and so on. Each index is created against a template that specifies 20
shards. A full year will give 365 indices * 20 shards = 7300 nodes. I have
recently upgraded to ES 1.0.

When I search for all messages in a year (either using an alias or
specifying “messages_2013*”), I get many failed nodes. The reason given
is: “EsRejectedExecutionException[rejected execution (queue capacity
1000) on
org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$4@651b8924<javascript:>]”).
The more often I search, the fewer failed nodes I get (probably caching in
ES) but I can’t get down to 0 failed nodes. I’m using ES for analytics so
the document counts coming back have to be accurate. The aggregate counts
will change depending on the number of node failures. We use the Java API
to create a local node to index and search the documents. However, we also
see the issue if we use the URL search API on port 9200.

If I restrict the search for 30 days then I do not see any failures (it’s
under 1000 nodes so as expected). However, it is a pretty common use case
for our customers to search messages spanning an entire year. Any
suggestions on how I can prevent these failures?

Thank you for your help!

--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearc...@googlegroups.com <javascript:>.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/9bf6d3bb-34e5-44c4-8d76-24f868d283a0%40googlegroups.com
.
For more options, visit https://groups.google.com/groups/opt_out.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/954f7266-6587-4509-8159-aae5897dc2b6%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Jörg Prante) #4

I think you have a misconception about shard over-allocation and
re-indexing, so you should read

https://groups.google.com/d/msg/elasticsearch/49q-_AgQCp8/MRol0t9asEcJ

where kimchy explains how over-allocation of shards work.

If you have time-series indexes, you need not 20 shards per day, just in
fear to be able to stretch out to 20 nodes in the future. That is only true
for single, static, non-time-series indexes. With index aliasing and
routing applied to time-series data, 1 shard (+1 replica) per day might be
enough (maybe some more like 2 or 3, or more replica, it depends on
balancing out indexing and search load). For a year with a shard per day,
you will end up in 365 shards plus 365 replica shards which is quite a
handful, and in theory enough to distribute over 365 nodes. If shards start
to get tight on resources, use index aliasing and routing. Or just add
nodes, and ES will automatically redistribute the existing shards to become
happy again. No re-indexing at all.

Jörg

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAKdsXoHoPHqx6GZoPw2QFqjRhc%2BS0AX93fe1WBuwFp_0ZA08NQ%40mail.gmail.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Alex Clark-2) #5

Thank you for the link, it's very helpful. The reason I chose 20 per daily
index was because each day would hold around 750 million documents (each
with just under 1000 fields). This seemed like a fairly high data
requirement that would require many nodes.

If I only have one shard and one replica, then I'll have 365 x 2 = 720
total shards per year. If I run them on a 10 node cluster, then will the
shards be allocated evenly (72 shards per node) even though it is really 2
shards per index per node (and 365 indices)? If I then need to grow the
cluster to 20 servers, will the collection automatically re-balance in a
reasonable time? That's a lot of data for the cluster to move! My main
goal is to be able to add hardware to the cluster if needed without
re-indexing 750M x 365 = 273,750,000,000 documents (each with 1000 fields)
since this could take a considerably long time to do. Also, is it
reasonable to expect high performance out of a single shard index with 750M
records each with 1000 fields?

Finally, just as a data point, we're really indexing 750M records x 365
days a year x 7 years which gives 1,916,250,000,000 documents for the ES
cluster to chew on. It'll definitely be a good test of the technology and
interesting to see how the performance holds! It's maybe even a good
customer success story to put on the elasticsearch website if all goes
well. :wink:

On Wednesday, February 26, 2014 9:13:02 AM UTC-8, Jörg Prante wrote:

I think you have a misconception about shard over-allocation and
re-indexing, so you should read

https://groups.google.com/d/msg/elasticsearch/49q-_AgQCp8/MRol0t9asEcJ

where kimchy explains how over-allocation of shards work.

If you have time-series indexes, you need not 20 shards per day, just in
fear to be able to stretch out to 20 nodes in the future. That is only true
for single, static, non-time-series indexes. With index aliasing and
routing applied to time-series data, 1 shard (+1 replica) per day might be
enough (maybe some more like 2 or 3, or more replica, it depends on
balancing out indexing and search load). For a year with a shard per day,
you will end up in 365 shards plus 365 replica shards which is quite a
handful, and in theory enough to distribute over 365 nodes. If shards start
to get tight on resources, use index aliasing and routing. Or just add
nodes, and ES will automatically redistribute the existing shards to become
happy again. No re-indexing at all.

Jörg

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/8edd9cfe-2856-4dcf-9ffb-7a5833b80fcb%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Binh Ly-2) #6

Would love to hear a success story anytime. :slight_smile:

On Wednesday, February 26, 2014 4:58:03 PM UTC-5, Alex Clark wrote:

Finally, just as a data point, we're really indexing 750M records x 365
days a year x 7 years which gives 1,916,250,000,000 documents for the ES
cluster to chew on. It'll definitely be a good test of the technology and
interesting to see how the performance holds! It's maybe even a good
customer success story to put on the elasticsearch website if all goes
well. :wink:

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/21ccd023-836e-478e-be0f-bd952f221a73%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(system) #7