FYI, figured out how to avoid this. Since the shard recovery from the
fs is so fast when there have been no index updates, all shards are up
nearly instantly with empty caches.
To work around this contention, I have:
- as suggested, throttled concurrent_recoveries to 1
- removed the work directory before node start, forcing a much slower
recovery from the gateway
- cleared the system disk cache (ymmv on this one, but it seemed to help)
When I do this, performance never drops below the single-node steady state.
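For reference, a minimal sketch of the last two steps (the work directory
path is an assumption and will differ per install; dropping the page cache
is Linux-only and needs root):

    import shutil
    import subprocess

    # Assumed location of the node's local work directory; adjust per install.
    WORK_DIR = "/var/elasticsearch/work"

    # Removing the work directory before node start forces the slower
    # recovery from the gateway instead of the near-instant local fs recovery.
    shutil.rmtree(WORK_DIR, ignore_errors=True)

    # Flush dirty pages, then drop the OS page cache (Linux-only, needs root).
    subprocess.run(["sync"], check=True)
    with open("/proc/sys/vm/drop_caches", "w") as f:
        f.write("3\n")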
This got me thinking about an alternative approach that may handle this
and many other cluster conditions.
If there was feedback into the shard router from response times,
cpu load, etc., it would be able to route traffic around shards with
warming caches, nodes with failing h/w, and out-of-memory conditions,
and properly distribute traffic between slower and faster h/w in the cluster.
Right now, in a smaller cluster (and probably a bigger one), one slow
node can drag down performance disproportionately.
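To make the idea concrete, here is a hypothetical sketch (not anything ES
does today) of latency-feedback routing: keep an exponentially weighted
moving average of response time per shard copy, and pick a copy with
probability inversely proportional to it, so a cache-cold, overloaded, or
failing copy naturally gets less traffic:

    import random

    class FeedbackRouter:
        """Hypothetical latency-aware router over the copies of one shard."""

        def __init__(self, copies, alpha=0.2):
            self.copies = list(copies)      # node ids holding a copy
            self.alpha = alpha              # EWMA smoothing factor
            self.ewma_ms = dict((c, 1.0) for c in self.copies)  # optimistic start

        def record(self, copy, latency_ms):
            # Fold each observed response time into that copy's moving average.
            prev = self.ewma_ms[copy]
            self.ewma_ms[copy] = (1 - self.alpha) * prev + self.alpha * latency_ms

        def pick(self):
            # Weight copies by inverse average latency: a warming or struggling
            # node still gets some traffic (so its stats can recover) but no
            # longer drags down overall throughput.
            weights = [1.0 / self.ewma_ms[c] for c in self.copies]
            return random.choices(self.copies, weights=weights, k=1)[0]

A real implementation would also need to decay stale stats and fold in
cpu/memory signals, but the feedback loop is the core of it.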
Thanks again for all the help.
On Sep 1, 2:13 pm, Paul ppea...@gmail.com wrote:
It all depends on the search requirements, SLAs, etc...
The tests I am running at the moment are synthetic in that they don't
simulate real-world traffic, and instead are going for max throughput.
I should have the infrastructure in place over the next couple of
weeks to replay traffic through ES at real-world rates (or multiples
thereof) w/ the burst patterns associated with that, and will know
for sure if this will be a real issue for me.
If it is still an issue for me, it sounds like I may need to have an
extra server in the cluster to account for this case, but we will see.
On Sep 1, 1:41 pm, Shay Banon shay.ba...@elasticsearch.com wrote:
No, the field data cache is unique to the JVM and not really serializable. I
can try and check if it's possible to minimize the time mentioned, but won't
the 10 seconds slower QPS be mitigated by doubling the QPS once all is done?
On Wed, Sep 1, 2010 at 10:38 PM, Paul ppea...@gmail.com wrote:
Thanks for the further details. My final config will be using a 3
node cluster and I need to get another piece of h/w to see how things
behave for this case with that config.
The throughput will probably still be good enough to live with the
performance hit of adding a node, but I am not sure. Worst case, I will
have to have rules on my side that we can't add a new node during high
traffic periods, which sounds a little backwards.
When a shard is reallocating, is there any feasibility in copying over
the field/data cache state as well?
Thanks again for the help,
On Sep 1, 1:27 pm, Shay Banon shay.ba...@elasticsearch.com wrote:
Is the 10 second period of QPS degradation critical? Another solution would
be to have more machines to even the load and reduce the effect of a new
node joining the cluster.
Doing a pre-warming phase is not simple at all (though it certainly sounds
like it). The reason is that I can add pre-warming on a shard level, but
you end up having 75 shards allocated on that node. Each shard is on its
own; they are not interconnected, nor do they have dependencies on one
another (as it is very, very complicated to do in a distributed fashion).
So, the first shard might warm before it is available, but then the others
warming up might cause queries directed to the already started shard to
slow down.
On Wed, Sep 1, 2010 at 10:16 PM, Paul ppea...@gmail.com wrote:
Yeah, doing heavy filtering and sorting based on date.
Every one of my queries uses a filter cache for part of it. The
cached filters are finite, probably ~2000 or so different ones.
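For context, a typical query shape in my tests looks roughly like the
following (the field name is made up, but the combination of a cached
date-range filter and a date sort is representative):

    import json

    # Hypothetical example: a date-range filter (ES caches each distinct
    # filter) combined with a sort on the same date field.
    query = {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {
                    "range": {"created_at": {"from": "2010-08-01", "to": "2010-09-01"}}
                },
            }
        },
        "sort": [{"created_at": {"order": "desc"}}],
    }
    print(json.dumps(query, indent=2))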
I can get my shard count down to ~45 without much effort, but going
lower than that will be non-trivial because I am dynamically
translating queries from a previous domain to a new one, and the old
domain had 45 standalone indexes.
Since the node immediately joins the cluster, I don't see any way that
I can warm it up in isolation. In our current search system, each machine
is isolated and none share any state, so it is easy to re-run the last
N queries to warm its internal caches.
My other alternative is to run multiple single-machine clusters, which
would allow isolated warm-up, but this config would remove a large
chunk of ES's usefulness and introduce data consistency issues.
From an ES perspective, I see a few possible solutions:
- Do a configurable internal warm-up of the node. Could either
dispatch live queries to the node and disregard the response, or store
the last N queries to re-run (a rough sketch of the replay idea is below)
- Figure out how to distribute the caches from the active nodes. This
would be a really cool approach, but I don't know about the feasibility.
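Purely as an illustration of the replay idea (the node endpoint, index
name, and query log format below are all made up):

    import urllib.request

    # Assumptions: the new node's HTTP endpoint, the index to warm, and a
    # newline-delimited file of recent query bodies captured elsewhere.
    NEW_NODE = "http://10.0.0.5:9200"
    INDEX = "myindex"

    with open("recent_queries.ndjson") as f:
        for line in f:
            body = line.strip()
            if not body:
                continue
            # Fire each stored query straight at the new node and discard
            # the response; the point is only to populate its caches.
            req = urllib.request.Request(
                NEW_NODE + "/" + INDEX + "/_search",
                data=body.encode("utf-8"),
                headers={"Content-Type": "application/json"},
            )
            urllib.request.urlopen(req).read()

Of course, since the node immediately joins the cluster and queries fan
out across shards, this only approximates an isolated warm-up, which is
the heart of the problem.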
I am open to any other suggestions for things I can do on my side.
On Sep 1, 12:58 pm, Shay Banon shay.ba...@elasticsearch.com wrote:
Especially with this many shards...
On Wed, Sep 1, 2010 at 9:58 PM, Shay Banon <shay.ba...@elasticsearch.com> wrote:
Yes, this might be that. If you use facets / sorting / filters then the
cache will require some time to fill up.
On Wed, Sep 1, 2010 at 9:54 PM, Paul ppea...@gmail.com wrote:
Here is an interesting observation... Once a node is warmed up, it
is using ~9GB of memory. When I run the test to add the new node,
performance does not stabilize until the new node's memory usage hits
about the 9GB mark.
This implies to me that the degradation is occurring due to ES needing
to build up the filter cache and the data cache.
Also, the first node in the cluster looks fine in terms of system
metrics while this goes on. It is the new node that was just added
that spikes out the CPU and is the bottleneck.
On Sep 1, 12:44 pm, Paul ppea...@gmail.com wrote:
I played around with that setting, trying 1 (which it looks like actually
gets set to 3 in the code) and 10. Both settings demonstrate the same
performance impact as the default, which should be 24 (cores).
Also, I played around with in-memory indexes in this situation. It looks
like the same issue occurs, but I'm not 100% confident in my results, as
it started swapping due to low memory.
On Sep 1, 12:22 pm, Shay Banon shay.ba...@elasticsearch.com wrote:
If you are using the latest elasticsearch (0.10) then the setting is
routing.allocation.concurrent_recoveries. Which version are you using?
On Wed, Sep 1, 2010 at 7:48 PM, Paul ppea...@gmail.com
Thanks for the details. I tried adding the following into the config
file and restarting the cluster:
concurrent_recoveries : 1
I also tried this setting that I found mentioned elsewhere:
concurrent_recoveries : 1
Are either of these correct? For the first setting, performance
appeared as bad as before, and with the second, worse.
Also, keep in mind, the actual recovery should be pretty fast,
as I am using fs-based local storage and not indexing during this
test case, so the local working data is all good when the node starts.
I'm going to be playing around with in-memory indexes next
to see if the behavior is similar.
On Sep 1, 6:49 am, Shay Banon <shay.ba...@elasticsearch.com> wrote:
This basically happens because of the large amount of shards you
have on a single node. When you start a second node, the shards will
be allocated to it, and a recovery process will start. I suspect the
recovery process takes its time, as well as the creation of the
indices and shards.
It is throttled, by the way (the recovery process), but the number
of concurrent recoveries allowed on a node is based on the number of
cores, of which you have plenty. You can set
the routing.allocation.concurrent_recoveries setting, and give it a lower
value. Note though, that changing this setting only applies on node startup.
So you will need to bring both nodes down to change the setting.
On Wed, Sep 1, 2010 at 9:34 AM, Paul ppea...@gmail.com
Was curious if there were any thoughts on a performance
Running two nodes mirrored (ie, replicas=1 for all
indexes, 75 shards, ~30gb of data, fs storage, with no
occurring and 64 threads running test queries. In a
get ~600 queries per second.
I remove one node and as expected throughput drops to a