Hi,
These are all good questions. Let me answer the simple ones first:
- Controlling the number of recoveries is done by setting
"cluster.routing.allocation.concurrent_recoveries". It defaults to the
number of processors + 1 (though recovery is usually IO intensive, I had to
come up with something...). This controls the number of concurrent
recoveries allowed on a specific node. Note, this is a 0.11 setting (to be
released in a day or so). See the config example right after these two
points.
- I am not sure about clearing the work dir. It will increase the time it
takes to recover (as local data will not be reused), and of course it
should not be done when using the upcoming local gateway in 0.11. I would
say you usually don't want to do it.
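To make the first point concrete: assuming the setting is placed in each
node's configuration file (elasticsearch.yml), it would look something like
the line below (the value here is only an illustration, not a
recommendation):

cluster.routing.allocation.concurrent_recoveries: 2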
The number of shards and the number of replicas is a tough one to answer,
since it really depends on the system. Basically, more shards means faster
indexing (assuming you have enough machines to utilize the fact that there
are more shards), and a greater upper limit for the index size. For
example, if you have a 10-shard index, it can only grow up to 10 machines x
disk size for the index size.
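To make that concrete, the number of shards is fixed when the index is
created. A minimal sketch over the REST API (the index name, host, and the
exact shape of the settings body are my assumptions and may differ between
versions):

# Sketch: create an index with an explicit number of shards / replicas.
import json
import urllib.request

settings = {"settings": {"number_of_shards": 10, "number_of_replicas": 1}}
req = urllib.request.Request(
    "http://localhost:9200/myindex",
    data=json.dumps(settings).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="PUT",
)
print(urllib.request.urlopen(req).read().decode("utf-8"))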
Replicas are there to improve both high availability and search
performance. It's easy not to worry about them that much, as there is an
API to increase / decrease them on the fly.
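For example, a minimal sketch of bumping the replica count on a live index
(again, the exact endpoint and body are my assumptions, and the names /
values are illustrative):

# Sketch: increase the number of replicas on an existing index.
import json
import urllib.request

body = {"index": {"number_of_replicas": 2}}
req = urllib.request.Request(
    "http://localhost:9200/myindex/_settings",
    data=json.dumps(body).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="PUT",
)
print(urllib.request.urlopen(req).read().decode("utf-8"))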
Back to the number of shards setting. I would like to explain why one can't
change it. The cost of doing some sort of "repartitioning" of the data in
an index is very high. If recovery alone can be taxing (though it should be
fast, compressed, and in 0.11 even faster), imagine how taxing it would be
to have to reindex part of the data to create a new partitioning scheme
(increase the number of shards). It would also potentially create a long
pause where the index is unusable.
For this reason, I went with a fixed number of shards, but tried to solve
the problem in a different manner. You can always create more indices. And
to complete the picture, you can always search on more than one index.
So, how to set the number of shards? I would say first run a test that
loads, let's say, 10% of the expected data size. You can get a feeling for
the index size required (there is an API for that, the indices status).
From that you can extrapolate both what you need now and what you will need
when you grow xN times.
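A rough sketch of that check (index name and host are examples; the exact
response layout may differ between versions, so I just print it and look
for the store size):

# Sketch: fetch the indices status and inspect the index size, to
# extrapolate from a 10% load test.
import json
import urllib.request

with urllib.request.urlopen("http://localhost:9200/myindex/_status") as resp:
    status = json.loads(resp.read().decode("utf-8"))
# Look for the store size of the index in the response, then scale it by
# ~10x (and by the expected xN growth) to estimate what you will need.
print(json.dumps(status, indent=2))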
It gets a bit more interesting for really unbounded systems, like indexing
log files. For that, you can create an index per week of logs. At the start
of each week, you start a new index and index data into it. This gives you
a full scale-out model, and you can always search over a month by simply
searching on the last 4 weeks (for example).
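A minimal sketch of that, assuming a weekly index naming scheme I made up
(logs_2010_wNN), the usual comma-separated multi-index search, and a simple
query_string query:

# Sketch: search the last 4 weekly log indices in a single request.
import json
import urllib.request

weeks = ["logs_2010_w36", "logs_2010_w37", "logs_2010_w38", "logs_2010_w39"]
query = {"query": {"query_string": {"query": "error"}}}
req = urllib.request.Request(
    "http://localhost:9200/%s/_search" % ",".join(weeks),
    data=json.dumps(query).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
print(urllib.request.urlopen(req).read().decode("utf-8"))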
This design also gives you flexibility: for example, the last week can be
really hot when it comes to searching, so it can have more replicas. Older
weeks can have fewer replicas, as they will be searched less. Another nice
aspect of this design is that deleting old "weeks" is a snap: it's just a
matter of deleting that index, rather than deleting all the log entries
associated with that week from within a single index.
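A sketch of those two operations, with made-up index names and values (the
endpoints and body shapes are my assumptions):

# Sketch: dial replicas down on an older week, and drop a very old week
# entirely by deleting its index.
import json
import urllib.request

def request(method, path, body=None):
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request("http://localhost:9200" + path, data=data,
                                 headers={"Content-Type": "application/json"},
                                 method=method)
    return urllib.request.urlopen(req).read().decode("utf-8")

# Older week: fewer replicas, since it is searched less.
print(request("PUT", "/logs_2010_w30/_settings",
              {"index": {"number_of_replicas": 0}}))
# Very old week: deleting the whole index is a snap.
print(request("DELETE", "/logs_2010_w20"))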
-shay.banon
On Tue, Sep 28, 2010 at 7:55 AM, Paul Smith tallpsmith@gmail.com wrote:
We are mitigating this by throttling
shard recovery to one at a time and clearing the work directory on a node
on start up, to ensure a slow recovery.
ooh, it's little tidbits like this that are really useful. How do you do
the shard recovery one at a time though..? I'm presuming you mean that by
clearing out the work directory, you're for all intents and purposes
telling ES that this is a 'brand new fresh' node, and not trying to do any
reconciliation of the on-disk shard info with the replicas being taken.
That is, to rephrase, instead of trying to recover a dead node, basically
'putting a bullet in it' and treating it like a fresh node?
Paul