Thanks for the clarification.
Hi,
No, that index will remain in status red, and there is no way to tell the
cluster that the index will no longer have this shard available. If you do
manage to bring back the machine where that shard is stored, then it will be
recovered.
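
To make the status rules discussed here concrete, here is a minimal sketch of how green/yellow/red follow from shard allocation. It is not Elasticsearch code; the dict layout and the function name `index_health` are assumptions made up for illustration.

```python
def index_health(shards):
    """Return 'green', 'yellow' or 'red' for a list of shard descriptions.

    Each entry is a hypothetical dict with the keys
    'primary_assigned', 'replicas_assigned' and 'replicas_expected'.
    """
    # Any shard whose primary copy is missing makes the index red.
    if any(not s["primary_assigned"] for s in shards):
        return "red"
    # All primaries are there, but at least one replica is missing: yellow.
    if any(s["replicas_assigned"] < s["replicas_expected"] for s in shards):
        return "yellow"
    return "green"


# Index with 5 shards and no replicas, all primaries allocated.
healthy = [
    {"primary_assigned": True, "replicas_assigned": 0, "replicas_expected": 0}
    for _ in range(5)
]
# Same index after the node holding one of the shards is lost.
degraded = healthy[:4] + [
    {"primary_assigned": False, "replicas_assigned": 0, "replicas_expected": 0}
]

print(index_health(healthy))
print(index_health(degraded))
```

With no replicas configured there is nothing that can be "missing but recoverable", so the sketch goes straight from green to red, which matches the behaviour described above.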
-shay.banon
On Tuesday, April 19, 2011 at 3:34 PM, Michel Conrad wrote:
Hi Shay,
thank you for your answers. The only thing that is not clear to me is the
case where I have an index without replicas and I lose a shard: the state
will go from green to red, as yellow cannot happen when there are no
replicas, correct? Is it possible in that case to tell the index that there
will be one shard less from now on, so that the state will be green again?
I am asking because at one point I am waiting for the cluster state to
become yellow, and with an index on red this will not be possible.
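
One way to wait for a status is the cluster health API with its wait_for_status parameter. The following is only a sketch that builds the URL without sending anything; the host and port (localhost:9200), the timeout value and the index name are assumptions, and `health_wait_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode


def health_wait_url(status="yellow", timeout="30s", index=None,
                    base_url="http://localhost:9200"):
    """Build a cluster health URL that waits until `status` (or better) is reached."""
    # Optionally restrict the health check to one index.
    path = "/_cluster/health" + (f"/{index}" if index else "")
    query = urlencode({"wait_for_status": status, "timeout": timeout})
    return f"{base_url}{path}?{query}"


# Whole cluster, and a single (hypothetical) index.
print(health_wait_url())
print(health_wait_url(index="logs-2011-04-19"))
```

Restricting the call to the indices you actually care about should keep an unrelated red index from blocking the wait, but that is worth checking against your version.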
Best,
Michel
On Tue, Apr 19, 2011 at 1:15 PM, Shay Banon
shay.banon@elasticsearch.com wrote:
Yes, you can change the number of replicas dynamically, check the update
settings API. More answers below:
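
As a rough illustration of the update settings call, here is a sketch that builds (but does not send) the request. The node address localhost:9200 and the index name are assumptions, and `replica_update_request` is a hypothetical helper, not part of any client library.

```python
import json
import urllib.request


def replica_update_request(index, replicas, base_url="http://localhost:9200"):
    """Build a PUT request that sets number_of_replicas for one index."""
    body = json.dumps({"index": {"number_of_replicas": replicas}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Reduce a (hypothetical) daily index to 1 replica.
req = replica_update_request("logs-2011-04-19", 1)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To actually send it to a running node: urllib.request.urlopen(req)
```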
On Tuesday, April 19, 2011 at 2:11 PM, Michel Conrad wrote:
Hi,
is it possible to change the number of replicas dynamically? Let's say
I index my data into different indices. After every day I switch to a
new index.
If I know from my usage scenario, that the most recent data is the
data my users search for mostly and the older data is needed less
often, would it be a good idea to change
the replica settings on the fly, for instance start with 2 replicas
during the first week, then down to 1 replica for the second week, and
to no replica at all for data older than a month.
Note, if you have no replicas, and you lose a node that had a shard for the
index with no replicas, you will lose that data.
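
The age-based scheme from the question could be sketched like this. The thresholds come from the question (2 replicas in the first week, 1 afterwards, none after a month); keeping 1 replica between the second week and one month is an assumption, since the question leaves that gap open, and `replicas_for_age` is a hypothetical name.

```python
def replicas_for_age(age_days):
    """Return the replica count for an index that is `age_days` old."""
    if age_days < 7:
        return 2  # first week: most searched, keep 2 replicas
    if age_days <= 30:
        return 1  # up to a month (assumption for days 14-30)
    return 0      # older than a month: no replica, data loss is possible


for age in (0, 6, 7, 30, 31):
    print(age, replicas_for_age(age))
```

A daily job could combine this with the update settings API, keeping in mind the warning above about indices with no replicas.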
There are some open questions for which I didn't find an answer:
- If I have no replica and I lose a node, what will happen to the
cluster? Will querying still work (without the lost data)?
Querying will still work on whatever data is available, but the relevant
data is lost.
- If I start with 2 replicas and I go down to 1, will the data still
stay on the HDD on the third node, so that I can recover it in case
of an HDD failure?
If you reduce the number of replicas, the removed copy won't be kept on
disk. You will now have 2 copies of the data instead of 3.
- In order to speed up querying, what are the prerequisites to use an
in-memory index? Is it possible to keep a backup of the data on HDD
and use an in-memory index without replica?
I suggest first trying the file-based index; with the file system cache, it
will be pretty fast.
Regards,
Michel
On Tue, Apr 19, 2011 at 11:14 AM, Kristian Jörg krjg@devo.se wrote:
Hi,
thanks for all information everybody!
I am now getting the grasp on this and feel confident I can manage the
configuration from here on.
/Kristian
Berkay Mollamustafaoglu skrev 2011-04-15 14:24:
Shard/replica configuration depends on your needs. There are many factors:
how many docs, users, number of queries, etc. 50K documents is not much, so
you're probably OK with one server unless you have a very high number of
users/queries.
Replicas are for redundancy but also improve query performance. If you
need high availability, you should naturally have 2 servers. In this case, 5
shards and 1 replica (note 1 replica means 1 additional copy, total of 2
copies) would be reasonable.
If you have only 1 server, then 5 shards and no replica is better; you
can always add a replica later, if needed. Adding shards later is not that
easy, so even with 1 server, I'd keep the number of shards higher. If your
memory is really limited and you don't expect expansion, then you can limit
the number of shards to 2.
As you can see, there are a lot of if, then, unless in the statements above.
I'd recommend playing around with it, see how the performance is and adjust
accordingly, rather than trying to figure out everything from the beginning.
It would not be difficult to change your config and re-index 50K docs.
Regards,
Berkay Mollamustafaoglu
mberkay on yahoo, google and skype
On Fri, Apr 15, 2011 at 3:49 AM, Kristian Jörg krjg@devo.se wrote:
Thanks for the reply.
However, I still do not understand how I should configure shards/replicas
with respect to different scenarios. For instance, if I use a single node and
end up with 5 shards in it, would it not be better to use a setup of 1 shard
/ 1 replica? Everything is on the same node anyway.
Is it better to split the index in many shards and balance those on
several nodes? I guess network protocols will play a role as a limit in that
scenario. And how do replicas play in? In what way do several replicas
affect the situation? Are they for redundancy? I.e. if a node goes down, does
a replica of the shard on another node still serve the index? If that is
correct, then what is a good ratio between shards/replicas for a set number
of servers?
Many questions...
Back to the index I am building. Yes, it is a small index in the Lucene/ES
world. But I expect to add several of the same magnitude to the mix as time
goes.
And I have another application (for library catalogues) which will index
up to a million documents per index, but there each document is rather
small.
/Kristian
Shay Banon skrev 2011-04-15 00:38:
Heya,
In elasticsearch, when you create an index, you define the number of
shards and number of replicas. By default, an index is created with 5 shards
and 1 replica per shard. If you have 2 nodes, and create a single index,
with 5 shards and 1 replica, then each node will have 5 shards. Once you
start adding more nodes, those shards will start to get rebalanced between
those nodes.
That's the gist of it at a very high level. One way you can
understand it possibly better is to simply use the cluster state API, it
gives a nice breakdown of indices, shards, and where they are allocated and
on which nodes. You can easily create indices and start several servers on
your laptop and see how it behaves.
Back to your question. This does not sound like a large set of
documents. If you did not change the default number of shards and
replicas, you have a nice growth path of up to 10 nodes (1 shard per node);
size-wise you can grow up to 5 shards (not counting replicas), which is up
to 5 servers/nodes.
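
The numbers above follow from simple arithmetic, sketched below. This is an idealized model, not the real allocator: it assumes copies are spread as evenly as possible and that two copies of the same shard never share a node. `shard_layout` is a hypothetical helper name.

```python
def shard_layout(shards, replicas, nodes):
    """Return (total copies, assignable copies, copies per node)."""
    # Every shard exists once as a primary plus once per replica.
    total = shards * (1 + replicas)
    # Assumption: copies of the same shard go to different nodes, so at
    # most `nodes` copies of each shard can be assigned.
    assignable = shards * min(1 + replicas, nodes)
    # Spread the assignable copies as evenly as possible.
    base, extra = divmod(assignable, nodes)
    per_node = [base + 1 if i < extra else base for i in range(nodes)]
    return total, assignable, per_node


# Defaults mentioned above: 5 shards, 1 replica per shard.
for nodes in (1, 2, 10):
    print(nodes, shard_layout(5, 1, nodes))
```

For 2 nodes this gives 5 shard copies on each node, and at 10 nodes each node holds exactly one copy, which is the growth limit described above.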
-shay.banon
On Thursday, April 14, 2011 at 2:53 PM, Kristian Jörg wrote:
Well, I am probably looking a bit dumb asking this. But anyways here it
goes.
The first time ever I came by the names shards and replicas was when I
started looking into new search engines for my web applications and
ended up here with ES. I understand these terms describe how the index
is divided between instances of nodes in a cluster, but I would really
need a deep understanding of the concepts. Especially as we are now
beginning to put ES in production and will actually start using two
nodes as a cluster.
So any good pointers to info on these concepts would be much
appreciated. I tried googling but there was too much noise and nothing
to actually explain the concepts. First time I came up empty with
google...
I am about to put an index of about 50,000 documents (each document an
OCR interpreted page of a book) on a cluster of two servers. What would
be a good setting for shards and replicas for this type of index and
cluster?