A few questions about Elasticsearch indices and clusters

- Can we add or remove specific fields dynamically once the index has
  been created?
- Can we configure the ratio of the index kept in memory versus on disk?
- Can we add a new node to an Elasticsearch cluster when it is required?
  Are there any side effects?

--


Hello Robbie,

On Tue, Oct 23, 2012 at 1:51 PM, Robbie Cheng robbiecheng@gmail.com wrote:

- Can we add or remove specific fields dynamically once the index has
  been created?

You can always add new fields to your mapping, but you can't remove
them. If you want to remove a field you will have to reindex your
data.

For more information about mapping, take a look here:
http://www.elasticsearch.org/guide/reference/mapping/
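
As a rough illustration of adding a field to an existing mapping, here is
a minimal Python sketch that only builds the HTTP request and does not
send it. It assumes a recent Elasticsearch release, where the endpoint is
PUT /<index>/_mapping and the field type is "text"; older releases,
including the ones current when this thread was written, also put a type
name in the path and used "string". The host, index name and field name
are placeholders.

```python
import json
import urllib.request


def build_put_mapping_request(base_url, index, field, field_type):
    """Build (but do not send) a request that adds one field to a mapping."""
    body = {"properties": {field: {"type": field_type}}}
    return urllib.request.Request(
        url=f"{base_url}/{index}/_mapping",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    # Placeholder values: adjust host, index and field for your cluster.
    req = build_put_mapping_request(
        "http://localhost:9200", "articles", "subtitle", "text"
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually send it: urllib.request.urlopen(req)
```

There is no matching request for removing a field, which is why removal
means reindexing.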

- Can we configure the ratio of the index kept in memory versus on disk?

I'm not sure if I got the question right, but I don't think you can.
You can specify whether to store your index in memory or on the file
system, and there are some settings for each:
http://www.elasticsearch.org/guide/reference/index-modules/store.html

Given your question, I suppose your data doesn't fit in memory, so
you'd probably want to store your indices on the file system, and
leave it up to the OS to cache some of that in memory. Elasticsearch
also has its own caches: for example, many filters are cached by
default and you can change settings there. You can find some more
information about ES caching here:
http://www.elasticsearch.org/guide/reference/index-modules/cache.html

- Can we add a new node to an Elasticsearch cluster when it is required?
  Are there any side effects?

Definitely. This is where sharding comes into play. In short, it goes like this:

  • each index is divided into shards. By default there are 5, but you
    can change that for each index
  • each shard might have a number of replicas. Replicas are good for
    redundancy and also query performance (since queries will also run on
    replicas). By default there's 1 replica for each shard

So if you have one node and one index, by default all 5 primary shards
are allocated on that single node. The replicas stay unassigned in this
scenario, because Elasticsearch never puts a replica on the same node
as its primary: a second copy on the same machine adds no redundancy.

If you add one more node, the 5 replicas will get assigned to it.

If you continue to add nodes, your 5 shards and 5 replicas (one per
shard) will be automatically balanced between nodes so that each node
gets roughly the same number of shards. For example, on 5 nodes each
node will hold 2 shards (whether they are primary shards or replicas).
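
To make the balancing concrete, here is a toy Python model. It is not
Elasticsearch's real allocator: it simply places shard copies round-robin
from scratch, with the one rule that a node never holds two copies of the
same shard. The function name and the round-robin strategy are assumptions
made for illustration only.

```python
def place_shards(num_nodes, primaries=5, replicas=1):
    """Round-robin toy placement. Returns (nodes, unassigned).

    nodes[i] is a list of (shard_number, "P" or "R") tuples held by node i.
    """
    nodes = [[] for _ in range(num_nodes)]
    unassigned = []
    slot = 0  # running counter used to rotate through the nodes
    for copy in range(replicas + 1):
        role = "P" if copy == 0 else "R"
        for shard in range(primaries):
            # Try each node once, starting at the current round-robin slot.
            for probe in range(num_nodes):
                node = nodes[(slot + probe) % num_nodes]
                # Never put two copies of the same shard on one node.
                if all(held != shard for held, _ in node):
                    node.append((shard, role))
                    break
            else:
                # Every node already holds a copy of this shard.
                unassigned.append((shard, role))
            slot += 1
    return nodes, unassigned


if __name__ == "__main__":
    for n in (1, 2, 5, 11):
        nodes, unassigned = place_shards(n)
        counts = [len(held) for held in nodes]
        print(f"{n} node(s): shards per node {counts}, "
              f"unassigned copies {len(unassigned)}")
```

With one node the 5 replicas stay unassigned, with five nodes each node
ends up with 2 shards, and with eleven nodes the last one stays empty.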

So with one index and the default configuration, you can make use of up
to 10 nodes, one shard per node. If you add an 11th node, it will get
no shards. While you can change the number of replicas on a live
cluster, you can't change the number of primary shards of an existing
index. That's why you need to plan your number of primary shards before
going to production.
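
The difference between the two settings shows up in where they are sent.
The Python sketch below only builds the requests and does not send them:
the primary shard count goes into the index creation body, while the
replica count can be sent to the settings endpoint of a live index. The
host and index name are placeholders, and the helper names are made up
for this example.

```python
import json
import urllib.request


def _put_json(url, body):
    """Build (but do not send) a PUT request with a JSON body."""
    return urllib.request.Request(
        url=url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def build_create_index_request(base_url, index, primaries, replicas):
    """Primary shard count is fixed here, at index creation time."""
    settings = {"number_of_shards": primaries, "number_of_replicas": replicas}
    return _put_json(f"{base_url}/{index}", {"settings": {"index": settings}})


def build_set_replicas_request(base_url, index, replicas):
    """Replica count can be changed later, on a live index."""
    body = {"index": {"number_of_replicas": replicas}}
    return _put_json(f"{base_url}/{index}/_settings", body)


if __name__ == "__main__":
    # Placeholder host and index name.
    for req in (
        build_create_index_request("http://localhost:9200", "logs", 5, 1),
        build_set_replicas_request("http://localhost:9200", "logs", 2),
    ):
        print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```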

Please note that the logic above applies to the total number of shards
in your cluster, across all your indices. So, for example, if you have
2 indices with the default configuration on a cluster, Elasticsearch
can distribute your data across up to 20 nodes.
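
The arithmetic behind the 10 and 20 node figures can be written as a
short Python sketch. The function names are made up for this example,
and the defaults are the ones quoted in this thread (5 primaries, 1
replica).

```python
def max_useful_nodes(num_indices, primaries=5, replicas=1):
    """Total shard copies in the cluster: each node needs at least one."""
    return num_indices * primaries * (1 + replicas)


def idle_nodes(num_nodes, num_indices, primaries=5, replicas=1):
    """Nodes that would receive no shard at all."""
    return max(0, num_nodes - max_useful_nodes(num_indices, primaries, replicas))


if __name__ == "__main__":
    print(max_useful_nodes(1))  # one index, default settings
    print(max_useful_nodes(2))  # two indices, default settings
    print(idle_nodes(11, 1))    # an 11th node with a single default index
```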

Best regards,
Radu

http://sematext.com/ -- ElasticSearch -- Solr -- Lucene

--

Hey Radu,

Thanks for your detailed reply. One follow-up question about the number
of primary shards, since it can't be changed afterwards:
What's the maximum number of primary shards, and is there any way to
determine how many primary shards we need?

Thanks,

On Tuesday, October 23, 2012 at 8:12:39 PM UTC+8, Radu Gheorghe wrote:

--

Hello Robbie,

On Thu, Nov 22, 2012 at 3:17 AM, Robbie Cheng robbiecheng@gmail.com wrote:

Hey Radu,

Thanks for your detailed reply. One follow-up question about the number
of primary shards, since it can't be changed afterwards:
What's the maximum number of primary shards, and is there any way to
determine how many primary shards we need?

If you're going to have a static number of indices, you need to think
about how many nodes you will eventually spread a single set of those
indices across (not counting replicas) in the long run, without having
to reindex your data. But you also have to take into account that each
shard comes with an overhead, so you can't just go with 1000 shards
"just to be sure".
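
As a starting point for that planning, here is a small Python sketch of
one possible rule of thumb: pick enough primary shards per index that the
primaries alone could put at least one shard on every node you expect to
grow to. This heuristic and the function name are assumptions for
illustration, not an official formula, and it deliberately ignores the
per-shard overhead mentioned above.

```python
import math


def primaries_per_index(target_nodes, num_indices):
    """Smallest per-index primary count so primaries can cover target_nodes."""
    if target_nodes < 1 or num_indices < 1:
        raise ValueError("target_nodes and num_indices must be at least 1")
    return math.ceil(target_nodes / num_indices)


if __name__ == "__main__":
    # Hypothetical growth plans: (expected nodes, number of indices).
    for nodes, indices in ((10, 1), (10, 4), (3, 5)):
        print(nodes, indices, primaries_per_index(nodes, indices))
```

Treat the result as a lower bound to weigh against overhead rather than
as a recommendation.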

Here's a very good video in which you can see some solutions on how you can
organize your indices and shards:
http://www.elasticsearch.org/videos/2012/06/05/big-data-search-and-analytics.html

Best regards,
Radu


--