I feel very confused when deciding the number of primary shards at the beginning.
As we know, the number of shards and replicas can be defined per index at the time the index is created. After the index is created, we may change the number of replicas dynamically at any time, but we cannot change the number of primary shards after the fact. Our ES project may run as a trial at the beginning, with only 10 machines in the cluster. However, once the project runs in a production environment, the number of machines is bound to grow, and there will be 200 or more of them.
How can we decide the number of shards at the beginning? Is it advisable to run 400 or more shards on 10 machines, or will that reduce the performance of the cluster?
Thank you for reading; I look forward to your suggestions.
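To make the constraint concrete, here is a minimal sketch using the official Python client (elasticsearch-py); the host, index name, and settings are only examples:

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# number_of_shards is fixed when the index is created...
es.indices.create(index="logs-v1", body={
    "settings": {"number_of_shards": 10, "number_of_replicas": 1}
})

# ...while number_of_replicas can be changed on the live index at any time
es.indices.put_settings(index="logs-v1", body={
    "index": {"number_of_replicas": 2}
})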
You don't want 400 shards on 10 servers. What you do want is the ability to reindex, so that you can reshard to deal with this issue.
Logstash 1.5 can do this very easily; see this example: https://gist.github.com/markwalkom/8a7201e3f6ea4354ae06.
However, you probably don't want an index with 200 shards regardless; you may want to take a look at your data structure and split things out.
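For illustration, the same reindex-into-a-new-index idea can also be sketched with elasticsearch-py's scan and bulk helpers, rather than the Logstash config in the gist; the index names and shard counts below are made up:

from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")

# create the target index with the new number of primary shards
es.indices.create(index="logs-v2", body={
    "settings": {"number_of_shards": 20, "number_of_replicas": 1}
})

def docs():
    # stream every document out of the old index and point it at the new one
    for hit in helpers.scan(es, index="logs-v1", query={"query": {"match_all": {}}}):
        hit["_index"] = "logs-v2"
        yield hit

helpers.bulk(es, docs())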
Thanks for your answer~
Maybe it can work when the 10 servers grow to 100 servers, but I'm afraid that if we reindex across 100 servers it may take a long time and huge I/O resources. We would need to stop the service for a long time while all the data is transported from the old index to the new one.
I would like to learn Logstash and will take reindexing into consideration.
Best regards,
shinyke
On 28 May 2015 at 17:21, Mark Walkom wrote:
You don't need to stop everything to reindex; leverage aliases and you can do it live.
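For reference, the alias swap Mark describes might look roughly like this with elasticsearch-py; the "logs" alias and the v1/v2 index names are made-up examples:

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# if applications only ever talk to the "logs" alias, switching it from the
# old index to the freshly reindexed one is a single atomic call, no downtime
es.indices.update_aliases(body={
    "actions": [
        {"remove": {"index": "logs-v1", "alias": "logs"}},
        {"add": {"index": "logs-v2", "alias": "logs"}}
    ]
})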