Considering scalability, is it right to start with a large number of primary shards?


(昕玫) #1

Hi,

I am confused about how to choose the number of primary shards at the beginning.

As we know, the number of shards and replicas is defined per index at
the time the index is created. After the index is created, we can change
the number of replicas dynamically at any time, but we cannot change the
number of primary shards. Our ES project may start as a trial with only
10 machines in the cluster. However, once the project runs in a production
environment, the cluster will have to grow to 200 or more machines.
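
For reference, the two settings described above can be sketched in Python as follows. This is only a minimal sketch that builds the JSON request bodies and sends nothing to a cluster. The index name `trial_index`, the helper names, and the example values are hypothetical; the setting names `number_of_shards` and `number_of_replicas` are the ones used by the index settings API.

```python
import json


def create_index_body(primary_shards, replicas):
    """Body for `PUT /<index>`: the primary shard count is fixed from here on."""
    return {
        "settings": {
            "number_of_shards": primary_shards,
            "number_of_replicas": replicas,
        }
    }


def update_replicas_body(replicas):
    """Body for `PUT /<index>/_settings`: replicas can be changed later."""
    return {"index": {"number_of_replicas": replicas}}


if __name__ == "__main__":
    # Hypothetical index name and values, for illustration only.
    print("PUT /trial_index")
    print(json.dumps(create_index_body(10, 1), indent=2))
    print("PUT /trial_index/_settings")
    print(json.dumps(update_replicas_body(2), indent=2))
```

There is deliberately no counterpart helper for changing `number_of_shards` on an existing index, since that is the limitation this question is about.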

How should we decide the number of shards at the beginning? Is it advisable
to run 400 or more shards on 10 machines, or will that reduce the
performance of the cluster?

Thank you for reading. I look forward to your suggestions.

--
Please update your bookmarks! We have moved to https://discuss.elastic.co/

You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/a6eb715a-6543-4882-8635-81a0d22ca1d7%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Mark Walkom) #2

You don't want 400 shards on 10 servers. You do want the ability to reindex
to allow you to reshard to deal with this issue.
Logstash 1.5 can do this very easily, see this example
https://gist.github.com/markwalkom/8a7201e3f6ea4354ae06.

However, you probably don't want a single index with 200 shards regardless;
you may want to take a look at your data structure and split things out.
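
To put the numbers from this thread side by side, here is a rough back-of-the-envelope sketch in Python. The helper name is hypothetical and the replica count of 1 is an assumption that nobody in the thread stated. It only computes an average and says nothing about what shard count a given cluster can tolerate, which depends on shard size and hardware.

```python
def shards_per_node(primary_shards, replicas, nodes):
    """Average number of shard copies (primaries plus replicas) per node."""
    total_copies = primary_shards * (1 + replicas)
    return total_copies / nodes


if __name__ == "__main__":
    # 400 primaries is the figure from the question; 1 replica is assumed.
    for nodes in (10, 200):
        average = shards_per_node(400, 1, nodes)
        print(nodes, "nodes:", average, "shard copies per node on average")
```

Under those assumptions the same index averages 80 shard copies per node on 10 machines and 4 per node on 200 machines, which is why sizing for the future cluster weighs heavily on the trial cluster.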

PS - We're moving to https://discuss.elastic.co/, please join us there for
any future discussions!

(In reply to xinmeike@163.com, 28 May 2015 at 12:46.)


(昕玫) #3

Thanks for your answer~

Maybe that can work when 10 servers grow to 100, but I'm afraid that
reindexing across 100 servers would take a long time and consume a huge
amount of I/O. We would need to stop the service for a long time while all
the data is transferred from the old index to the new one.

Is there an easier way to scale horizontally?

P.S. I can't reach https://discuss.elastic.co/ today. The page is blank
every time. (・ˇ_ˇ・)

(In reply to Mark Walkom, Thursday, 28 May 2015 at 3:05:13 PM UTC+8.)


(Mark Walkom) #4

You don't need to stop everything to reindex; leverage aliases and you can
do it live.
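
As an illustration of the alias approach, here is a minimal Python sketch that builds the body for a `POST /_aliases` request. It assumes the application reads and writes through an alias rather than a concrete index name. The alias `docs`, the index names `docs_v1` and `docs_v2`, and the helper name are hypothetical, and the sketch does not cover copying the data into the new index.

```python
import json


def alias_swap_body(alias, old_index, new_index):
    """Body for `POST /_aliases`: the remove and add are applied atomically."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


if __name__ == "__main__":
    # Hypothetical names: docs_v2 was created with the new shard count
    # and has already been filled from docs_v1.
    print("POST /_aliases")
    print(json.dumps(alias_swap_body("docs", "docs_v1", "docs_v2"), indent=2))
```

Because both actions travel in one request, clients using the alias see either the old index or the new one, never a moment with no index behind the alias.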

(In reply to xinmeike@163.com, 28 May 2015 at 18:57.)


(昕玫) #5

Thank you for your answer!

I will learn Logstash and take reindexing into consideration.

Best regards,
shinyke

(In reply to Mark Walkom, Thursday, 28 May 2015 at 5:21:00 PM UTC+8.)

