I want to index ~1.9 TB of text data using Elasticsearch. The default number
of shards is 5; would that meet the need?
I can afford about 10 machines to form a cluster.
Thanks in advance for your help.
It depends.
Really, it depends. On:
- your data
- how you index it
- how you query it
- your hardware
- your expectations
Don't forget that you will probably have at least double that amount,
because for each primary shard you want one or more replica shards to
ensure that no data gets lost, and to improve search throughput.
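As a rough illustration of that arithmetic, here is a minimal Python sketch. It assumes the index on disk is about the same size as the ~1.9 TB of source text, which may not hold in practice because mappings and analysis can make the index larger or smaller. `total_storage_tb` is a hypothetical helper for this example, not an Elasticsearch API.

```python
def total_storage_tb(primary_tb: float, replicas: int) -> float:
    """Total data held by the cluster: one primary copy plus `replicas` extra copies."""
    if replicas < 0:
        raise ValueError("replicas must be >= 0")
    return primary_tb * (1 + replicas)


# Assumption: ~1.9 TB of primary data, as in the question.
for replicas in (0, 1, 2):
    print(f"{replicas} replica(s): {total_storage_tb(1.9, replicas):.1f} TB")
```

With one replica per primary shard, the cluster has to hold roughly twice the primary data, which is the "at least double" mentioned above.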
The best approach would be:
1. On the type of hardware that you intend to use in production,
   create an index with a single primary shard and no replicas.
2. Index your data into that shard.
3. Run typical queries under typical load.
4. Measure.
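The first step can be sketched as follows. This only builds the create-index request and does not send it. The host `http://localhost:9200` and the index name `shard_test` are placeholders chosen for this example, and the helper name is hypothetical. `number_of_shards` and `number_of_replicas` are the standard index settings.

```python
import json


def single_shard_index_request(host, index):
    """Return (url, json_body) for creating an index with 1 primary shard and 0 replicas."""
    body = {"settings": {"number_of_shards": 1, "number_of_replicas": 0}}
    return f"{host}/{index}", json.dumps(body)


# Placeholder host and index name; adjust to your own test cluster.
url, body = single_shard_index_request("http://localhost:9200", "shard_test")
print("PUT", url)
print(body)
```

The printed URL and body can be sent with any HTTP client, for example `curl -XPUT` with the body as the request data.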
At some point, the shard will stop performing well enough to meet your
expectations. That's the shard limit. Now you know how much data one
shard can handle, and therefore how many shards your index needs.
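Once the shard limit has been measured, the number of primary shards follows from a simple division. The sketch below assumes a purely hypothetical limit of 50 GB per shard; that figure is a placeholder, not a recommendation, and the real value has to come from your own measurements. `primary_shards_needed` is a helper invented for this example.

```python
import math


def primary_shards_needed(total_gb: float, shard_limit_gb: float) -> int:
    """Smallest number of primary shards so that no shard exceeds the measured limit."""
    if total_gb <= 0 or shard_limit_gb <= 0:
        raise ValueError("sizes must be positive")
    return math.ceil(total_gb / shard_limit_gb)


# Assumptions: ~1.9 TB (1900 GB) of index data and a placeholder 50 GB shard limit.
print(primary_shards_needed(1900, 50))
```

It is usually worth leaving some headroom above this minimum if the data is expected to grow.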
Thanks for your reply; it helps a lot. I am new to ES, and it's a pity
that the number of shards cannot be changed once it has been set.
(In reply to Clinton Gormley's message of Tuesday, February 26, 2013, 5:51:41 PM UTC+8.)
But you can create a new index with a different number of shards and put an alias on top of all your indices.
--
David
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
(In reply to Jingang Wang's message of 26 Feb 2013, 11:10.)
On Tue, 2013-02-26 at 02:10 -0800, Jingang Wang wrote:
Hi Clinton,
Thanks for your reply; it helps a lot. I am new to ES, and it's a pity
that the number of shards cannot be changed once it has been set.
It's actually not as problematic in practice as it seems. ES gives you
enormous flexibility because of the concept that querying one index with
5 shards is exactly equivalent to querying 5 indices with 1 shard each.
That means you can create new extra indices later, and you can use
aliases to make all of this transparent to your application.
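A minimal sketch of the alias idea, assuming two hypothetical indices `docs_v1` and `docs_v2` and an alias named `docs`. It only builds the request body for the `_aliases` endpoint and does not contact a cluster. The helper name is invented for this example.

```python
import json


def add_alias_actions(alias, indices):
    """Build a POST /_aliases body that points `alias` at every index in `indices`."""
    return {"actions": [{"add": {"index": name, "alias": alias}} for name in indices]}


# Hypothetical names: an older index, a newer one added later, and one alias over both.
body = add_alias_actions("docs", ["docs_v1", "docs_v2"])
print(json.dumps(body, indent=2))
```

After posting this body to `/_aliases`, the application can search `docs` and the query is run against both underlying indices, so adding another index later does not require an application change.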
I didn't know that. So I can create multiple indices and query all of
them just as if they were one index.
That sounds great. Thanks, David.
(In reply to David Pilato's message of Tuesday, February 26, 2013, 6:52:10 PM UTC+8.)