How many shards should I set when ~2 TB of data needs to be indexed?

Hi there,

I intend to index ~1.9 TB of text data using Elasticsearch. The default
number of shards is 5; would that meet my needs?
I can afford about 10 machines to form a cluster.
Thanks for your help in advance.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Hiya

I intend to index ~1.9 TB of text data using Elasticsearch. The default
number of shards is 5; would that meet my needs?

It depends :-)

I can afford about 10 machines to form a cluster.
Thanks for your help in advance.

Really, it depends. On:

  1. your data
  2. how you index it
  3. how you query it
  4. your hardware
  5. your expectations

Don't forget that you will probably need at least double that amount of
storage, because for each primary shard you want one or more replica
shards, both to ensure that no data gets lost and to improve search
throughput.
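As a rough illustration of that arithmetic, here is a minimal Python sketch. It assumes the index on disk is about the same size as the ~1.9 TB of raw text (in practice it can be larger or smaller depending on your mappings) and that shard copies spread evenly over the 10 machines; both are assumptions, not measurements.

```python
def total_storage_gb(index_size_gb: float, replicas: int) -> float:
    """Disk space for the primaries plus `replicas` full copies of each primary shard."""
    if replicas < 0:
        raise ValueError("replicas must be >= 0")
    # Every replica shard is a complete copy of its primary shard.
    return index_size_gb * (1 + replicas)


def storage_per_node_gb(index_size_gb: float, replicas: int, nodes: int) -> float:
    """Average disk per node, ASSUMING shard copies are spread evenly across nodes."""
    if nodes <= 0:
        raise ValueError("nodes must be > 0")
    return total_storage_gb(index_size_gb, replicas) / nodes


if __name__ == "__main__":
    # ASSUMPTION: the on-disk index is roughly the size of the ~1.9 TB of raw text.
    index_size_gb = 1900
    for replicas in (0, 1, 2):
        total = total_storage_gb(index_size_gb, replicas)
        per_node = storage_per_node_gb(index_size_gb, replicas, nodes=10)
        print(f"replicas={replicas}: ~{total:.0f} GB total, ~{per_node:.0f} GB per node")
```

With one replica this comes to roughly 3800 GB in total, or about 380 GB per machine, before any headroom for merges and growth.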

The best approach would be, on the type of hardware that you intend to
use in production, to:

  • create an index with a single primary shard and no replicas
  • index your data into that shard
  • run typical queries under typical load
  • measure
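A minimal sketch of the request body for that single-shard test index, built in Python so it can be printed and sent with any HTTP client (for example curl). `number_of_shards` and `number_of_replicas` are the standard index settings; the host and the index name `shard_test` in the comment are hypothetical placeholders.

```python
import json


def benchmark_index_body(shards: int = 1, replicas: int = 0) -> dict:
    """Create-index request body: one primary shard and no replicas by default."""
    return {
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        }
    }


if __name__ == "__main__":
    # Send the printed JSON as the body of:  PUT http://localhost:9200/shard_test
    # ("localhost:9200" and "shard_test" are placeholders, not values from this thread.)
    print(json.dumps(benchmark_index_body(), indent=2))
```

Keeping replicas at 0 for the test means every document lives in exactly one shard, so what you measure is the capacity of a single shard.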

At some point, the shard will stop performing well enough to meet your
expectations. That's the shard limit. Divide your total data size by it
and you know how many primary shards your index needs.
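Once you have measured that limit, the shard count is just a rounded-up division. A minimal sketch; the 50 GB per-shard figure in the example is a hypothetical placeholder for whatever your own benchmark shows, not a recommendation.

```python
import math


def primary_shards_needed(total_data_gb: float, max_gb_per_shard: float) -> int:
    """Fewest primary shards that keep every shard at or under the measured limit."""
    if total_data_gb <= 0 or max_gb_per_shard <= 0:
        raise ValueError("sizes must be positive")
    # Round up: a partial shard's worth of data still needs a whole shard.
    return math.ceil(total_data_gb / max_gb_per_shard)


def shard_copies(primaries: int, replicas: int) -> int:
    """Total shard copies (primaries plus replicas) the cluster has to host."""
    return primaries * (1 + replicas)


if __name__ == "__main__":
    # HYPOTHETICAL: suppose your benchmark showed one shard copes with ~50 GB.
    primaries = primary_shards_needed(1900, 50)
    print(primaries, "primary shards,",
          shard_copies(primaries, replicas=1), "shard copies with 1 replica")
```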

clint


Hi Clinton,

Thanks for your reply, it helps a lot. I am new to ES, and it's a pity
that the number of shards cannot be changed once it has been set.

On Tuesday, February 26, 2013 5:51:41 PM UTC+8, Clinton Gormley wrote:

[...]

But you can create a new index with a different number of shards and put an alias on top of all your indices.
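A minimal sketch of the alias request body, assuming two hypothetical indices, `docs_v1` (the original index) and `docs_v2` (a later index created with more shards), behind a hypothetical alias `docs`. The body is sent with POST to `/_aliases`; searches against `/docs/_search` then cover both indices.

```python
import json
from typing import Dict, List, Sequence


def alias_actions(alias: str, indices: Sequence[str]) -> Dict[str, List[dict]]:
    """Body for POST /_aliases that adds every given index to one alias."""
    return {"actions": [{"add": {"index": name, "alias": alias}} for name in indices]}


if __name__ == "__main__":
    # HYPOTHETICAL names: "docs_v1" (original index), "docs_v2" (newer index with
    # more shards), and the alias "docs" that the application queries.
    body = alias_actions("docs", ["docs_v1", "docs_v2"])
    print("POST /_aliases")
    print(json.dumps(body, indent=2))
```

Indexing a document through an alias that points at more than one index is rejected unless one of them is marked as the write index (an option in newer versions), so it is simplest to send writes to a concrete index and use the alias for searching.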

--
David ;-)
Twitter: @dadoonet / @elasticsearchfr / @scrutmydocs

On 26 Feb 2013 at 11:10, Jingang Wang bitwjg@gmail.com wrote:

[...]

On Tue, 2013-02-26 at 02:10 -0800, Jingang Wang wrote:

Hi Clinton,

Thanks for your reply, it helps a lot. I am new to ES, and it's a
pity that the number of shards cannot be changed once it has been set.

It's actually not as problematic in practice as it seems. ES gives you
enormous flexibility because querying one index with 5 shards is exactly
equivalent to querying 5 indices with 1 shard each.
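To see why the two layouts are equivalent, here is a toy Python model of how a search is spread over shards: each shard returns its own top-k hits and a coordinating node merges them into the global top k. It assumes every document's score is already fixed and ignores how Elasticsearch actually computes scores. Because the coordinator only works with a flat list of shards, grouping them as 1 index with 5 shards or as 5 indices with 1 shard each gives the same result.

```python
import heapq
from typing import Iterable, List, Tuple

Hit = Tuple[float, str]   # (score, doc_id); scores are ASSUMED fixed per document
Shard = List[Hit]


def query_shard(shard: Shard, k: int) -> List[Hit]:
    """Each shard returns only its own k best hits."""
    return heapq.nlargest(k, shard)


def merge_top_k(per_shard: Iterable[List[Hit]], k: int) -> List[Hit]:
    """The coordinating node merges the per-shard lists into a global top k."""
    return heapq.nlargest(k, (hit for hits in per_shard for hit in hits))


def search(indices: List[List[Shard]], k: int) -> List[Hit]:
    """Search several indices: only the flat list of shards matters."""
    shards = [shard for index in indices for shard in index]
    return merge_top_k((query_shard(s, k) for s in shards), k)


if __name__ == "__main__":
    shards: List[Shard] = [
        [(0.9, "a"), (0.2, "b")],
        [(0.7, "c"), (0.6, "d")],
        [(0.95, "e")],
        [(0.1, "f"), (0.5, "g")],
        [(0.8, "h")],
    ]
    one_index_five_shards = [shards]
    five_indices_one_shard = [[s] for s in shards]
    print(search(one_index_five_shards, 3))
    print(search(five_indices_one_shard, 3))
```

Both calls print `[(0.95, 'e'), (0.9, 'a'), (0.8, 'h')]`.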

That means you can create extra indices later, and you can use
aliases to make all of this transparent to your application.

clint


I didn't know that, so I can create multiple indices and query all of
them just as if they were one index.
That sounds great, thanks, David.

On Tuesday, February 26, 2013 6:52:10 PM UTC+8, David Pilato wrote:

[...]