Is shard splitting supported in Elastic search, any alternate

Per, http://thinkbiganalytics.com/solr-vs-elastic-search/ , ElasticSearch
does not suport shard spiltting which Solr supports. Is it generally an
issue in production, if yes what alternate user has :-

Shard Splitting

Shards are the partitioning unit for the Lucene index, both Solr and
ElasticSearch have them. You can distribute your index by placing shards on
different machines in a cluster. Until April 2013, both Solr and
ElasticSearch would not allow you to change the number of shards in your
index. So if you decided you wanted to split your index into 10 shards on
day one, and two years later you want to add another 5 shards, you were not
able to do that without completely starting over (reindexing everything).
As of April 2013 Solr supports shard splitting
https://issues.apache.org/jira/browse/SOLR-3755, which allows you to
create more shards by splitting existing shards. ElasticSearch still does
not support this.

Thanks, Gaurav

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CALZAj3LGdMLYFPn8jX0CdhgRdP8suWTg0ekR8LcoPRc%3D1gasHg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Its never been a problem for me.

Normally for time series data you handle this by creating a new index every
day. For non-time series data I basically do this:

It has the advantage of letting me change the mapping and analysis
configuration in the process.

I can imagine situations where not having shard splitting would be painful
but I haven't been in any.

Nik

On Thu, Dec 11, 2014 at 4:20 PM, Gaurav gupta gupta.gaurav0125@gmail.com
wrote:

Per, http://thinkbiganalytics.com/solr-vs-elastic-search/ , ElasticSearch
does not suport shard spiltting which Solr supports. Is it generally an
issue in production, if yes what alternate user has :-

Shard Splitting

Shards are the partitioning unit for the Lucene index, both Solr and
ElasticSearch have them. You can distribute your index by placing shards on
different machines in a cluster. Until April 2013, both Solr and
ElasticSearch would not allow you to change the number of shards in your
index. So if you decided you wanted to split your index into 10 shards on
day one, and two years later you want to add another 5 shards, you were not
able to do that without completely starting over (reindexing everything).
As of April 2013 Solr supports shard splitting
https://issues.apache.org/jira/browse/SOLR-3755, which allows you to
create more shards by splitting existing shards. ElasticSearch still does
not support this.

Thanks, Gaurav

--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/CALZAj3LGdMLYFPn8jX0CdhgRdP8suWTg0ekR8LcoPRc%3D1gasHg%40mail.gmail.com
https://groups.google.com/d/msgid/elasticsearch/CALZAj3LGdMLYFPn8jX0CdhgRdP8suWTg0ekR8LcoPRc%3D1gasHg%40mail.gmail.com?utm_medium=email&utm_source=footer
.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAPmjWd2jft849dBCTR2E%2Bzy1rbmwkUh9NSYp_C_LWH1hcR%3DPfQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Shard splitting is an anti-pattern if done on server side. If you really
need more shards, you have not made good planning and you can always add
another index and use index aliasing to search on both. Plus, there are
export/import tools if you want reindexing from client side.

Jörg
Am 11.12.2014 22:20 schrieb "Gaurav gupta" gupta.gaurav0125@gmail.com:

Per, http://thinkbiganalytics.com/solr-vs-elastic-search/ , ElasticSearch
does not suport shard spiltting which Solr supports. Is it generally an
issue in production, if yes what alternate user has :-

Shard Splitting

Shards are the partitioning unit for the Lucene index, both Solr and
ElasticSearch have them. You can distribute your index by placing shards on
different machines in a cluster. Until April 2013, both Solr and
ElasticSearch would not allow you to change the number of shards in your
index. So if you decided you wanted to split your index into 10 shards on
day one, and two years later you want to add another 5 shards, you were not
able to do that without completely starting over (reindexing everything).
As of April 2013 Solr supports shard splitting
https://issues.apache.org/jira/browse/SOLR-3755, which allows you to
create more shards by splitting existing shards. ElasticSearch still does
not support this.

Thanks, Gaurav

--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/CALZAj3LGdMLYFPn8jX0CdhgRdP8suWTg0ekR8LcoPRc%3D1gasHg%40mail.gmail.com
https://groups.google.com/d/msgid/elasticsearch/CALZAj3LGdMLYFPn8jX0CdhgRdP8suWTg0ekR8LcoPRc%3D1gasHg%40mail.gmail.com?utm_medium=email&utm_source=footer
.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAKdsXoGFACM%2BjFMKDQp_5RS%3D2vx4hOUjADiFag8DP0faoqoqYw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

I would agree that shard splitting is not the best approach. Much better to design for expansion by building in layers of indirection into your application through the techniques of over-sharding, index aliasing, and multiple indices.

The non-mutually exclusive techniques are as follows.

First, you can allocate more shards than you need when you create the index. If you need 5 shards today, but think you might need 10 shards in 6 months, then just create the index with 10 shards. We call this over-sharding. There really is no penalty to doing this within reason. Searching against 1 index with 50 shards is exactly the same as searching against 50 indices with one shard.

Second, as others have mentioned, use multiple indices and hide them away behind an alias. Your application code should always reference aliases rather than concrete indices. It’s an extra layer of indirection that can save you a lot of operational headaches. Whenever possible, try to design for a multiple index architecture. For some people this means creating new indices per day. Other times it might mean creating an index per user. Whatever the case, it’s almost always possible to find some attribute which allows a clean logical partitioning of your data.

0xA

On Dec 11, 2014, at 3:46 PM, joergprante@gmail.com wrote:

Shard splitting is an anti-pattern if done on server side. If you really need more shards, you have not made good planning and you can always add another index and use index aliasing to search on both. Plus, there are export/import tools if you want reindexing from client side.

Jörg

Am 11.12.2014 22:20 schrieb "Gaurav gupta" <gupta.gaurav0125@gmail.com mailto:gupta.gaurav0125@gmail.com>:
Per, http://thinkbiganalytics.com/solr-vs-elastic-search/ http://thinkbiganalytics.com/solr-vs-elastic-search/ , ElasticSearch does not suport shard spiltting which Solr supports. Is it generally an issue in production, if yes what alternate user has :-

Shard Splitting

Shards are the partitioning unit for the Lucene index, both Solr and ElasticSearch have them. You can distribute your index by placing shards on different machines in a cluster. Until April 2013, both Solr and ElasticSearch would not allow you to change the number of shards in your index. So if you decided you wanted to split your index into 10 shards on day one, and two years later you want to add another 5 shards, you were not able to do that without completely starting over (reindexing everything). As of April 2013 Solr supports shard splitting https://issues.apache.org/jira/browse/SOLR-3755, which allows you to create more shards by splitting existing shards. ElasticSearch still does not support this.

Thanks, Gaurav

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com mailto:elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CALZAj3LGdMLYFPn8jX0CdhgRdP8suWTg0ekR8LcoPRc%3D1gasHg%40mail.gmail.com https://groups.google.com/d/msgid/elasticsearch/CALZAj3LGdMLYFPn8jX0CdhgRdP8suWTg0ekR8LcoPRc%3D1gasHg%40mail.gmail.com?utm_medium=email&utm_source=footer.
For more options, visit https://groups.google.com/d/optout https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com mailto:elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAKdsXoGFACM%2BjFMKDQp_5RS%3D2vx4hOUjADiFag8DP0faoqoqYw%40mail.gmail.com https://groups.google.com/d/msgid/elasticsearch/CAKdsXoGFACM%2BjFMKDQp_5RS%3D2vx4hOUjADiFag8DP0faoqoqYw%40mail.gmail.com?utm_medium=email&utm_source=footer.
For more options, visit https://groups.google.com/d/optout https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/F2DEE364-92F7-45D6-B7A1-FFF946D82150%40elasticsearch.com.
For more options, visit https://groups.google.com/d/optout.

Adding another index is adding shards.. you're just going about it the
wrong way.

The point of shard splitting is that you always have the right number of
shards.

FULLY re-indexing all your data is silly. You have to re-read ALL of it
again, and re-emit it, and re-index it.

With shard splitting, if you're at capacity, you just add new hardware, the
largest shard splits in half, and then they move to another box.

I suspect anyone against this doesn't really have that much data.

On Thursday, December 11, 2014 3:46:21 PM UTC-8, Jörg Prante wrote:

Shard splitting is an anti-pattern if done on server side. If you really
need more shards, you have not made good planning and you can always add
another index and use index aliasing to search on both. Plus, there are
export/import tools if you want reindexing from client side.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/d5b1584f-e7e8-4378-8839-1c85e65093ca%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

It seems to me that most people arguing this have trivial scalability
requirements. Not trying to be rude by saying that btw. But shard
splitting is really the only way to scale from 250GB indexed to 500TB
indexed.

On Thursday, December 11, 2014 4:58:42 PM UTC-8, Andrew Selden wrote:

I would agree that shard splitting is not the best approach. Much better
to design for expansion by building in layers of indirection into your
application through the techniques of over-sharding, index aliasing, and
multiple indices.

Yes.. all those are lame attempts at shard splitting.

Over sharding is wasteful, it might not have a significant performance
impact in practice if you only have a few shards, but if you only add a few
you're not goign to be able to increase your capacity.

Using multiple indexes is just a way to cheat by adding more shards in a
round about fashion, your runtime query performance will suffer because of
this.

First, you can allocate more shards than you need when you create the
index. If you need 5 shards today, but think you might need 10 shards in 6
months, then just create the index with 10 shards. We call this
over-sharding. There really is no penalty to doing this within reason.

So you've only given yourself a 2x overhead in capacity. That's not very
elastic.

With shard splitting you can go from 2x to 10x to 100x without any wasted
IO in over-indexing.

Searching against 1 index with 50 shards is exactly the same as searching
against 50 indices with one shard.

No it's not.. if the shards are on the same box you're paying a performance
cost there.. If the indexes are small and fit in memory you won't feel it
that much.

Second, as others have mentioned, use multiple indices and hide them away
behind an alias.

If each index has say 20 shards, and you have 10 indexes, then you have 200
shards to run your query against. This means queries that use all these
indexes will get slower and slower.

The ideal situation is to shard split so that when you need more shards,
you just split.

If ES had this feature today, no one would be arguing against shard
splitting. It would just be common practice. The only issue is that ES
hasn't implemented it yet so it's not a viable solution.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/c35d0b14-46a0-4baf-b06e-b5bb3ff43e5f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

As I found out yesterday, the problem with shard splitting in ES is that
there algorithms that are used to round robin the data allocation during
indexing that are based on a pre-determined hash. So if you suddenly alter
the hash you may end up with shards that are overloaded compared to others.

Maybe a dev can confirm/clarify this, but that was the understanding I took
away for not doing shard splitting within ES.

On 12 December 2014 at 02:25, Kevin Burton burtonator@gmail.com wrote:

It seems to me that most people arguing this have trivial scalability
requirements. Not trying to be rude by saying that btw. But shard
splitting is really the only way to scale from 250GB indexed to 500TB
indexed.

On Thursday, December 11, 2014 4:58:42 PM UTC-8, Andrew Selden wrote:

I would agree that shard splitting is not the best approach. Much better
to design for expansion by building in layers of indirection into your
application through the techniques of over-sharding, index aliasing, and
multiple indices.

Yes.. all those are lame attempts at shard splitting.

Over sharding is wasteful, it might not have a significant performance
impact in practice if you only have a few shards, but if you only add a few
you're not goign to be able to increase your capacity.

Using multiple indexes is just a way to cheat by adding more shards in a
round about fashion, your runtime query performance will suffer because of
this.

First, you can allocate more shards than you need when you create the
index. If you need 5 shards today, but think you might need 10 shards in 6
months, then just create the index with 10 shards. We call this
over-sharding. There really is no penalty to doing this within reason.

So you've only given yourself a 2x overhead in capacity. That's not very
elastic.

With shard splitting you can go from 2x to 10x to 100x without any wasted
IO in over-indexing.

Searching against 1 index with 50 shards is exactly the same as searching
against 50 indices with one shard.

No it's not.. if the shards are on the same box you're paying a
performance cost there.. If the indexes are small and fit in memory you
won't feel it that much.

Second, as others have mentioned, use multiple indices and hide them away
behind an alias.

If each index has say 20 shards, and you have 10 indexes, then you have
200 shards to run your query against. This means queries that use all
these indexes will get slower and slower.

The ideal situation is to shard split so that when you need more shards,
you just split.

If ES had this feature today, no one would be arguing against shard
splitting. It would just be common practice. The only issue is that ES
hasn't implemented it yet so it's not a viable solution.

--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/c35d0b14-46a0-4baf-b06e-b5bb3ff43e5f%40googlegroups.com
https://groups.google.com/d/msgid/elasticsearch/c35d0b14-46a0-4baf-b06e-b5bb3ff43e5f%40googlegroups.com?utm_medium=email&utm_source=footer
.

For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAEYi1X8WBPxKW1GeJvat5%3D7AmcExDwk9SW8%3DXMqjiH-S2nvd8Q%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

The reason is the shard addressing. While it might be possible to imagine
that a distributed system can query all nodes and ask "please give me the
next free allocation for this doc" this is very slow. It would allow shard
splitting by simply adding shards, iterating over nodes, but science has
proven this is not scalable - the time of indexing and search grows linear
(if not quadratic) with the number of nodes and Elasticsearch scalability
would be severely broken.

When a new doc arrives, a hash is computed to address the shard where it
should reside in O(1) time. This is fast and scalable over the number of
nodes. The price to pay is to set a constant number of shards as the
divisor for the formula to distribute the doc by the doc ID.

If the hash computation for adding shards was changed during cluster life,
all things will be messed up, ES would not find docs any longer. To remedy
this, you have to reindex and this recomputes the hashes for a higher
number of shards.

There are systems who allow shard growth in a limited way. By using a hash
ring algorithm, you could address e.g. only each third shard, and let the
next two places in the hash ring free, maybe for adding shards later, see
e.g. Project Voldemort

But this would mean two things: there would be a limited number of shard
splits (only a fixed number of shards can be reserved), and it would be up
to the user to mess around with the very sensible condition when to
activate the reserved shards and instruct ES to use a more difficult
algorithm to lookup docs. I can say there is no general condition of shard
overflow. See the shard allocation deciders, there are many good criteria,
but in the end, the user will get the headache. And this is because it is
an anti-pattern: the promise is that shard split will solve a problem, but
you get a new, bigger one.

Elasticsearch philosophy is that it should scale and should be easy to use.
Having no headaches around the shard count, once it is set, is easy.

Jörg

On Fri, Dec 12, 2014 at 9:31 AM, Mark Walkom markwalkom@gmail.com wrote:

As I found out yesterday, the problem with shard splitting in ES is that
there algorithms that are used to round robin the data allocation during
indexing that are based on a pre-determined hash. So if you suddenly alter
the hash you may end up with shards that are overloaded compared to others.

Maybe a dev can confirm/clarify this, but that was the understanding I
took away for not doing shard splitting within ES.

On 12 December 2014 at 02:25, Kevin Burton burtonator@gmail.com wrote:

It seems to me that most people arguing this have trivial scalability
requirements. Not trying to be rude by saying that btw. But shard
splitting is really the only way to scale from 250GB indexed to 500TB
indexed.

On Thursday, December 11, 2014 4:58:42 PM UTC-8, Andrew Selden wrote:

I would agree that shard splitting is not the best approach. Much better
to design for expansion by building in layers of indirection into your
application through the techniques of over-sharding, index aliasing, and
multiple indices.

Yes.. all those are lame attempts at shard splitting.

Over sharding is wasteful, it might not have a significant performance
impact in practice if you only have a few shards, but if you only add a few
you're not goign to be able to increase your capacity.

Using multiple indexes is just a way to cheat by adding more shards in a
round about fashion, your runtime query performance will suffer because of
this.

First, you can allocate more shards than you need when you create the
index. If you need 5 shards today, but think you might need 10 shards in 6
months, then just create the index with 10 shards. We call this
over-sharding. There really is no penalty to doing this within reason.

So you've only given yourself a 2x overhead in capacity. That's not very
elastic.

With shard splitting you can go from 2x to 10x to 100x without any wasted
IO in over-indexing.

Searching against 1 index with 50 shards is exactly the same as
searching against 50 indices with one shard.

No it's not.. if the shards are on the same box you're paying a
performance cost there.. If the indexes are small and fit in memory you
won't feel it that much.

Second, as others have mentioned, use multiple indices and hide them
away behind an alias.

If each index has say 20 shards, and you have 10 indexes, then you have
200 shards to run your query against. This means queries that use all
these indexes will get slower and slower.

The ideal situation is to shard split so that when you need more shards,
you just split.

If ES had this feature today, no one would be arguing against shard
splitting. It would just be common practice. The only issue is that ES
hasn't implemented it yet so it's not a viable solution.

--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/c35d0b14-46a0-4baf-b06e-b5bb3ff43e5f%40googlegroups.com
https://groups.google.com/d/msgid/elasticsearch/c35d0b14-46a0-4baf-b06e-b5bb3ff43e5f%40googlegroups.com?utm_medium=email&utm_source=footer
.

For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/CAEYi1X8WBPxKW1GeJvat5%3D7AmcExDwk9SW8%3DXMqjiH-S2nvd8Q%40mail.gmail.com
https://groups.google.com/d/msgid/elasticsearch/CAEYi1X8WBPxKW1GeJvat5%3D7AmcExDwk9SW8%3DXMqjiH-S2nvd8Q%40mail.gmail.com?utm_medium=email&utm_source=footer
.

For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAKdsXoHkHHmLfYyKfumFarXWzj37makn7u6cu3MdcX3zCGGGnw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Thanks Jorg, you've put it a lot better than I did! :slight_smile:

On 12 December 2014 at 10:38, joergprante@gmail.com joergprante@gmail.com
wrote:

The reason is the shard addressing. While it might be possible to imagine
that a distributed system can query all nodes and ask "please give me the
next free allocation for this doc" this is very slow. It would allow shard
splitting by simply adding shards, iterating over nodes, but science has
proven this is not scalable - the time of indexing and search grows linear
(if not quadratic) with the number of nodes and Elasticsearch scalability
would be severely broken.

When a new doc arrives, a hash is computed to address the shard where it
should reside in O(1) time. This is fast and scalable over the number of
nodes. The price to pay is to set a constant number of shards as the
divisor for the formula to distribute the doc by the doc ID.

If the hash computation for adding shards was changed during cluster life,
all things will be messed up, ES would not find docs any longer. To remedy
this, you have to reindex and this recomputes the hashes for a higher
number of shards.

There are systems who allow shard growth in a limited way. By using a hash
ring algorithm, you could address e.g. only each third shard, and let the
next two places in the hash ring free, maybe for adding shards later, see
e.g. Project Voldemort
http://www.project-voldemort.com/voldemort/design.html

But this would mean two things: there would be a limited number of shard
splits (only a fixed number of shards can be reserved), and it would be up
to the user to mess around with the very sensible condition when to
activate the reserved shards and instruct ES to use a more difficult
algorithm to lookup docs. I can say there is no general condition of shard
overflow. See the shard allocation deciders, there are many good criteria,
but in the end, the user will get the headache. And this is because it is
an anti-pattern: the promise is that shard split will solve a problem, but
you get a new, bigger one.

Elasticsearch philosophy is that it should scale and should be easy to
use. Having no headaches around the shard count, once it is set, is easy.

Jörg

On Fri, Dec 12, 2014 at 9:31 AM, Mark Walkom markwalkom@gmail.com wrote:

As I found out yesterday, the problem with shard splitting in ES is that
there algorithms that are used to round robin the data allocation during
indexing that are based on a pre-determined hash. So if you suddenly alter
the hash you may end up with shards that are overloaded compared to others.

Maybe a dev can confirm/clarify this, but that was the understanding I
took away for not doing shard splitting within ES.

On 12 December 2014 at 02:25, Kevin Burton burtonator@gmail.com wrote:

It seems to me that most people arguing this have trivial scalability
requirements. Not trying to be rude by saying that btw. But shard
splitting is really the only way to scale from 250GB indexed to 500TB
indexed.

On Thursday, December 11, 2014 4:58:42 PM UTC-8, Andrew Selden wrote:

I would agree that shard splitting is not the best approach. Much
better to design for expansion by building in layers of indirection into
your application through the techniques of over-sharding, index aliasing,
and multiple indices.

Yes.. all those are lame attempts at shard splitting.

Over sharding is wasteful, it might not have a significant performance
impact in practice if you only have a few shards, but if you only add a few
you're not goign to be able to increase your capacity.

Using multiple indexes is just a way to cheat by adding more shards in a
round about fashion, your runtime query performance will suffer because of
this.

First, you can allocate more shards than you need when you create the
index. If you need 5 shards today, but think you might need 10 shards in 6
months, then just create the index with 10 shards. We call this
over-sharding. There really is no penalty to doing this within reason.

So you've only given yourself a 2x overhead in capacity. That's not
very elastic.

With shard splitting you can go from 2x to 10x to 100x without any
wasted IO in over-indexing.

Searching against 1 index with 50 shards is exactly the same as
searching against 50 indices with one shard.

No it's not.. if the shards are on the same box you're paying a
performance cost there.. If the indexes are small and fit in memory you
won't feel it that much.

Second, as others have mentioned, use multiple indices and hide them
away behind an alias.

If each index has say 20 shards, and you have 10 indexes, then you have
200 shards to run your query against. This means queries that use all
these indexes will get slower and slower.

The ideal situation is to shard split so that when you need more shards,
you just split.

If ES had this feature today, no one would be arguing against shard
splitting. It would just be common practice. The only issue is that ES
hasn't implemented it yet so it's not a viable solution.

--
You received this message because you are subscribed to the Google
Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/c35d0b14-46a0-4baf-b06e-b5bb3ff43e5f%40googlegroups.com
https://groups.google.com/d/msgid/elasticsearch/c35d0b14-46a0-4baf-b06e-b5bb3ff43e5f%40googlegroups.com?utm_medium=email&utm_source=footer
.

For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/CAEYi1X8WBPxKW1GeJvat5%3D7AmcExDwk9SW8%3DXMqjiH-S2nvd8Q%40mail.gmail.com
https://groups.google.com/d/msgid/elasticsearch/CAEYi1X8WBPxKW1GeJvat5%3D7AmcExDwk9SW8%3DXMqjiH-S2nvd8Q%40mail.gmail.com?utm_medium=email&utm_source=footer
.

For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/CAKdsXoHkHHmLfYyKfumFarXWzj37makn7u6cu3MdcX3zCGGGnw%40mail.gmail.com
https://groups.google.com/d/msgid/elasticsearch/CAKdsXoHkHHmLfYyKfumFarXWzj37makn7u6cu3MdcX3zCGGGnw%40mail.gmail.com?utm_medium=email&utm_source=footer
.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAEYi1X-K%3D17nGnzFHMEdd78Kvm3EZRab13wOHBYmWZ_Gwa%2BhoA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.