Alternatives to oversharding to handle index / cluster growth?

Hello,

Are there any known and good alternatives to handling index (and cluster)
growth other than "Oversharding"?

I see a few problems with oversharding the index:

  1. You can't really guess well how your index/cluster will grow, so you'll
    always be somewhat wrong even if your cluster doesn't really grow much or
    at all.

  2. If your cluster keeps growing, then you'll have the ideal number of
    shards only at one point in time when the cluster is of just the right
    size/fit for the number of shards and you'll have the "wrong" number of
    shards both before and after that point.

  3. While your cluster is small, an oversharded index means each node will
    hold a possibly high number of shards. If a query hits all shards in
    parallel and a node holds more shards than it has CPU cores, there'll be
    some CPU wait time involved.

I assume the ultimate solution would involve resharding the index while
adding more nodes to the cluster. Is this correct?

As far as I know, there are no plans to implement this any time soon. Is
this correct? I couldn't find any issues...

Finally, are there any viable alternatives to oversharding today?

Thanks,
Otis

Search Analytics - http://sematext.com/search-analytics/index.html
ELASTICSEARCH Performance Monitoring - http://sematext.com/spm/index.html

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Use an alias pointing to a single index with X shards. As you outgrow X
shards, create a new index with X more shards and add it to the alias. All
new docs go into the new index only; you still search across both indices
because of the alias. As you know, everything is Lucene indices under the
covers, so it doesn't matter whether you have 1 index with 10 shards or 2
indices with 5 shards.
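
A minimal sketch of the alias step described above, assuming the
`_aliases` endpoint with an `add` action. The alias name `docs`, the index
name `docs-2`, and the helper name are hypothetical. The code only builds
the request body; sending it to the cluster is left out.

```python
import json


def add_to_alias_body(alias, index):
    """Build the body for POST /_aliases that adds `index` to `alias`."""
    return {"actions": [{"add": {"index": index, "alias": alias}}]}


# Hypothetical names: searches keep hitting the alias "docs",
# while new documents are written to the newest index directly.
body = add_to_alias_body("docs", "docs-2")
print(json.dumps(body))
```

Searches against the alias then fan out to every index behind it, while the
application decides which concrete index receives writes.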

--
Thanks,
Matt Weber

Hi,

But assume the cluster is already maxed out. For example, imagine I have
1-core servers and planned on growing the cluster to 100 shards, so I
oversharded the index to 100 or even 200 shards. At this point, assuming a
decent concurrent query rate, the cluster is likely maxed out, but my index
continues to grow. Adding a new index to the same cluster wouldn't help
then. What one would need to do is create a second cluster and search
across both of them. Is that doable with ES
clients/TransportClient/aliases?

Thanks,
Otis

You won't need a 2nd cluster, but you will definitely need more machines
with replicas to help with the query load. The trick is noticing that you
are getting close to maxing out a box before it happens, so you don't blow
it up while shuffling shards after adding new nodes.

If you needed to, you could always combine the alias approach with the
shard allocation settings to make sure those 100 shards stay on the
original node and the new index's shards go on the new node only. This way
you don't push the original node over its limit and can continue to
index...
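
A rough sketch of the allocation idea above, assuming index-level
allocation filtering through settings of the form
`index.routing.allocation.include.<attribute>`. The node attribute
`box_group` and its values are hypothetical; nodes would have to be started
with that attribute set. Only the settings bodies are built here.

```python
def pin_index_body(attribute, value):
    """Body for PUT /<index>/_settings that limits where an index's
    shards may be allocated to nodes whose `attribute` matches `value`."""
    return {f"index.routing.allocation.include.{attribute}": value}


# Hypothetical attribute values: the old index stays on the original
# node(s), the new index is only allowed on the newly added node(s).
old_index_settings = pin_index_body("box_group", "original")
new_index_settings = pin_index_body("box_group", "added")
print(old_index_settings)
print(new_index_settings)
```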

--
Thanks,
Matt Weber

Hi Otis,

here is my cluster/index growth strategy. I also do not like
oversharding. From my understanding, "oversharding" is when an index has
many more shards than the total number of CPU cores can handle. Assuming
the final size of an index or the final number of nodes is known, the
maximum number of shards can be set at index creation, from the very
beginning, which easily leads to oversharding while the number of nodes
is not yet sufficient. The advantage is that growing the cluster is easy:
it comes down to just adding a node. The disadvantage is that the cluster
reaches its full power only after the last node has been added.

In contrast, I try to find a reasonable number of shards for production
start, together with a reasonable number of future migrations. When I
grow the cluster, I try to double the capacity of the whole cluster with
each migration step, so future migrations become less frequent
(assuming linear growth of data). The advantage is that the cluster
almost always runs at full efficiency. The disadvantage is that
migrations must include index copies or re-indexing. (It reminds me of
resizing a hash table by copying all the entries.)
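
To illustrate why doubling makes migrations less frequent under linear
growth, here is a small sketch. The capacity and growth numbers are made up
for the example, not taken from a real cluster.

```python
def migration_times(initial_capacity, growth_per_month, horizon_months):
    """Months at which the data reaches the current capacity, when each
    migration doubles capacity and data grows linearly."""
    times = []
    capacity = initial_capacity
    while capacity / growth_per_month <= horizon_months:
        times.append(capacity / growth_per_month)
        capacity *= 2
    return times


# Made-up numbers: 1.0 TB of capacity at start, 0.5 TB of new data per month.
print(migration_times(initial_capacity=1.0, growth_per_month=0.5, horizon_months=24))
# prints [2.0, 4.0, 8.0, 16.0]
```

The gap between migrations doubles each time, which is the same
amortization argument as for a growing hash table.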

Development phase: single server (node), 1 to 5 shards/index for making
rough sizing/workload decisions

Diskspace: around 1 TB per server, neglected here

Decision: 3 nodes (24 CPU cores per server), 1 index, 12 shards, 1 replica
Workload balance formula for production start "more cores than shards"
(my rule of thumb): total of 72 CPU cores > total of 2*12=24 shards

Growth by factor x: 3*x nodes, (2*12)*x shards

Workload balance formula for factor 2: 6 nodes, total of 6*24=144 CPU
cores > total of 2*24=48 shards
Based on performance metrics, it can be viable to assign more or fewer
shards per CPU core.
Due to higher query load, a higher replica level can make sense.
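
The "more cores than shards" rule of thumb can be checked with a few lines.
This is only a sketch of the arithmetic in this message; the function names
are hypothetical.

```python
def shard_copies(primaries, replicas):
    """Total shard copies in the cluster: primaries plus their replicas."""
    return primaries * (1 + replicas)


def more_cores_than_shards(nodes, cores_per_node, primaries, replicas):
    """True when the total core count exceeds the total shard copies."""
    return nodes * cores_per_node > shard_copies(primaries, replicas)


# Production start: 3 * 24 = 72 cores vs. 2 * 12 = 24 shard copies.
print(more_cores_than_shards(3, 24, 12, 1))  # prints True
# Growth factor 2: 6 * 24 = 144 cores vs. 2 * 24 = 48 shard copies.
print(more_cores_than_shards(6, 24, 24, 1))  # prints True
```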

Production start: 3 nodes, 1 index, 12 shards, 1 replica

Migration step for cluster growth:
- add new nodes to cluster, shards will relocate
- create new index with n shards (e.g. 2*24=48 or 64 or 72 shards,
depending on measured workload, but less than CPU cores)
- re-index (or copy old index _source over to new index)
- reset index alias
- optional step: detaching old/obsolete nodes with
cluster.routing.allocation.exclude._ip
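
Two of the migration steps above can be sketched as request bodies. This
assumes the `_aliases` endpoint accepts `remove` and `add` actions in one
call, and that the exclusion setting is spelled
`cluster.routing.allocation.exclude._ip`. Index names, the alias name, and
the IP addresses are placeholders.

```python
def swap_alias_body(alias, old_index, new_index):
    """Body for POST /_aliases: remove and add in a single request so
    the alias switches from the old index to the new one in one step."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


def exclude_nodes_body(ips):
    """Body for PUT /_cluster/settings that moves shards away from the
    nodes with the given IP addresses."""
    return {"transient": {"cluster.routing.allocation.exclude._ip": ",".join(ips)}}


# Placeholder names and addresses.
print(swap_alias_body("docs", "docs-v1", "docs-v2"))
print(exclude_nodes_body(["10.0.0.1", "10.0.0.2"]))
```

The re-index step itself is not shown; it depends on how the old documents
are read back and written into the new index.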

If fast index recovery is critical, there may be additional constraints.
For example, Lucene index size on disk per shard should not exceed x GB.

Best regards,

Jörg

We also use this approach, with new indices for writing and an alias for
searching. We named it "timebased sizeboxing" ;)
It has worked well for us so far, better than other strategies we tried
before. Nevertheless, it brings some problems. First, having several
indices aggregated by an alias means you can have several documents with
the same id; you have to be aware of that when reading, or be capable of
dealing with it. Second, as updating by overwriting doesn't work anymore,
you have to find a way to recognize when documents in an index are outdated
(because you already have newer versions in other indices), so that you can
actually get rid of some indices or large parts of them. This "cleaning up"
is something we are still working on. Any smart ideas about that?

Thanks!
Andrej

In a search or get response, you should also evaluate the _index field;
it contains the un-aliased index name.

Cross-index updates are more convenient on time-based series when the
timestamp is part of the index name, e.g. "test20130213".
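
As a sketch of how the _index field can be used on the client side: the
code below keeps only the newest copy of each document id, assuming index
names end in a fixed-width date such as "test20130213", so that comparing
names as strings matches chronological order. The sample hits are made up.

```python
def newest_per_id(hits):
    """Keep one hit per _id: the one whose _index name sorts last."""
    newest = {}
    for hit in hits:
        current = newest.get(hit["_id"])
        if current is None or hit["_index"] > current["_index"]:
            newest[hit["_id"]] = hit
    return list(newest.values())


# Made-up hits as they might come back from a search against an alias.
hits = [
    {"_index": "test20130212", "_id": "1", "_source": {"v": "old"}},
    {"_index": "test20130213", "_id": "1", "_source": {"v": "new"}},
    {"_index": "test20130212", "_id": "2", "_source": {"v": "only"}},
]
for hit in newest_per_id(hits):
    print(hit["_id"], hit["_index"])
```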

A cleaner thread could use boosting on timestamp buckets to ensure the
most recent doc is always top ranked.

For updates within an index, I recommend using the _uid field
http://www.elasticsearch.org/guide/reference/mapping/uid-field.html

Best regards,

Jörg

Otis Gospodnetic wrote:

Are there any known and good alternatives to handling index (and
cluster) growth other than "Oversharding"?

Here is a scheme I've seen work really well in very large clusters
with single-tenant indices.

Create index foo-0 with a single shard and an alias foo that points
to it. Searches go to /foo. Once foo-0 has 100-200GiB, index into
foo-1 (with a single shard) and add it to the alias. Repeat, rolling
over into a new index every few hundred GiB. However, you don't
actually want to index into the last index. You want to hash into
(or randomly choose from) the pool of ones that have room to spare.
In the simple case that happens to be the newest index.
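
A small sketch of the "pool of indices with room to spare" idea, assuming
the application already knows each index's current size (for example from
index stats). The sizes, the 160 GiB limit, and the function names are
hypothetical.

```python
import random


def writable_indices(sizes_gib, limit_gib):
    """Names of the indices that are still below the size limit."""
    return sorted(name for name, size in sizes_gib.items() if size < limit_gib)


def choose_write_index(sizes_gib, limit_gib, rng=random):
    """Pick a random index from the pool that still has room."""
    candidates = writable_indices(sizes_gib, limit_gib)
    if not candidates:
        raise RuntimeError("no index has room; roll over to a new one first")
    return rng.choice(candidates)


# Made-up sizes in GiB for the indices behind the alias "foo".
sizes = {"foo-0": 180.0, "foo-1": 150.0, "foo-2": 20.0}
print(writable_indices(sizes, limit_gib=160))  # prints ['foo-1', 'foo-2']
print(choose_write_index(sizes, limit_gib=160))
```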

However, if your index requests don't naturally spread out over the
tenants of the cluster, you may get hot indices/shards when only
rolling over. If you know this will be the case, say, you're
ingesting 1TiB behind foo, you can preallocate foo-[1..5] so you're
indexing into more than a single shard. If those don't distribute
favorably over the cluster, cluster-reroute them around a bit.

It sounds like a lot of manual work that ES would otherwise do for you,
and in some sense it is, but it makes it possible to scale to hundreds of
TiB and thousands of shards while retaining multitenancy.

-Drew

Many use cases for ES are 'add only', which makes this problem a little
easier via aliases and adding indices over time. One only ever adds to
the 'current' index, and can ignore the other indices from an update point
of view.

For update-heavy use cases, it's a harder story. I understand updates
can't be done against an alias (which backing index gets it?), but I wish
that deletes were allowed. This would allow one to have an 'update' or
'work' index, leaving the others as 'archive'. Any update to a record could
then be a DELETE to the alias that excludes the 'work' index (because I
think that would screw with the tombstone), and the code would then send
the update only to the 'work' index. Yes, I guess the APIs allow us to see
what indices are behind an alias, and one could enumerate them and issue a
series of deletes to the individual indices, but that's not as
clean as I would like.
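
The enumeration workaround mentioned above could look roughly like this. It
is a sketch only: the list of backing indices is assumed to have been
fetched already, the index names are made up, and the path shape with
`_doc` assumes a recent Elasticsearch (older versions put the mapping type
in that position).

```python
def delete_requests(backing_indices, work_index, doc_id):
    """One (method, path) pair per archive index; the work index is
    skipped because it will receive the updated document instead."""
    return [
        ("DELETE", f"/{index}/_doc/{doc_id}")
        for index in backing_indices
        if index != work_index
    ]


# Made-up index names behind one alias.
for method, path in delete_requests(["archive-1", "archive-2", "work"], "work", "42"):
    print(method, path)
```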

Paul

Paul Smith wrote:

For update heavy data use cases, it's a harder story. I understand
updates can't be done against an alias (which backing index gets
it?) but I wish that Deletes were allowed.

This is easily overcome with hashing the doc ID, similar to how ES
routes documents to shards behind an index. If you don't know the
doc, then you run the query on each index.
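
A minimal sketch of hashing a doc ID to pick the index, assuming a fixed,
ordered pool of indices. CRC32 is used here only as a convenient stable
hash; it is not the function ES uses internally for shard routing. The
index names are placeholders.

```python
import zlib


def index_for(doc_id, pool):
    """Map a doc ID to one index in a fixed, ordered pool via a stable hash."""
    return pool[zlib.crc32(doc_id.encode("utf-8")) % len(pool)]


# Placeholder pool; the same ID always lands on the same index
# as long as the pool itself does not change.
pool = ["foo-1", "foo-2", "foo-3", "foo-4", "foo-5"]
print(index_for("user-1234", pool))
```

Because the mapping depends on the pool size, adding an index to the pool
changes where existing IDs hash to, so this fits the preallocated case
better than the pure rollover case.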

Again, it's sort of reimplementing some low-level ES features, but it
doesn't have to handle any of the tough details, and ES still makes
it perform nicely.

-Drew
