Looking for a suggestion to better organize our indices for performance

Ron_Sher · December 9, 2014, 1:36pm

Hi,

We have a multi-tenant SaaS application in which we keep data for all
accounts of our clients (300 of them, which we call services).
We keep the data in monthly indices, each of which has grown to about 700GB
and 4.6 billion documents.
Each day we index a new account for each service.

Each index is built from 6 shards and we use 1 replica.

We're starting to have second thoughts about the structure of our indices and
about proper use of routing.
A few things we are considering:

Use routing by service, so that we will probably benefit more from
caching.

Change the indices to service + month, so that we query much less data,
at the cost of many more indices (instead of 12 indices a year we would
have 300x12, and more as the number of clients grows).
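
To put rough numbers on the second option, here is a small sketch of the index and shard counts per year. It assumes, purely for illustration, that per-service indices would keep the current 6 shards and 1 replica; in practice much smaller indices would probably be given fewer shards.

```python
# Figures from the post: 300 services, monthly indices, 6 shards, 1 replica.
SERVICES = 300
MONTHS_PER_YEAR = 12
PRIMARY_SHARDS = 6
REPLICAS = 1


def shards_per_year(indices_per_year: int, primaries: int, replicas: int) -> int:
    """Total shard copies (primaries plus replicas) created in one year."""
    return indices_per_year * primaries * (1 + replicas)


monthly_indices = MONTHS_PER_YEAR                 # one index per month
per_service_indices = SERVICES * MONTHS_PER_YEAR  # one index per service per month

for label, count in (("monthly", monthly_indices),
                     ("service + month", per_service_indices)):
    total = shards_per_year(count, PRIMARY_SHARDS, REPLICAS)
    print(f"{label}: {count} indices, {total} shard copies per year")
```

Under that assumption the layout goes from 12 indices and 144 shard copies a year to 3600 indices and 43200 shard copies a year.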

Any thoughts/suggestions?

Thanks,
Ron

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/096f4d7b-2702-46d7-96d6-f746f2389623%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

warkolm · December 9, 2014, 2:37pm

How many servers are in this cluster?

Ron_Sher · December 9, 2014, 2:50pm

We have 24 data nodes, 3 master nodes and 3 client nodes.
We use m3.4xlarge for the data nodes.

warkolm · December 9, 2014, 2:55pm

Currently you have shards of over 100GB each (700GB across 6 shards), which is
massive and is probably causing you some issues. Ideally you should aim for a
maximum shard size of 40-50GB, so increasing your shard count to 24 brings you
under that level and also gives each index room to grow.

A higher shard count also spreads the query load and reduces the
amount of thrashing (i.e. data transfer) if/when a node goes down.
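
The arithmetic behind these figures can be sketched as follows. It uses the 700GB monthly index size from the original post and assumes data is spread evenly across shards, which real indices only approximate; the function names are made up for this example.

```python
import math

INDEX_SIZE_GB = 700  # monthly index size from the original post


def shard_size_gb(index_size_gb: float, primary_shards: int) -> float:
    """Average primary shard size, assuming an even spread of data."""
    return index_size_gb / primary_shards


def min_primary_shards(index_size_gb: float, max_shard_gb: float) -> int:
    """Smallest shard count that keeps the average shard at or below the target."""
    return math.ceil(index_size_gb / max_shard_gb)


for shards in (6, 24):
    size = shard_size_gb(INDEX_SIZE_GB, shards)
    print(f"{shards} shards -> about {size:.1f} GB per shard")

for target in (40, 50):
    needed = min_primary_shards(INDEX_SIZE_GB, target)
    print(f"max {target} GB per shard -> at least {needed} shards")
```

With 6 shards the average is roughly 117GB per shard; with 24 it drops to roughly 29GB, which leaves headroom below the 40-50GB target as the index grows.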

Jilles_van_Gurp · December 9, 2014, 3:52pm

Indeed, increase your shard count. Also, you may want to consider using a
routing parameter based on e.g. a tenant_id to ensure that all queries related
to a tenant only hit shards that actually have data for that tenant. Those
two measures would reduce the size of each shard and the number of shards
involved for each tenant. To increase query capacity, you could consider
increasing the number of replicas as well; this way, you have more nodes
that can handle query traffic for the same data.
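
As a rough illustration of why routing narrows a query to one shard: Elasticsearch picks the shard for a document as a hash of the routing value modulo the number of primary shards. The sketch below mimics that with `zlib.crc32` as a stand-in hash, which is an assumption for illustration only and not the hash Elasticsearch uses, so real shard numbers will differ. The tenant ids and the shard count of 24 are hypothetical.

```python
import zlib

NUM_PRIMARY_SHARDS = 24  # shard count suggested earlier in the thread


def shard_for(routing: str, num_shards: int = NUM_PRIMARY_SHARDS) -> int:
    """Pick a shard as hash(routing) % num_shards.

    zlib.crc32 is a stand-in hash for illustration; it is not the hash
    function Elasticsearch uses.
    """
    return zlib.crc32(routing.encode("utf-8")) % num_shards


# Hypothetical tenant ids used as routing values.
tenants = ["tenant-1", "tenant-2", "tenant-3"]

for tenant_id in tenants:
    # Every document indexed with this routing value lands on the same shard,
    # so a search sent with the same routing value only needs that shard.
    print(f"{tenant_id} -> shard {shard_for(tenant_id)} of {NUM_PRIMARY_SHARDS}")
```

The same routing value has to be supplied both when indexing and when searching. One trade-off to keep in mind is that all of a tenant's documents end up on a single shard, so a very large tenant can make shard sizes uneven.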

Jilles

Ron_Sher · December 9, 2014, 5:18pm

BTW, we use c3.4xlarge, not m3.4xlarge as I said before.

