Multi-tenancy strategy: 1 index with 1 shard and 1 replica per client


(Drew) #1

Hey Guys,

I'm working on an analytics dashboard project where we collect events into Elasticsearch for clients. Each client could have millions of events per month. We are thinking of using one index with one shard and one replica per client. Looking at Logstash, it seems like Logstash creates 1 index, with 1 shard and 0 replicas per day, so that's where we got the inspiration. We don't anticipate having more than 1000 "clients". Are there any issues with this design pattern?
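For concreteness, the layout described above (one index per client, one primary shard, one replica) corresponds to index settings along these lines. This is only a sketch that builds the request body without contacting a cluster; the `events-<client>` naming scheme and the helper name are assumptions made for illustration, not something stated in this thread.

```python
import json


def client_index_request(client_id, shards=1, replicas=1):
    """Build the index name and creation body for one client's index.

    The "events-<client_id>" naming scheme is hypothetical.
    """
    index_name = "events-{}".format(client_id)
    body = {
        "settings": {
            "number_of_shards": shards,      # primary shards, set at index creation
            "number_of_replicas": replicas,  # copies kept of each primary
        }
    }
    return index_name, body


name, body = client_index_request("acme")
# This body would be sent as: PUT /<name>
print(name)
print(json.dumps(body, indent=2))
```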

Thanks,

Drew

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/9DC88022-E37D-4C55-81E6-71A52EC5B466%40venarc.com.
For more options, visit https://groups.google.com/d/optout.


#2

Drew,

The Elasticsearch default is to create 5 shards for each index. I would start with this. Typically it is best to actually over-shard, which is to say have more than 1 shard per node per index. There is not really any measurable cost to this and it gives you flexibility in your design as you scale out.

For example, if you start with 5 shards on a single server and then later decide you want to add another machine, Elasticsearch will automatically transfer some of those shards over to the new server, giving you better scalability. If you start with only 1 shard you will not get this benefit.
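The rebalancing argument above can be illustrated with a toy model. The round-robin placement below is a deliberately simplified stand-in, not Elasticsearch's actual allocator, and it ignores replicas; the node names are made up.

```python
def allocate(num_shards, nodes):
    """Spread shard numbers over nodes round-robin.

    A simplified model for illustration, not the real allocation logic.
    """
    layout = {node: [] for node in nodes}
    for shard in range(num_shards):
        layout[nodes[shard % len(nodes)]].append(shard)
    return layout


# Five primaries: adding a second node lets some shards move over.
print(allocate(5, ["node-1"]))
print(allocate(5, ["node-1", "node-2"]))

# One primary: the second node has nothing to take from this index.
print(allocate(1, ["node-1", "node-2"]))
```

Note that this only models a single index. With many single-shard indexes, whole indexes can still be moved between nodes.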

Andrew



(Drew) #3

Hi Andrew,

Not sure if you read my original question. The question is about having a separate index per customer, since we are going to have < 1000 customers but each would have a lot of data. Each shard comes with its own overhead since it's an instance of Lucene. I was going with the 1 shard with 1 replica route because initially we can put 100 of these customers on the same machine, and as they grow larger we can allocate more machines and move the indexes around. With this approach, our capacity for a single customer would be the max a single machine can handle, which I think should be enough given our requirements. If a customer is really pushing a single machine to its max, then we can move them to their own Elasticsearch cluster.
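As a rough back-of-the-envelope check of the overhead argument, the shard counts for the two layouts under discussion can be compared. The figures for clients and packing density come from this thread; the machine count is a crude estimate that ignores where replicas are placed.

```python
import math

CLIENTS = 1000             # upper bound mentioned in the thread
CLIENTS_PER_MACHINE = 100  # initial packing density from the post above


def total_shards(indexes, primaries, replicas):
    """Each primary has `replicas` copies, and every copy is a Lucene instance."""
    return indexes * primaries * (1 + replicas)


machines = math.ceil(CLIENTS / CLIENTS_PER_MACHINE)
one_shard = total_shards(CLIENTS, primaries=1, replicas=1)
five_shards = total_shards(CLIENTS, primaries=5, replicas=1)

print("machines at initial density:", machines)
print("shards with 1 primary per index:", one_shard)
print("shards with 5 primaries per index:", five_shards)
```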

  • Drew



(Mark Walkom) #4

Pretty sure he read it, as I'd have offered the same advice :)
You cannot change the sharding of an index after creation; you need to
completely reindex the data to do so. This may not be a major issue for you,
but it's something to take into account when you have hundreds or thousands
of customers, and hence indexes.

You could also look at having a few indexes and use aliases and routing as
this would be a much more efficient way of doing things.
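A sketch of what the alias-plus-routing approach could look like, assuming a shared index named `events` and a `client_id` field on every document (both hypothetical). It only builds the body for the `_aliases` endpoint and does not contact a cluster.

```python
import json


def client_alias_action(shared_index, client_id):
    """Build one "add" action for the _aliases endpoint.

    The alias name pattern and the `client_id` field are hypothetical.
    """
    return {
        "add": {
            "index": shared_index,
            "alias": "client-{}".format(client_id),
            # Searches through the alias only see this client's documents.
            "filter": {"term": {"client_id": client_id}},
            # Requests through the alias use this routing value, so one
            # client's documents end up on a single shard of the shared index.
            "routing": client_id,
        }
    }


body = {"actions": [client_alias_action("events", c) for c in ("acme", "globex")]}
# This body would be sent as: POST /_aliases
print(json.dumps(body, indent=2))
```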

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com



(Drew) #5

Hi Mark,

The problem we have is that each "customer" could generate 60-80 million docs/month on average. In addition, when a customer leaves, we need to delete all their data. Hence it makes sense to have an index per customer (or even multiple indexes per customer). Another issue is that we are going to need to run a lot of "has_child" type queries, and ES, as it currently stands, loads the IDs of all the parent docs in the index before running the query. So if we keep each customer on their own index, those has_child queries would only need to load the IDs for that specific client. In addition, one index with one shard per day is how Logstash works, and it is designed for ingesting a lot of data.
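To make the two operations above concrete, here is a sketch of a has_child search scoped to a single client's index, plus the request that drops a departing client's data in one step. The index name, the child type `event`, and the `action` field are all hypothetical; nothing is sent to a cluster.

```python
import json


def has_child_search(client_index, child_type, child_query):
    """Return (path, body) for a has_child search scoped to one client's index."""
    body = {
        "query": {
            "has_child": {
                "type": child_type,    # name of the child document type
                "query": child_query,  # a parent matches if any child matches
            }
        }
    }
    return "/{}/_search".format(client_index), body


def drop_client(client_index):
    """Return the (method, path) that removes a departing client's whole index."""
    return "DELETE", "/{}".format(client_index)


path, body = has_child_search("events-acme", "event", {"term": {"action": "click"}})
print("POST", path)
print(json.dumps(body, indent=2))
print(*drop_client("events-acme"))
```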

  • Drew



(Mark Walkom) #6

Ahh ok, knowing this extra info is good as it helps us help you :)

Logstash doesn't define how many shards to use, at least not that I can see
in its template
(https://github.com/elasticsearch/logstash/blob/master/lib/logstash/outputs/elasticsearch/elasticsearch-template.json)
or through some quick tests. This means that any values it takes for shard
count will come from the ES config, which, as was mentioned earlier, has a
default of 5 shards per index (plus one replica).
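If one shard and no replicas per daily Logstash index is the goal, that would have to be set explicitly rather than inherited. One way is an index template; the sketch below uses the legacy template format of that era, with a made-up template name and illustrative values. Later Elasticsearch versions use a different key (`index_patterns`) for the pattern, so treat the exact shape as an assumption tied to the version in use.

```python
import json

# Legacy-style index template; the name "logstash_shards" and the chosen
# values are illustrative assumptions, not taken from this thread.
template_name = "logstash_shards"
template_body = {
    "template": "logstash-*",  # index name pattern the template applies to
    "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 0,
    },
}

# This body would be sent as: PUT /_template/<template_name>
print("PUT /_template/{}".format(template_name))
print(json.dumps(template_body, indent=2))
```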

Keep in mind that with only one shard, each search runs on a single
thread, so if you have 80 million records with parent+child
relationships, chances are it will take a fair while to get a response to
any query.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com



(system) #7