Many indexes problem

Hi all...

We are currently evaluating Elasticsearch for our product's needs and have
run into some difficulty figuring out how to deploy and scale it within
our product. We currently have around 30,000 customers (companies)... some
of which are small (5,000 documents) and some of which are large (2,000,000+
documents). As customers grow over time, they may move from being a small or
medium-sized customer to a large one.

We would like to index all of our customers' documents in Elasticsearch. But
we have had problems with each scenario we have considered. Here are the
proposals and the problems we have faced... any advice is appreciated.


First Proposal: One index for each customer.
Problems:

  1. When we tested with 500 small indexes (each index has the default 5
    shards) on one server (-Xms4g, -Xmx6g), the server started extremely
    slowly. It took 30 minutes for the server to go from Red to Yellow status,
    and when we tested with 1000 indexes it took 60 minutes.
  2. The other problem with this setup is RAM usage. It grabbed around 1.6 GB
    of Java heap space at start-up even when there was no load. When we put
    search load on the indexes, the heap grew to 5.1 GB (and GC didn't release
    the memory after we stopped the load).

With this setup we would be able to manage and remove customers very easily,
and we would prefer to set up our cluster with this model if we can find a
solution to these problems, but our initial tests were really disappointing.


Second Proposal: One index per large customer, and a few large shared
indexes for all the smaller customers.

Problems:

  1. Migration from a shared index into a dedicated index would be
    difficult (we would likely need to do this if a customer got too big, in
    order to improve response times), and re-indexing documents would be
    quite slow for large data sets.
  2. Deleting customers would be more difficult.

With the first solution we could simply delete a customer's index when the
customer is removed, but with a multi-tenant solution we would need to
delete their documents from a shared index (we have no idea how heavy that
delete operation would be or how it would affect the optimization process).

Any advice you can give to help us find a practical solution is greatly
appreciated.

Reza


On 21 May 2013 12:01, Reza raliakbari@gmail.com wrote:

Second Proposal: One index per large customer, and a few large shared
indexes for all the smaller customers.

Problems:

  1. Migration from a shared index into a dedicated index would be
    difficult (we would likely need to do this if a customer got too big, in
    order to improve response times), and re-indexing documents would be
    quite slow for large data sets.

First, it wouldn't be that slow - depends on how much data, what hardware
you have etc. Second, you can do it in the background, then switch the
customer alias from the shared index to the dedicated index in one atomic
step.
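
For example, something like this (just a sketch - the index and alias names
here are made up, and it assumes each customer already searches through an
alias of their own):

    import requests

    ES = "http://localhost:9200"

    # Once the customer's docs have been copied into the dedicated index in
    # the background, repoint their alias in a single atomic _aliases call.
    # "shared_small_customers", "customer_123_dedicated" and "customer_123"
    # are illustrative names only.
    actions = {
        "actions": [
            {"remove": {"index": "shared_small_customers", "alias": "customer_123"}},
            {"add": {"index": "customer_123_dedicated", "alias": "customer_123"}},
        ]
    }
    requests.post(ES + "/_aliases", json=actions).raise_for_status()

Searches keep hitting the alias the whole time, so the switch is invisible
to the customer.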

  1. Deleting customers would be more difficult.

Just use a delete-by-query. It's not as efficient as dropping an index,
but it will work fine.
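
As a sketch (assuming each doc carries something like a customer_id field -
adjust for whatever you actually use - and note the endpoint has moved
around between versions: DELETE <index>/_query on the 0.90 line, POST
<index>/_delete_by_query on recent releases):

    import requests

    ES = "http://localhost:9200"

    # Remove every document belonging to one customer from the shared index.
    # The field name and index name are illustrative only.
    body = {"query": {"term": {"customer_id": "123"}}}
    requests.post(ES + "/shared_small_customers/_delete_by_query",
                  json=body).raise_for_status()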

clint


Hi Clint... so are you saying that the second proposal is the better
solution and we should not really consider the first? I am a bit concerned
about re-indexing 1,000,000+ documents during a "move", especially given
that there would be increased load on our other resources while the data is
read and extracted in order to re-send it to ES for indexing. I am pretty
sure we want to avoid re-indexing if possible.

Do you think the 1 index per company is a bad idea given what you know about
ES?

Drew


Drew Morris wrote:

Do you think the 1 index per company is a bad idea given what you
know about ES?

Neither of Reza's approaches is wrong. Each has trade-offs.
Single-tenant indices have a lot of advantages; you just have to
do a little more work client-side to make them scale well. Here's
one way to do it:

https://groups.google.com/d/msg/elasticsearch/9L5cWIAib94/K7zdHEW-4P0J
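
(Just to illustrate the kind of client-side work I mean - this is my own
example, not necessarily what that post describes - one common adjustment is
to create each small customer index with a single shard instead of the
default five, so a few hundred customers don't turn into thousands of
shards:)

    import requests

    ES = "http://localhost:9200"

    # Create a per-customer index with one shard instead of the default five.
    # The index name and settings are illustrative; tune replicas to your cluster.
    settings = {"settings": {"number_of_shards": 1, "number_of_replicas": 1}}
    requests.put(ES + "/customer_123", json=settings).raise_for_status()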

Drew


Hi Drew (Morris)

Personally, I'd go for the second option. Reindexing a million docs should
take way less than an hour (depending of course on the docs, hardware,
etc.), and you can control your indexing speed so that you don't overwhelm
your resources.
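
For instance, with the Python client it could look roughly like this (a
sketch only - customer_id, the index names, and the batch size / pause are
all placeholders you'd tune for your data and hardware):

    import time
    from elasticsearch import Elasticsearch, helpers

    es = Elasticsearch("http://localhost:9200")

    def reindex_customer(customer_id, source_index, dest_index,
                         batch_size=500, pause=0.5):
        # Pull the customer's docs out with a scroll and push them back in
        # small bulk batches, sleeping between batches to throttle the load.
        batch = []
        query = {"query": {"term": {"customer_id": customer_id}}}
        for hit in helpers.scan(es, index=source_index, query=query):
            batch.append({"_index": dest_index, "_id": hit["_id"],
                          "_source": hit["_source"]})
            if len(batch) >= batch_size:
                helpers.bulk(es, batch)
                batch = []
                time.sleep(pause)
        if batch:
            helpers.bulk(es, batch)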

clint



I was wondering whether it would be reasonable for Elasticsearch to support
some kind of algorithm to control the number of open indexes. For example,
an LRU model could close the least recently used index when the total number
of open indexes exceeds a configured limit.
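
(Roughly the behaviour I have in mind, sketched client-side on top of the
existing open/close index APIs - the limit and the index names are just
placeholders:)

    from collections import OrderedDict
    import requests

    ES = "http://localhost:9200"
    MAX_OPEN = 200            # keep at most this many customer indexes open
    open_indexes = OrderedDict()

    def touch(index):
        # Make sure `index` is open, closing the least recently used one first
        # if we are already at the limit.
        if index in open_indexes:
            open_indexes.move_to_end(index)
            return
        if len(open_indexes) >= MAX_OPEN:
            victim, _ = open_indexes.popitem(last=False)
            requests.post(ES + "/" + victim + "/_close").raise_for_status()
        requests.post(ES + "/" + index + "/_open").raise_for_status()
        open_indexes[index] = True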

Reza

