Index routing and shard size


(Ashish Nigam) #1

Hi,
I am working on a multi-tenant application and plan to use index alias routing to index and search data for a particular tenant.
I plan to create a dedicated index/alias for each high-traffic tenant and group low-traffic tenants under a few aliases.
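
For illustration, here is a minimal sketch of the request body such a setup might send to the `_aliases` endpoint. The index names, the `tenant-<id>` alias naming and the `tenant_id` field are assumptions made up for this example, not anything prescribed by Elasticsearch.

```python
import json


def tenant_alias_action(index, tenant_id):
    """Build one `add` action that ties a tenant alias to a routing value."""
    return {
        "add": {
            "index": index,
            "alias": f"tenant-{tenant_id}",  # hypothetical naming scheme
            # Used as the routing value for both indexing and searching.
            "routing": tenant_id,
            # Hides documents of other tenants that share the same index.
            "filter": {"term": {"tenant_id": tenant_id}},  # hypothetical field
        }
    }


def aliases_request(assignments):
    """Body for POST /_aliases; `assignments` is a list of (index, tenant_id)."""
    return {"actions": [tenant_alias_action(i, t) for i, t in assignments]}


body = aliases_request([
    ("tenant-acme", "acme"),    # dedicated index for a high-traffic tenant
    ("shared-small", "bob"),    # low-traffic tenants share one index
    ("shared-small", "carol"),
])
print(json.dumps(body, indent=2))
```

The filter matters on the shared index because routing only selects a shard; other tenants routed to the same shard would otherwise show up in the results.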

As I understand it, only one shard is used for indexing and searching when I apply routing.
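
A rough way to picture this: the shard is chosen as a hash of the routing value modulo the number of primary shards. The sketch below uses `zlib.crc32` purely as a stand-in (it is not the hash Elasticsearch uses), and the shard count of 5 is an assumption.

```python
import zlib
from collections import Counter

NUM_SHARDS = 5  # hypothetical number of primary shards


def shard_for(routing, number_of_shards):
    """Stand-in for hash(routing) % number_of_shards; crc32 is NOT the real hash."""
    return zlib.crc32(routing.encode("utf-8")) % number_of_shards


# 1000 documents of one tenant, all routed by the tenant id.
routed = Counter(shard_for("acme", NUM_SHARDS) for _ in range(1000))

# The same number of documents routed by their own ids
# (without custom routing, the document id is the routing value).
unrouted = Counter(shard_for(f"doc-{i}", NUM_SHARDS) for i in range(1000))

print("shards used with tenant routing:", len(routed))
print("shards used with id routing:", len(unrouted))
```

With a fixed routing value every document of the tenant lands on the same shard, which is why searches are cheap and also why that shard can become hot.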

But if only one shard (and, I assume, a single node) is always used for a high-traffic tenant, that shard will quickly grow large. Won't a large shard itself become a bottleneck after some time?
If so, what are other ways to mitigate this problem?

One possible way is to create more indexes for the same tenant and add all the new indexes to the same alias. But then, what should the criteria be for creating a new index?
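
One possible criterion is the size of the largest primary shard. The sketch below only plans the step. The 30 GiB threshold, the `<name>-000001` naming scheme and the helper names are assumptions for illustration. The new index would have to be created before the alias action is sent. Writes would probably need to go to the newest index directly, since an alias that spans several indexes cannot, as far as I know, accept writes by default.

```python
MAX_PRIMARY_SHARD_BYTES = 30 * 1024**3  # hypothetical threshold, tune per workload


def next_index_name(current):
    """'acme-000001' -> 'acme-000002' (hypothetical naming scheme)."""
    prefix, _, counter = current.rpartition("-")
    return f"{prefix}-{int(counter) + 1:0{len(counter)}d}"


def plan_rollover(current_index, primary_shard_bytes, alias, routing):
    """Return (index_to_write_to, alias_actions) for the tenant.

    `primary_shard_bytes` holds the sizes of the current index's primary
    shards, however they were obtained.
    """
    if max(primary_shard_bytes) < MAX_PRIMARY_SHARD_BYTES:
        return current_index, []  # still small enough, nothing to do
    new_index = next_index_name(current_index)
    # Adding the new index to the same alias keeps searches spanning all
    # of the tenant's indexes.
    actions = [{"add": {"index": new_index, "alias": alias, "routing": routing}}]
    return new_index, actions


new_index, actions = plan_rollover(
    "acme-000001", [35 * 1024**3, 10], "tenant-acme", "acme"
)
print(new_index, actions)
```

Document count or a time window (for example one index per month) would be equally plausible criteria; the structure of the check stays the same.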

Please let me know your thoughts.

Thanks
Ashish


(Benjamin Devèze) #2

Hi, this may not answer all your questions, but have you looked at
kimchy's latest presentation? I think it addresses some of the points
you are raising:
https://speakerdeck.com/u/kimchy/p/elasticsearch-big-data-search-analytics

--
Benjamin DEVEZE


(Ashish Nigam) #3

Thanks for the link. This presentation is very informative.

But I am still looking for my answer.
I understand that when we use alias routing, all indexing and searches for that alias go through one shard.
I noticed that indexing becomes a bit slower when only one shard is used (I tested this with bulk indexing).

Now, what steps can be taken to mitigate the issue of one "hot" shard?
Will creating new indexes and adding them to the same alias work efficiently? I am also worried that there is always a maximum shard size and that a high-traffic tenant may cross that limit.
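
Another option that might be worth testing is to spread a hot tenant over a few routing values instead of one. In the sketch below, indexing picks one value per document and searches pass all values as a comma-separated list. The bucket count, the `<tenant>-<n>` format and the use of `zlib.crc32` as a stable stand-in hash are all assumptions for this example.

```python
import zlib


def index_routing(tenant_id, doc_id, buckets):
    """Pick one of `buckets` routing values for this document."""
    # crc32 is only used to get a stable bucket number for a given doc id.
    bucket = zlib.crc32(doc_id.encode("utf-8")) % buckets
    return f"{tenant_id}-{bucket}"


def search_routing(tenant_id, buckets):
    """Comma-separated routing values covering every bucket of the tenant."""
    return ",".join(f"{tenant_id}-{b}" for b in range(buckets))


print(index_routing("acme", "doc-1", 3))
print(search_routing("acme", 3))
```

Two distinct routing values can still hash to the same shard, so this only spreads the load on average. Because the bucket is derived from the document id, the routing value for a get or delete by id can be recomputed.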

Also, there is one point in Kimchy's presentation that I didn't understand, on the slide "users data flow - single index + routing".
It refers to large "overallocation". What exactly is "large overallocation" in the context of this slide?

Thanks
Ashish


(Otis Gospodnetić) #4

Hi Ashish:

Overallocation: https://groups.google.com/d/msg/elasticsearch/49q-_AgQCp8/Ihrf1cYcfCYJ
Another set of slides with routing info: http://blog.sematext.com/2012/06/05/slides-scaling-massive-elasticsearch-clusters/
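
As I understand the term, overallocation means creating the index with more primary shards than the current nodes need, so that shards can later move to new nodes as the cluster grows. A minimal sketch under that reading; the shard count of 50 and the helper names are assumptions for illustration.

```python
def index_settings(number_of_shards, number_of_replicas=1):
    """Body for creating an index with an explicit shard count."""
    return {
        "settings": {
            "number_of_shards": number_of_shards,
            "number_of_replicas": number_of_replicas,
        }
    }


def primaries_per_node(number_of_shards, nodes):
    """Even spread of primaries over nodes, as a rough picture of rebalancing."""
    base, extra = divmod(number_of_shards, nodes)
    return [base + (1 if n < extra else 0) for n in range(nodes)]


settings = index_settings(50)  # deliberately more shards than nodes
for nodes in (2, 5, 10):
    print(nodes, primaries_per_node(50, nodes))
```

With routing, each tenant's request still touches a single shard, so the extra shards mostly cost some per-shard overhead while leaving room to add nodes later.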

Otis

Search Analytics - http://sematext.com/search-analytics/index.html
Scalable Performance Monitoring - http://sematext.com/spm/index.html

