Multi-tenancy performance

Hi

It seems that performance degrades pretty quickly as indexes are added,
although the Twitter example would suggest that it's possible to
create a large number of indexes. Is there any way to speed this up
with a large number of indexes, or is this approach not recommended for
per-user indexes (we're talking millions of users)? After 400 indexes,
it takes about 4 seconds to create a new index.

Kind regards,
Joakim

On Tue, Jan 4, 2011 at 11:40 AM, Shay Banon shay.banon@elasticsearch.com wrote:

An index comes with overhead when you create it. The main overhead is
actually the shards: by default, an index is created with 5 shards and
1 replica. Each shard is a Lucene index, which comes with its own
overhead in terms of memory requirements and OS resources (mainly file
handles).
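Those defaults can be overridden when the index is created. A minimal sketch against the REST API (the host, index name, and settings below are placeholders, not values from this thread):

# Sketch: create an index with explicit shard/replica settings instead of
# the 5-shard / 1-replica default, to keep per-index overhead down.
# Host and index name are examples.
import requests

settings = {
    "settings": {
        "index": {
            "number_of_shards": 1,     # one Lucene index per copy instead of five
            "number_of_replicas": 1,
        }
    }
}

resp = requests.put("http://localhost:9200/some_index", json=settings)
print(resp.status_code, resp.json())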

With millions of users, I would advise against creating an index per
user.

-shay.banon

On Tue, Jan 4, 2011 at 6:44 PM, Rich Kroll kroll.rich@gmail.com wrote:

Shay,
Are there any guidelines you can give (a recommended upper limit) for
the number of indexes per cluster?

- Rich Kroll

On Jan 4, 9:46 am, Shay Banon shay.banon@elasticsearch.com wrote:

It really depends on the hardware you have and the memory allocated to
the ES process. Maybe other people can give explicit examples with some
numbers, but I suggest you run a quick capacity test with what you have
and extrapolate based on that.
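A crude way to run that kind of capacity test is to create indexes in a loop and time each creation, which is also how the 4-seconds-per-index figure above would show up. A sketch, assuming a disposable test node on localhost:9200 (index names and the count of 500 are arbitrary):

# Crude capacity test: create indexes in a loop and time each creation
# to see where the cluster starts to slow down.
import time
import requests

for i in range(500):
    start = time.time()
    requests.put(
        f"http://localhost:9200/capacity_test_{i}",
        json={"settings": {"number_of_shards": 5, "number_of_replicas": 1}},
    )
    print(f"index {i} created in {time.time() - start:.2f}s")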

On Tue, Jan 4, 2011 at 11:33 PM, Paul ppearcy@gmail.com wrote:

I'd recommend using the shard routing behavior for per-user data
buckets. This keeps the shard count bounded and ends up searching only
the necessary shard. In order to re-shard, you do need to rebuild
content, though.
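Roughly, that looks like the following against the REST API. The index, type, and field names are made up, and the query DSL shown (the old "filtered" form) depends on the Elasticsearch version:

# Sketch: keep all users in one shared index, but route every document and
# every search by user id so only one shard is touched per request.
import requests

BASE = "http://localhost:9200/shared_users"

# Index a document for user42, routed by the user id.
requests.put(
    f"{BASE}/doc/1?routing=user42",
    json={"user_id": "user42", "body": "hello world"},
)

# Search with the same routing value (only that shard is queried), plus a
# user_id filter so other users' documents on the same shard are excluded.
query = {
    "query": {
        "filtered": {
            "query": {"term": {"body": "hello"}},
            "filter": {"term": {"user_id": "user42"}},
        }
    }
}
requests.post(f"{BASE}/_search?routing=user42", json=query)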

And to answer your other question: you want to keep the number of
indexes reasonably bounded. This is just my rough back-of-the-envelope
number, but I'd try to keep the shard count below 100 for a beefy
server. Those shards can get quite large, though, as long as you have
the RAM and disk available, and I am sure you can push the count much
higher with proper tuning.
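To tie that back to the numbers earlier in the thread: with the default 5 shards and 1 replica, 400 indexes already works out to 400 × 5 × 2 = 4,000 shard copies to spread across the cluster, which dwarfs the ~100-shards-per-beefy-server ballpark above unless you have a lot of nodes.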

Yea, routing is a great use case for user-based "indexes". Just make sure you
provide the user id as the routing value when you index and when you search
(so only one shard will be queried), on top of the query being filtered by
user id.

One thing regarding the need to reindex: with clever index placement, you
can work around that. By aliasing indices with the username, you can
dynamically allocate more indices on the fly and assign new users to them.
Roll over to a new index when, for example, N users have been assigned to
the current one. On the "front end", all the code interacts with the
aliases, so it feels as if each user has its own index, except that the
user id is used for routing and any query executed in the context of a user
is wrapped in a filtered query on the user id. That can easily be done with
a nice "search layer" in the app that automatically wraps any query in a
filtered query and provides the routing value.
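A sketch of that kind of search layer follows. The host, index, alias, and field names are illustrative, and the "filtered" query syntax is the old 0.x/1.x form:

# Sketch of the "search layer" described above: each user gets an alias onto
# a shared physical index, and every query is wrapped in a user-id filter and
# routed by the user id.
import requests

ES = "http://localhost:9200"

def assign_user(user_id: str, physical_index: str) -> None:
    """Point a per-user alias at whichever shared index the user lives in."""
    actions = {"actions": [{"add": {"index": physical_index, "alias": user_id}}]}
    requests.post(f"{ES}/_aliases", json=actions)

def search_as_user(user_id: str, query: dict) -> dict:
    """Wrap any query in a user-id filter and route it to the user's shard."""
    wrapped = {
        "query": {
            "filtered": {
                "query": query,
                "filter": {"term": {"user_id": user_id}},
            }
        }
    }
    resp = requests.post(f"{ES}/{user_id}/_search?routing={user_id}", json=wrapped)
    return resp.json()

# Usage: put user42 on the current shared index, then search only their data.
assign_user("user42", "shared_index_001")
hits = search_as_user("user42", {"term": {"body": "hello"}})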
