Multiple indices vs. routing

I'm building an app where I expect to index hundreds (maybe thousands) of
JSON documents every day per user. My question is about the trade-offs
between using an individual index per user and using the same index with a
different routing value per user.

Using the same index with the user id as a routing value is the simpler
solution, and seems like it will require less overhead because I won't be
creating a separate index for each user. However, it also means that my
single index will get very big and that may pose some problems as well. The
README suggests that using separate indices might be a better idea when
you're dealing with a large amount of data (see
https://github.com/elasticsearch/elasticsearch under the heading "Multi
Tenant – Indices and Types").

At what point does index size become such a problem that it makes sense to
have an index per user?
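For readers unfamiliar with routing, here is a minimal sketch of the idea behind it: the routing value is hashed and the result, modulo the number of primary shards, selects the shard. The `shard_for` helper, the shard count, and the use of `zlib.crc32` are assumptions for illustration only; Elasticsearch uses its own internal hash function.

```python
import zlib


def shard_for(routing: str, num_primary_shards: int) -> int:
    """Pick a shard in the same spirit as routing: hash, then modulo."""
    # zlib.crc32 is a stand-in hash, not the one Elasticsearch uses.
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


NUM_SHARDS = 5  # assumed shard count for the single shared index

# (user id used as routing value, document id)
docs = [("alice", "doc-1"), ("alice", "doc-2"), ("bob", "doc-3")]

# Every document routed by the same user id lands on the same shard.
placement = {doc_id: shard_for(user, NUM_SHARDS) for user, doc_id in docs}

for doc_id, shard in placement.items():
    print(doc_id, "-> shard", shard)
```

Because all of one user's documents sit on one shard, a search that passes the same routing value only has to visit that shard instead of every shard in the index.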

--
Michael Jackson
@mjackson

--

How many users are we talking about?

(In reply to Michael Jackson's message of Tuesday, January 15, 2013 3:07:55 PM UTC-5.)

--

At this point, not many. But we're hoping it will grow as large as it can. :)

A third option might be to start with just a single index, with the
understanding that indexes cannot be modified once they are created. If we
do grow to the point where the single index is running too slowly, we could
create a separate cluster that uses one index per user. Since we're using
the CouchDB river, it should be fairly easy to set up the new cluster and
point all search queries at it and its indexes once it's ready.

--
Michael Jackson
@mjackson

(In reply to Igor Motov's message of Tue, Jan 15, 2013 at 3:12 PM.)

--

These are all viable options in different settings. Personally, I would
approach it this way. If most searches run against a single user's data
and I can afford it (I don't mind the overhead and I don't expect to have
more than a few thousand users), I would go with one index per user. It
provides clean separation and fast queries, users don't skew each other's
IDFs, etc. However, if I know that my application will grow beyond a few
thousand users, or if searches run against the entire index, I would go
with some other strategy. I would consider partitioning by time, or
creating separate indices for large users and combining several small
users into a single index.
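The mixed strategy above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `LARGE_USER_DOCS` cut-off, the index names, and the `target_for` and `monthly_index` helpers are all hypothetical, not anything Elasticsearch provides.

```python
from datetime import date

LARGE_USER_DOCS = 1_000_000    # hypothetical cut-off for a dedicated index
SHARED_INDEX = "users-shared"  # hypothetical name for the combined index


def target_for(user_id: str, doc_count: int) -> dict:
    """Return the index name and routing value to use for one user."""
    if doc_count >= LARGE_USER_DOCS:
        # Large user: gets a dedicated index, so no routing value is needed.
        return {"index": f"user-{user_id}", "routing": None}
    # Small user: lives in the shared index, routed by user id.
    return {"index": SHARED_INDEX, "routing": user_id}


def monthly_index(prefix: str, day: date) -> str:
    """Name a time-partitioned index, one per calendar month."""
    return f"{prefix}-{day:%Y.%m}"


print(target_for("42", 10))
print(target_for("42", 2_000_000))
print(monthly_index("docs", date(2013, 1, 15)))
```

Note that in the shared index, routing only narrows a search to one shard. Several users can hash to the same shard, so queries still need a filter on the user id.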

Have you seen this presentation by Shay?
http://www.elasticsearch.org/videos/2012/06/05/big-data-search-and-analytics.html
He discusses this particular issue starting at 13:40.

(In reply to Michael Jackson's message of Tuesday, January 15, 2013 6:59:54 PM UTC-5.)

--

Thanks for the advice. That sounds about right.

No, I hadn't seen Shay's presentation, but I will be sure to watch it
tonight. Thanks again!

--
Michael Jackson
@mjackson

(In reply to Igor Motov's message of Tue, Jan 15, 2013 at 4:33 PM.)

--