Great, thanks.
I will definitely keep that in mind.
Thanks,
Ashwin Sathya
Date: Fri, 6 Sep 2013 14:18:31 -0700
From: zacharyjtong@gmail.com
To: elasticsearch@googlegroups.com
CC: ashwin.sathya@outlook.com
Subject: Re: Design guidance for multi-tenant multi-source indexing
No problem. Let me know if you have any more questions. =)
The only real constant to consider when thinking about scaling (no matter how you organize your data) is the number of primary shards, since this cannot be changed once the index is created. Everything else is very flexible. And even the primary shard situation can be changed if you re-index your data into a new index that has been provisioned with more shards.
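For example, something like this (the index name and counts here are just placeholders, not a recommendation):

# Provision the index with an explicit primary shard count up front,
# since that is the one setting you can't change later:
curl -XPUT localhost:9200/my_new_index -d '{
  "settings": {
    "number_of_shards": 10,
    "number_of_replicas": 1
  }
}'

# Replicas, on the other hand, can be adjusted at any time:
curl -XPUT localhost:9200/my_new_index/_settings -d '{
  "index": { "number_of_replicas": 2 }
}'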
-Zach
On Friday, September 6, 2013 4:57:51 PM UTC-4, R Ashwin Sathya wrote:
Great, thanks for the detailed explanation, Zachary.
As I said, I am not looking into splitting the data across shards for my experimental purposes.
However, going by the ES docs and what I have read, scaling up and down based on the load each user generates (both read and write) seems to be a well-understood problem, and I will think about how I need to model my system. I will definitely watch the talk before I proceed with such serious thinking.
Thanks again for the help.
Thanks,
Ashwin Sathya
Date: Fri, 6 Sep 2013 13:46:54 -0700
From: zachar...@gmail.com
To: elasti...@googlegroups.com
CC: ashwin...@outlook.com
Subject: Re: Design guidance for multi-tenant multi-source indexing
Yep, you're correct - if you want to back up a particular user, you'll have to implement a selective backup process that pulls their data out of the index. You could do this fairly easily with a Scan/Scroll API call and a filtered query. Restoring data will be a little more painful, since you will probably have to perform a Delete-By-Query and then reindex the data.
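As a rough sketch, assuming the tenant is identified by a field called "user" (the index, field, and values here are just illustrative):

# Start a scan over one user's documents only:
curl -XGET 'localhost:9200/my_index/_search?search_type=scan&scroll=5m&size=100' -d '{
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": { "term": { "user": "user_a" } }
    }
  }
}'

# Then keep pulling batches with the _scroll_id returned by each
# response, until no more hits come back:
curl -XGET 'localhost:9200/_search/scroll?scroll=5m' -d '<scroll_id from previous response>'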
You can also do an index-per-user, especially if you know you will have at most 30 users (or some other relatively small number, e.g. at most hundreds, not thousands). If each user has their own index, you can internally partition the data however you like. Searching between a bunch of types (Type1_Date1, etc) is equivalent to searching one type and applying a filter on a date field. Internally types are managed by filters on "special" fields, so the process/performance is basically identical.
A perk of doing index-per-user is that you can scale individual indexes to meet the needs of individual users. So if one user is very large and requires a lot of capacity, you can provision their index with 10 shards, while another user only needs 2 shards for their index. A downside is that removing old data will be more expensive, since deleting documents individually is much slower than dropping an entire index. Another disadvantage is that you are somewhat limited in the number of users you can add - at some point adding more indexes becomes too much overhead.
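To make the deletion tradeoff concrete (the index names and timestamp field are illustrative):

# With time-based indexes, expiring a day of data is one cheap operation:
curl -XDELETE localhost:9200/data_08_07_2013

# With index-per-user, removing old documents means a Delete-By-Query
# inside each user's index, which is much more expensive:
curl -XDELETE 'localhost:9200/user_a/_query' -d '{
  "range": { "timestamp": { "lt": "2013-08-07" } }
}'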
Shay has a very good talk describing two "data flows" - user data flow and time-based data flow - which you may find helpful: http://vimeo.com/44716955
-Zach
On Friday, September 6, 2013 4:29:11 PM UTC-4, R Ashwin Sathya wrote:
Thanks Zachary,
Timed indexes would fit perfectly for my scenario, particularly for modelling costs (say, indexes older than N days get deleted, and things like that).
The only downside I see is that it won't be as easy to recover indexes on a per-user basis. For example, if I isolate the user data into separate indexes, I can configure backups for particular users and restore them at will. In the shared case, I will have to back up selectively?
I am also having to test against another search technology in parallel, purely for experimental purposes.
The setup there is as follows.
1 Shard -> Mapped over 1 master and 2 replica nodes
30 users -> Each user has an index
2 Types -> Two types of document data (Type1, Type2)
30 days' worth of data -> I basically have to accommodate them in the same index, and I am quite unsure how to achieve this parity. In my other search system, we have a concept of clear tables, so I have named my tables Type1_Date1, Type2_Date1, ... and so on. An equivalent here would be to create the table names as types, right?
From what I am learning about ES, I understand that the above severely underutilizes ES's true power to scale. But as I mentioned, it is for benchmarking and other purposes.
Thanks,
Ashwin Sathya
Date: Fri, 6 Sep 2013 13:20:41 -0700
From: zachar...@gmail.com
To: elasti...@googlegroups.com
CC: ashwin...@outlook.com
Subject: Re: Design guidance for multi-tenant multi-source indexing
I would probably build time-based indexes. For example, an index per day (or week, or hour...whatever unit of time seems appropriate for your setup). Your documents would then contain both a timestamp field and a source field.
When a user searches a time range, you can search only the range of indexes that match the requested time. E.g. if you store daily indexes and your user requests all values in the last two days, you perform a search on just those two indexes. It is very easy to search over multiple indexes at the same time; you simply concatenate them together in the URI with a comma:
curl -XGET localhost:9200/data_09_06_2013,data_09_05_2013/_search -d '{}'
Now, since you have multiple users sharing the same index, you perform a filtered query so that results are filtered by the source field. If you need finer control on time ranges (e.g. a particular hour in a particular day), you can just include a Range filter along with the Term filter on the source field.
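For example, something like this (field names and values are just for illustration):

# Match one source, restricted to a single hour of the day:
curl -XGET localhost:9200/data_09_06_2013/_search -d '{
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": {
        "and": [
          { "term": { "source": "source_a" } },
          { "range": { "timestamp": {
              "gte": "2013-09-06T10:00:00",
              "lt":  "2013-09-06T11:00:00"
          } } }
        ]
      }
    }
  }
}'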
Make sense?
-Zach
On Friday, September 6, 2013 11:02:48 AM UTC-4, R Ashwin Sathya wrote:
Hi,
I am detailing my scenario here.
I need to support a number of users who have data from multiple sources. The nature of the data is chronological: it is both real-time and timestamped.
The search capability that I need to provide the user is to search over a particular source and over a particular time range (say, a few hours to a few days).
I am not able to grasp how to map the index concepts to the design of my data layer. Any suggestions/guidance?
Thanks,
Ashwin Sathya