Faceted search on multiple indices. Is it even possible?

Hi,

We're trying to use Elasticsearch in a project with a fancy use case.

We have some documents that are occasionally updated but are considered
"static". Let's call them "main documents". They have their own set of
attributes (e.g. author, created_at, category, …).

We have some other documents that are indexed as they are produced. Let's
call them child documents. They are related to the main documents (through
a main_doc_id) and also depend on a user's context (user_id). Think of them
as metadata about the parent documents, but specific to each user. Example
attributes: favorite (boolean), rating, tags, …

We'd like to do a faceted search on the main documents, filtered by
attributes of both the main documents and the child documents. An example
search: I want main documents by author Bob, in the "sci-fi" category, that
I have rated over 5/10. I would like to get back the main documents and all
the facets.

If we use only one index, with 2 types of documents, and a "has child"
relation between them, we can do all of this.
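For reference, the single-index version of the example search could look roughly like the sketch below. It only builds the request body, using the facets and has_child syntax of the Elasticsearch releases current at the time of this thread. The child type name "user_meta", the field names and the lowercase term values are assumptions made for illustration, not our real mapping.

```python
import json


def build_search_body(author, category, user_id, min_rating):
    """Build a search body: parent-level filters plus a has_child filter, with facets."""
    # Conditions evaluated against the child (per-user metadata) documents.
    child_query = {
        "bool": {
            "must": [
                {"term": {"user_id": user_id}},
                {"range": {"rating": {"gt": min_rating}}},
            ]
        }
    }
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {
                    "and": [
                        # Conditions on the main (parent) documents.
                        {"term": {"author": author}},
                        {"term": {"category": category}},
                        # "user_meta" is a hypothetical child type name.
                        {"has_child": {"type": "user_meta", "query": child_query}},
                    ]
                },
            }
        },
        # Facets are computed over the main documents matched by the query.
        "facets": {
            "authors": {"terms": {"field": "author"}},
            "categories": {"terms": {"field": "category"}},
        },
    }


body = build_search_body("bob", "sci-fi", "user-42", 5)
print(json.dumps(body, indent=2))
```

The body would be sent to the _search endpoint of the single index, restricted to the main document type, so the hits and facets are about main documents only.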

But (there is always a "but") it doesn't scale!

The "main" documents are quite stable and shared by all users' requests, so
they would ideally live in their own index.
The "children" documents are very volatile (even if they don't seem to be
in my example); they change a lot and become irrelevant after a few hours.
They would be great candidates for a daily rolling index, accessed through
aliases and routing.
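To make the rolling idea concrete, here is a rough sketch of what the daily rollover of the children index could look like. It only builds the body for the _aliases endpoint. The index prefix "user_meta-", the alias name "user_meta" and the two-day retention are assumptions chosen for the example.

```python
from datetime import date, timedelta


def daily_index(day, prefix="user_meta-"):
    """Name of the (hypothetical) daily index holding that day's child documents."""
    return prefix + day.strftime("%Y.%m.%d")


def rollover_actions(today, keep_days=2, alias="user_meta"):
    """Body for POST /_aliases: drop the index that just expired, add today's."""
    expired = today - timedelta(days=keep_days)
    return {
        "actions": [
            {"remove": {"index": daily_index(expired), "alias": alias}},
            {"add": {"index": daily_index(today), "alias": alias}},
        ]
    }


actions = rollover_actions(date(2013, 1, 25))
print(actions)
```

Searches would then go through the alias, with the user_id passed as the routing value so that each request only touches the shard holding that user's documents.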

The problem is that the faceted search doesn't work with separate indices,
because the has_child needs a single index (as far as I understand).

Does that ring a bell for any of you?
Am I trying to do something crazy?

Thanks for any advice.

--
Jeremy

--

Yeah, you can't combine rolling indices with static "parents". It's either making sure each rolling index has all the parents, or having enough shards so that as you add more children (for a single parent) you won't need a rolling index. No magic answers here, sadly...
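A minimal sketch of the first option (every rolling index carries its own copy of the parents), assuming the parents are re-sent through the bulk API whenever a new daily index is created. The type names "main_doc" and "user_meta", the input dict layout and the index name are made up for illustration, and the child type is assumed to have a _parent mapping pointing at the main document type.

```python
import json


def bulk_lines(index, parents, children):
    """Build a bulk payload that puts parents and their children in the same index."""
    lines = []
    for p in parents:
        # Every daily index gets its own copy of each parent document.
        lines.append(json.dumps({"index": {"_index": index, "_type": "main_doc", "_id": p["id"]}}))
        lines.append(json.dumps(p["source"]))
    for c in children:
        # "_parent" ties the child to its parent and routes it to the parent's shard.
        meta = {"_index": index, "_type": "user_meta", "_id": c["id"], "_parent": c["main_doc_id"]}
        lines.append(json.dumps({"index": meta}))
        lines.append(json.dumps(c["source"]))
    # The bulk API expects newline-delimited JSON with a trailing newline.
    return "\n".join(lines) + "\n"


payload = bulk_lines(
    "docs-2013.01.25",
    parents=[{"id": "1", "source": {"author": "bob", "category": "sci-fi"}}],
    children=[{"id": "c1", "main_doc_id": "1", "source": {"user_id": "user-42", "rating": 7}}],
)
print(payload)
```

The obvious cost is that the parents get duplicated into every daily index, which is the trade-off against the second option of one index with enough shards.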

On Jan 25, 2013, at 4:51 PM, Jérémy Lecour jeremy.lecour@gmail.com wrote:

