How to INNER JOIN documents from different indexes


(Matheus Vellone) #1

I have two different indexes:

  1. Persons
  2. Houses

I need a relationship between them so that when I search the houses, each result includes the owner (the associated person) of the house. The inverse isn't required, but would be nice if possible.

I've read about Elasticsearch relationships and found Parent/Child, which would solve the problem like a charm: Persons as the parent and Houses as the child.
But I read (and confirmed by testing) that a Parent/Child relationship requires both parent and child to be in the same index, and in my case I can't do that, because putting all the data in a single index would be a mess.

In my tests I can create the indexes with types and add documents, but searching for children fails with one of these messages: "[has_parent] query configured 'parent_type' [parent] is not a valid type" or "[has_child] Type [child] points to a non existent parent type [parent]".

Is there any way to accomplish this behavior using different indexes?


(Christian Dahlqvist) #2

The restriction around parent/child is actually even stricter than what you described: all related parents and children have to reside in the same shard within an index. There is no way to set up such a relationship across indices in Elasticsearch, so a common workaround is to perform a client-side join in the application.
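
As a rough illustration of the client-side join mentioned above, here is a minimal Python sketch. It assumes each house document stores its owner's document ID in a field called owner_id (a hypothetical name, not something from this thread), and it uses plain dictionaries shaped like search hits instead of a real client so that it runs on its own. The first function builds an ids query for the Persons index from the house hits; the second merges the two result sets.

```python
# Client-side join sketch. The "owner_id" field and the sample data are
# assumptions made for illustration only.

def owner_ids_query(house_hits, owner_field="owner_id"):
    """Build an `ids` query body that fetches each distinct owner once."""
    owner_ids = sorted(
        {hit["_source"][owner_field]
         for hit in house_hits
         if owner_field in hit["_source"]}
    )
    return {"size": len(owner_ids), "query": {"ids": {"values": owner_ids}}}


def attach_owners(house_hits, person_hits, owner_field="owner_id"):
    """Merge person documents into their houses; unknown owners become None."""
    persons_by_id = {hit["_id"]: hit["_source"] for hit in person_hits}
    joined = []
    for hit in house_hits:
        house = dict(hit["_source"])
        house["owner"] = persons_by_id.get(house.get(owner_field))
        joined.append(house)
    return joined


# Stand-ins for the "hits" arrays of two separate search responses.
house_hits = [
    {"_id": "h1", "_source": {"address": "1 Main St", "owner_id": "p1"}},
    {"_id": "h2", "_source": {"address": "2 Oak Ave", "owner_id": "p1"}},
    {"_id": "h3", "_source": {"address": "3 Elm Rd", "owner_id": "p9"}},
]
person_hits = [{"_id": "p1", "_source": {"name": "Ana"}}]

persons_query = owner_ids_query(house_hits)       # send this to the Persons index
joined_houses = attach_owners(house_hits, person_hits)
```

In a real application the two hit lists would come from one search against the Houses index followed by one search against the Persons index using the generated query, so the join costs two round trips per page of results.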

Depending on how frequently your data is updated, it may be worthwhile looking into de-normalizing the data and storing the two entities together.
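
To make the de-normalizing idea a bit more concrete, here is a hedged Python sketch. It assumes each house document carries a copy of its owner under an owner field, with owner.id mapped as a keyword; those names are illustrative, not from this thread. The second function only builds a request body of the kind the _update_by_query API accepts, to show what refreshing the copies after a person changes might look like.

```python
# De-normalization sketch. Field names ("owner", "owner.id") are assumptions.

def denormalize_house(house, person_id, person):
    """Return a house document with a copy of the owner embedded in it."""
    doc = dict(house)
    doc["owner"] = {"id": person_id, **person}
    return doc


def owner_refresh_body(person_id, person):
    """Body for an _update_by_query call that rewrites the embedded owner
    copy in every house belonging to one person."""
    return {
        "query": {"term": {"owner.id": person_id}},
        "script": {
            "lang": "painless",
            "source": "ctx._source.owner = params.owner",
            "params": {"owner": {"id": person_id, **person}},
        },
    }


house_doc = denormalize_house({"address": "1 Main St"}, "p1", {"name": "Ana"})
refresh_body = owner_refresh_body("p1", {"name": "Ana Maria"})
```

With this layout a change to a house touches a single document, while a change to a person has to be propagated to every house that person owns.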


(Matheus Vellone) #3

Hmm, got it.

A client-side join can be done, but I'm looking for an alternative to it.

My Person data won't be updated too frequently, but the Houses will. In this scenario, is the de-normalized approach still a good option?
If so, does this approach have any limitations or performance risks due to the number of relationships? In my case the relationship is 1-N, and N can be high, something like 1,000 or even 100,000.


#4

One way to map 1:N relationships in ElasticSearch is to use the Nested Datatype.

What you could do is create an index for a person with a nested attribute for houses...
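
For illustration, a person index with nested houses might be set up roughly as follows. This is only a sketch: the field names (name, houses, address, city) are made up, and the mapping uses the typeless format of recent Elasticsearch versions, whereas older versions nest the properties under a type name. The helper builds a nested query that finds persons owning a house in a given city.

```python
# Nested-datatype sketch. All field names are illustrative assumptions.

persons_index_body = {
    "mappings": {
        "properties": {
            "name": {"type": "text"},
            "houses": {
                "type": "nested",  # each house is indexed as its own hidden document
                "properties": {
                    "address": {"type": "text"},
                    "city": {"type": "keyword"},
                },
            },
        }
    }
}


def persons_with_house_in(city):
    """Build a nested query matching persons with at least one house in `city`."""
    return {
        "query": {
            "nested": {
                "path": "houses",
                "query": {"term": {"houses.city": city}},
            }
        }
    }


lisbon_query = persons_with_house_in("Lisbon")
```

Note that with this layout, adding or changing one house means reindexing the whole person document, including all of its other houses.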


(Matheus Vellone) #5

But I need them to reside in different indexes; otherwise there would be only one index for all the documents.


#6

You can maintain 3 separate indices, one for Person, one for Houses and another one for Person-HouseMap. This would add a lot of redundancy in your data though.

You won't be able to join the data from one index to another, so I'm not sure why you would want to store the person and house data in separate indices. Is there a specific requirement that makes it necessary to store houses and persons in their own separate index?


(Matheus Vellone) #7

I'm looking for a way to avoid any data redundancy, so this isn't an option.

I'm just worried about performance (I may be wrong about that), because in my head separate indexes work better for indexing a high volume of data, since the data is better separated.

I'm not 100% sure about that, so please correct me if I'm wrong: if I put all the data in a single index, wouldn't I have performance issues because all documents reside in the same index? If not, I could set up the parent/child relationship between the data and my problem would be solved.

