Elasticsearch 6.x Join behaviour

plexaikm · October 24, 2017, 3:14pm

Hello

I am reading the v6.x documentation in preparation for a later migration from 2.4/5.5 to 6.x.

In some indices we currently have multiple types (already something to change) and parent-child relations between different types in the same index.

Given that indices will no longer support multiple types per index, and parent/child is changing, I stumbled upon this documentation: https://www.elastic.co/guide/en/elasticsearch/reference/6.x/parent-join.html

One line I am quite puzzled about (yes, it's the first line...) is: "The join datatype is a special field that creates parent/child relation within documents of the same index."

So is this not a replacement for the current parent-child relation, which supports different types? And since different types can no longer be held in the same index, can I no longer use parent-child between different types?

Or am I missing something?

lwintergerst · October 24, 2017, 5:34pm

Hello,
Elasticsearch 6.x will only allow you to have a single document type per index.
The new join datatype will allow you to still model parent-child relationships in your data with a single document type.

What does this mean for you?
You will have to migrate to a single type, but no functionality will be lost with the new join functionality. You will still be able to do all queries that you can do right now.
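To make that concrete, here is a minimal sketch of a single-type layout, written as Python dictionaries that mirror the JSON request bodies. The type name `_doc`, the field name `join_field`, and the `question`/`answer` relation names are assumptions chosen for illustration; they do not come from your indices.

```python
import json

# Assumed names for illustration only: "_doc" (the single type),
# "join_field" (the join field) and the "question" -> "answer" relation.
mapping = {
    "mappings": {
        "_doc": {
            "properties": {
                "join_field": {
                    "type": "join",
                    # parent relation name -> child relation name
                    "relations": {"question": "answer"},
                }
            }
        }
    }
}

# A parent document only names its relation.
parent_doc = {"title": "..", "join_field": {"name": "question"}}

# A child document names its relation and the _id of its parent.
# It must be indexed with routing set to the parent's _id so that both
# documents land on the same shard.
child_doc = {"content": "..", "join_field": {"name": "answer", "parent": "1"}}

# Parent/child queries keep working; has_child now refers to the relation name.
has_child_query = {
    "query": {"has_child": {"type": "answer", "query": {"match_all": {}}}}
}

if __name__ == "__main__":
    print(json.dumps(mapping, indent=2))
```

The bodies would be sent with the usual index and search requests; only the join field replaces the old `_parent` mapping.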

Does this answer your question?

Luca

plexaikm · October 25, 2017, 1:24pm

Hello,
Yes, I understand the suggested approach (single index, single type, merging the fields from the types that currently have parent-child relations).

It will allow us to create fields of all "logical" types even if we use strict mapping.

Are there any plans for joins between documents in different indices? (Yes, I'm aware that the current parent-child works due to co-location of the documents on the same shard via routing, and that joining between different indices is significantly harder in a distributed parallel system.)

In the past there was a plugin (siren join) providing this functionality, but it is currently unsupported (the developers concentrated on a wider commercial solution).

Mark_Harwood · October 25, 2017, 1:41pm

Query-time joins across networks will always be expensive, and so we don't support them.

If I recall correctly their approach to scaling joins with large numbers of IDs was to save space by sending hashes rather than full ID strings and join only using those. In Java this would be the equivalent of a HashMap based on objects that implemented hashCode() but not equals() - it would be fast but (scarily) you have the potential for false positives.
Sending a lot of data over a network is slow and physics is a tough thing to beat.
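A small Python sketch of that false-positive risk. The 16-bit truncated hash and the generated `doc-N` IDs are assumptions picked so that a collision is easy to find; this is not a description of how the plugin was actually implemented.

```python
import hashlib


def short_hash(doc_id: str, bits: int = 16) -> int:
    """Truncate a SHA-256 digest to `bits` bits (a deliberately tiny hash)."""
    digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") >> (32 - bits)


def find_collision(bits: int = 16):
    """Return two different IDs with the same short hash.

    Terminates by the pigeonhole principle after at most 2**bits + 1 IDs.
    """
    seen = {}
    i = 0
    while True:
        doc_id = f"doc-{i}"
        h = short_hash(doc_id, bits)
        if h in seen:
            return seen[h], doc_id
        seen[h] = doc_id
        i += 1


def hash_join(left_ids, right_ids, bits: int = 16):
    """Join using only hashes: cheap to ship, but never compares full IDs."""
    sent = {short_hash(i, bits) for i in left_ids}
    return [r for r in right_ids if short_hash(r, bits) in sent]


def exact_join(left_ids, right_ids):
    """Join on the full ID strings: no false positives."""
    wanted = set(left_ids)
    return [r for r in right_ids if r in wanted]


wanted_id, impostor_id = find_collision()
left = [wanted_id]
right = [wanted_id, impostor_id]

if __name__ == "__main__":
    print("exact join:", exact_join(left, right))
    print("hash join: ", hash_join(left, right))  # also contains the impostor
```

With realistic hash widths collisions are far rarer, but the join still cannot tell a colliding ID from a genuine match.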

varunnatraaj · October 26, 2017, 2:06pm

My concern is along similar lines: types allowed some sort of abstraction in cases of conflicts.

Let me elaborate a bit, based on an unanswered question of mine linked below:

Assume that we store Stack Overflow data in an index. The appropriate minimal schema would be:

Question:

Title
Content

Answers:

Content

In JSON, this would look something like:

{
    "mappings":{
        "question": {
            // Title, content mappings
        },
        "answer": {
            "_parent": {
                "type": "question"
            }
        }
     }
}

So this allowed me to write separate non-conflicting docs to each type:

curl -XPUT 'localhost:9200/so_index/question/1?routing=1' -d '{"title": "..", "content": ".."}'
curl -XPUT 'localhost:9200/so_index/answer/1?routing=1&parent=1' -d '{"content": ".."}'

So with the new implementation, not only am I forced to write the "child type" into the parent's type (types will be removed in ES 7 anyway), I also have to make sure that _id 1 of the parent does not conflict with _id 1 of the child. When writing a child as shown below, it would actually overwrite the parent if the _id is the same.

# Writing PARENT (6.x requires an explicit Content-Type header)
curl -XPUT 'localhost:9200/so_index/question/1?routing=1' -H 'Content-Type: application/json' -d '{"title": "..", "content": "..", "join_type": { "name": "question" }}'

# Writing CHILD.
# Passing parent "type" when writing child
# Also this would overwrite the parent actually
curl -XPUT 'localhost:9200/so_index/question/1?routing=1' -H 'Content-Type: application/json' -d '{"content": "..", "join_type": { "name": "answer", "parent": 1 }}'

In such a case, is there any provision or setting to prevent this from happening? Or is it up to the client to ensure it sends non-conflicting IDs?
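One client-side option, sketched in Python under stated assumptions: prefix every `_id` with its relation name so that question 1 and answer 1 can never collide. The helper names, the `relation-id` prefix scheme, and the `_doc` type name are hypothetical; `join_type` is the field name used in the example above. The sketch only builds the request parts and does not talk to a cluster.

```python
def make_doc_id(relation: str, local_id) -> str:
    """Namespace an ID by relation, e.g. ("answer", 1) -> "answer-1"."""
    return f"{relation}-{local_id}"


def build_index_request(index, relation, local_id, fields, parent=None):
    """Build path, query parameters and body for an index request.

    `parent` is an optional (relation, local_id) tuple for child documents.
    """
    doc_id = make_doc_id(relation, local_id)
    join = {"name": relation}
    routing = doc_id  # parents are routed by their own ID
    if parent is not None:
        parent_id = make_doc_id(*parent)
        join["parent"] = parent_id
        routing = parent_id  # children must share the parent's shard
    return {
        "path": f"/{index}/_doc/{doc_id}",
        # op_type=create makes the write fail instead of silently
        # overwriting an existing document with the same _id.
        "params": {"routing": routing, "op_type": "create"},
        "body": {**fields, "join_type": join},
    }


question = build_index_request("so_index", "question", 1, {"title": ".."})
answer = build_index_request(
    "so_index", "answer", 1, {"content": ".."}, parent=("question", 1)
)

if __name__ == "__main__":
    print(question["path"], question["params"])
    print(answer["path"], answer["params"])
```

Independently of the prefixing, sending `op_type=create` turns an accidental overwrite into a conflict error, which at least makes a colliding ID visible.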

plexaikm · October 29, 2017, 11:12am

Thank you,
We will remove the parent-child relations (which we currently use as a replacement for joins) and will do the joins in the application.

system · November 26, 2017, 11:12am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
