One line I am quite puzzled about (yes, it's the first line...): "The join datatype is a special field that creates parent/child relation within documents of the same index."
So this does not look like a replacement for the current parent-child relation, which supports different types. Since different types can no longer be held in the same index, does that mean I can't use parent-child between different types anymore?
Hello,
Elasticsearch 6.x will only allow you to have a single document type per index.
The new join datatype will still allow you to model parent-child relationships in your data with a single document type.
What does this mean for you?
You will have to migrate to a single type, but no functionality is lost with the new join field. You will still be able to run all the queries that you can run right now.
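As a rough sketch of what the single-type model could look like, the snippet below builds a mapping with a join field and a has_child query as plain Python dictionaries. The type name "doc", the field name "join_type" and the question/answer relation are assumptions chosen for illustration, and nothing is sent to a cluster.

```python
import json

# Hypothetical names: the single type "doc", the join field "join_type",
# and the "question" -> "answer" relation are illustrative assumptions.
mapping = {
    "mappings": {
        "doc": {
            "properties": {
                "join_type": {
                    "type": "join",
                    # parent relation name -> child relation name
                    "relations": {"question": "answer"},
                }
            }
        }
    }
}

# has_child is still available; "type" now names the child relation
# declared in the join field rather than a separate mapping type.
has_child_query = {
    "query": {
        "has_child": {
            "type": "answer",
            "query": {"match": {"content": "routing"}},
        }
    }
}

if __name__ == "__main__":
    print(json.dumps(mapping, indent=2))
    print(json.dumps(has_child_query, indent=2))
```

The two dictionaries would be the request bodies for index creation and for a search, respectively; the match clause is only a placeholder query.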
Hello,
Yes, I understand the suggested approach (single index, single type, merging the fields from the types that currently have parent-child relations).
It will allow us to create the fields of all "logical" types even if we use strict mapping.
Is there any plan for joins between documents in different indices? (Yes, I'm aware that the current parent-child works because the documents are co-located on the same shard via routing, and that joining across different indices is significantly harder in a distributed parallel system.)
In the past there was a plugin (siren join) that provided this functionality, but it is currently unsupported (its developers have concentrated on a wider commercial solution).
Query-time joins across networks will always be expensive, so we don't support them.
If I recall correctly, their approach to scaling joins with large numbers of IDs was to save space by sending hashes rather than full ID strings and to join using only those. In Java this would be the equivalent of a HashMap keyed on objects that implement hashCode() but not equals(): it would be fast, but (scarily) you have the potential for false positives.
Sending a lot of data over a network is slow and physics is a tough thing to beat.
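To make the false-positive risk of a hash-only join concrete, here is a small sketch. It is not the plugin's actual code: it simply reimplements Java's String.hashCode() in Python (for ASCII strings) and compares a join on hashes with a join on full IDs. The function names are made up for this example.

```python
def java_string_hash(s: str) -> int:
    """Mimic Java's String.hashCode(): h = 31*h + char, as a signed 32-bit int."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - 0x100000000 if h >= 0x80000000 else h


def hash_only_join(parent_ids, child_parent_refs):
    """Keep child references whose hash matches a parent hash (full IDs never compared)."""
    parent_hashes = {java_string_hash(pid) for pid in parent_ids}
    return [ref for ref in child_parent_refs if java_string_hash(ref) in parent_hashes]


def exact_join(parent_ids, child_parent_refs):
    """Keep child references whose full ID matches a parent ID."""
    parents = set(parent_ids)
    return [ref for ref in child_parent_refs if ref in parents]


if __name__ == "__main__":
    parents = ["Aa"]
    refs = ["Aa", "BB", "Cc"]
    # "Aa" and "BB" both hash to 2112, so the hash-only join wrongly keeps "BB".
    print(hash_only_join(parents, refs))  # ['Aa', 'BB']
    print(exact_join(parents, refs))      # ['Aa']
```

The hash-only variant ships less data per ID, which is the appeal, but any two IDs that collide become indistinguishable on the receiving side.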
So with the new implementation, not only am I forced to map the "child type" to a parent (which will be removed in ES 7, of course), I also have to make sure that _id 1 of the parent does not conflict with _id 1 of the child. When writing a child as shown below, it would actually overwrite the parent if the _id is the same.
# Writing PARENT
curl -XPUT 'localhost:9200/so_index/question/1?routing=1' -H 'Content-Type: application/json' -d '{"title": "..", "content": "..", "join_type": { "name": "question" }}'
# Writing CHILD.
# Passing the parent "type" when writing the child.
# This would actually overwrite the parent, because the type and _id are the same.
curl -XPUT 'localhost:9200/so_index/question/1?routing=1' -H 'Content-Type: application/json' -d '{"content": "..", "join_type": { "name": "answer", "parent": "1" }}'
In such a case, is there any provision or setting to prevent this from happening? Or is it up to the client to ensure that it sends non-conflicting IDs?
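If it does fall to the client, one possible approach is to namespace each _id with its relation name so that question 1 and answer 1 can never collide. The sketch below only builds the request pieces (id, routing, body) and does not talk to Elasticsearch; the helper names and the "relation-id" format are assumptions, while the join_type field and the question/answer names come from the example above.

```python
def make_doc_id(relation: str, local_id) -> str:
    """Build an index-wide unique _id from the relation name and the per-relation ID."""
    return f"{relation}-{local_id}"


def build_parent(local_id, source: dict) -> dict:
    """Request pieces for a parent ("question") document."""
    doc_id = make_doc_id("question", local_id)
    body = dict(source, join_type={"name": "question"})
    return {"id": doc_id, "routing": doc_id, "body": body}


def build_child(local_id, parent_local_id, source: dict) -> dict:
    """Request pieces for a child ("answer") document, routed to its parent's shard."""
    parent_id = make_doc_id("question", parent_local_id)
    body = dict(source, join_type={"name": "answer", "parent": parent_id})
    return {"id": make_doc_id("answer", local_id), "routing": parent_id, "body": body}


if __name__ == "__main__":
    parent = build_parent(1, {"title": "..", "content": ".."})
    child = build_child(1, 1, {"content": ".."})
    print(parent["id"], child["id"])  # question-1 answer-1
```

The child's "parent" value and routing both use the namespaced parent ID, since the join field refers to the parent's actual _id. Separately, indexing with op_type=create makes Elasticsearch reject a write whose _id already exists instead of silently overwriting it, which at least turns an accidental collision into an error.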