Include_in_root deprecated

panda2004 · July 1, 2017, 10:14am

Hello,

I understand that include_in_root / include_in_parent is now deprecated in favor of copy_to.
How can I use copy_to for this purpose?

This is my guess:

PUT orders
{
	"mappings" : {
		"Order" : {
			"properties" : {
				"BuyersNames" : {
					"type" : "keyword"
				},
				"Buyers" : {
					"type" : "nested",
					"properties" : {
						"Name" : {
							"type" : "keyword",
							"copy_to" : "BuyersNames"
						}
					}
				}
			}
		}
	}
}

Would this make something like the following possible?

Request: GET orders/Order/1

Response :

{
	"BuyersNames" : ["John", "George"],
	"Buyers" : [{
			"Name" : "John"
		}, {
			"Name" : "George"
		}
	]
}

By the way, does this really help search performance? That is, is a term query (BuyersNames includes John) much faster than a nested query (Buyers.Name = John)?

jpountz · July 3, 2017, 1:29pm

It is not deprecated yet; we are only discussing it for now.

Conceptually, yes, your document would be indexed as if it had "BuyersNames" : ["John", "George"]. However, note that the _source will not reflect this.
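To make the difference between what is indexed and what is stored in _source concrete, here is a toy Python model of the behavior described above. It is only a sketch: `indexed_view` is a made-up helper for illustration, not an Elasticsearch API, and it models just the single copy_to from Buyers.Name to BuyersNames.

```python
import copy


def indexed_view(source):
    """Toy model of copy_to: return the field values that would be searchable.

    Hypothetical helper for illustration only; it is not an Elasticsearch API.
    """
    view = copy.deepcopy(source)  # never mutate the stored _source
    # Conceptually, each Buyers.Name value is also indexed under BuyersNames.
    view["BuyersNames"] = [buyer["Name"] for buyer in source.get("Buyers", [])]
    return view


source = {"Buyers": [{"Name": "John"}, {"Name": "George"}]}
view = indexed_view(source)

print(view["BuyersNames"])      # values a term query on BuyersNames could match
print("BuyersNames" in source)  # whether the stored _source gained the field
```

In this model the copied values exist only in the searchable view, which is why a GET on the document would not show BuyersNames unless the client sent it explicitly.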

By the way, does this really help search performance? That is, is a term query (BuyersNames includes John) much faster than a nested query (Buyers.Name = John)?

Yes, this will be much faster. I can't quantify it, but the difference will probably be noticeable.
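For reference, the two queries being compared might look roughly like this, written as Python dicts that could be sent as the body of a search request on the orders index. This is a sketch assuming the mapping from the first post; the variable names `term_query` and `nested_query` are illustrative only.

```python
import json

# Term query against the flat, copied field on the root document.
term_query = {"query": {"term": {"BuyersNames": "John"}}}

# Nested query against the nested Buyers objects; "path" names the nested field.
nested_query = {
    "query": {
        "nested": {
            "path": "Buyers",
            "query": {"term": {"Buyers.Name": "John"}},
        }
    }
}

print(json.dumps(term_query, indent=2))
print(json.dumps(nested_query, indent=2))
```

The term query only touches the root document, while the nested query has to match the hidden nested documents and join them back to their parent.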

panda2004 · July 4, 2017, 7:41am

I tried this mapping today, and it worked just as expected.
Thank you for the remark about the _source; it was very confusing at first!

One last question: how smart is this copying?

1. If I have 2 buyers with the name "John", will the BuyersNames array include "John" twice? I believe it will, and that's fine. This way I can count the number of buyers by the length of BuyersNames instead of traversing the nested documents.
2. If a buyer is deleted, will his/her name be removed from BuyersNames? I believe it won't, and that's fine. After all, it is called "copy", not "binding", and besides, it's not common to delete or update a nested document.
3. Does include_in_* behave the same way with respect to questions 1 and 2?

jpountz · July 4, 2017, 8:06am

If I have 2 buyers with the name "John", will the BuyersNames array include "John" twice? I believe it will, and that's fine. This way I can count the number of buyers by the length of BuyersNames instead of traversing the nested documents.

Actually, keyword fields are aware of duplicates for scoring, but not for aggregations. So if you count the number of buyers with an aggregation, all the Johns in a document will count as 1.

If a buyer is deleted, will his/her name be removed from BuyersNames? I believe it won't, and that's fine. After all, it is called "copy", not "binding", and besides, it's not common to delete or update a nested document.

It will be removed. Elasticsearch does not perform in-place updates. If you update a document, we actually compute the new updated document and reindex it entirely.

Does include_in_* behave the same way with respect to questions 1 and 2?

Yes.

panda2004 · July 4, 2017, 8:21am

Thanks for the quick reply!

Actually, keyword fields are aware of duplicates for scoring, but not for aggregations. So if you count the number of buyers with an aggregation, all the Johns in a document will count as 1.

Didn't know that, it's very interesting!
Let's say I've got 2 documents, each with 3 Johns. When a terms aggregation is run over all the documents that contain "john", will the result be "john": 6 or "john": 2?

jpountz · July 4, 2017, 9:19am

john:2
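The answer can be reproduced with a small Python model of the two ways of counting. This is only a simulation of the counting semantics on plain lists, not Elasticsearch code; the documents and the names `doc_count` and `value_count` are invented for the example.

```python
from collections import Counter

# Two documents, each holding the value "john" three times.
docs = [
    {"BuyersNames": ["john", "john", "john"]},
    {"BuyersNames": ["john", "john", "john"]},
]

# Document counting: each document contributes at most once per distinct term.
doc_count = Counter()
for doc in docs:
    doc_count.update(set(doc["BuyersNames"]))

# Raw value counting, shown only for contrast.
value_count = Counter()
for doc in docs:
    value_count.update(doc["BuyersNames"])

print(doc_count["john"], value_count["john"])
```

Counting documents gives 2 for "john", while counting individual values gives 6; the terms aggregation in the question behaves like the former.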

panda2004 · July 15, 2017, 7:48pm

Hey again @jpountz. Sorry for bumping the thread again, but I want to ask about your last insight concerning "exactly once" counting for the Keyword data type. Since the Array and Object data types behave like their inner data types (Keyword in my use case), they can't help here. Does that mean that only a Nested data type containing a Keyword field will do the job if I need to count duplicate values?

By the way, I think this should really be documented under the Keyword data type.

jpountz · July 17, 2017, 6:57am

I think you are right that only nested could do the job in that case. For the record, this is not specific to keywords, but to aggregations: aggregations count documents, not values. So if a given document has the same value twice, it still only counts as 1, since there is only 1 document. I think this is expected, since the field in the response is called doc_count?
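A request body along these lines is one way to count duplicate names through the nested field, again written as a Python dict. It is a sketch assuming the mapping from the first post; the aggregation names "buyers" and "names" are arbitrary labels chosen for this example.

```python
import json

agg_body = {
    "size": 0,  # skip the hits and return only aggregation results
    "aggs": {
        "buyers": {
            # Step into the nested Buyers objects, each of which is its own
            # hidden document.
            "nested": {"path": "Buyers"},
            "aggs": {
                # Bucket the nested documents by buyer name.
                "names": {"terms": {"field": "Buyers.Name"}}
            },
        }
    },
}

print(json.dumps(agg_body, indent=2))
```

Because each nested object is indexed as a separate document, the doc_count of each bucket inside the nested aggregation counts buyers rather than orders, so two Johns in one order contribute 2.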

panda2004 · July 17, 2017, 10:25pm

Actually, what you said totally makes sense. I hadn't noticed until now that some aggregations return "doc_count" while others return "value". Thanks again!

system · August 14, 2017, 10:25pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
