NestedType - DISTINCT COUNT - Flattened Data - Performance

Karthik_Ramachandran · March 3, 2016, 8:15pm

We have a mapping that already contains many nested types. I considered removing the nested types by flattening, i.e. indexing one document per Actual*Nested combination, but that multiplies the data volume enormously. Moreover, to get a correct aggregation metric on search, we would then need a distinct count on a specific attribute key. I have concerns about this approach, and it would be helpful if someone could clarify.

Case below:

{
myJSONKey:
Attr1:
Attr2: <>
....
,NestedType1 :{
[Array of objects having Attr1, Attr2...]
}
,NestedType2 :{
[Array of objects having Attr1, Attr2...]
}
}

Convert to:
{
myJSONKey:
Attr1:
Attr2: <>
NestedType1.Attr1: NOTE: This is not an array of Attr1 values. Each value will have its own index explicitly specified
NestedType1.Attr2:
NestedType2.Attr1:
NestedType2.Attr2:
}
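To make the explosion concrete, here is a minimal Python sketch of the flattening. It assumes that "Actual*Nested" means one flat row per combination of NestedType1 and NestedType2 entries (a cross product); the `flatten` helper and the sample values are hypothetical and only illustrate the shape of the data.

```python
from itertools import product

NESTED_KEYS = ("NestedType1", "NestedType2")

def flatten(doc):
    """Return one flat row per (NestedType1 entry, NestedType2 entry) pair."""
    # Top-level attributes are copied into every flat row.
    parent = {k: v for k, v in doc.items() if k not in NESTED_KEYS}
    # Assumption: an empty or missing nested array counts as one empty entry,
    # so the parent document is not lost during flattening.
    n1_items = doc.get("NestedType1") or [{}]
    n2_items = doc.get("NestedType2") or [{}]
    rows = []
    for n1, n2 in product(n1_items, n2_items):
        row = dict(parent)
        row.update({f"NestedType1.{k}": v for k, v in n1.items()})
        row.update({f"NestedType2.{k}": v for k, v in n2.items()})
        rows.append(row)
    return rows

# Hypothetical sample document with 3 and 2 nested entries.
doc = {
    "myJSONKey": "doc-1",
    "Attr1": "a",
    "Attr2": "some text",
    "NestedType1": [
        {"Attr1": 1, "Attr2": "x"},
        {"Attr1": 2, "Attr2": "y"},
        {"Attr1": 3, "Attr2": "z"},
    ],
    "NestedType2": [
        {"Attr1": 10, "Attr2": "p"},
        {"Attr1": 20, "Attr2": "q"},
    ],
}

rows = flatten(doc)
print(len(rows))  # 3 * 2 = 6 flat rows for a single source document
```

Every flat row repeats myJSONKey, which is why a plain document count over the flattened index would overcount the original documents.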

I could also move Attr2 to a separate index for full-text search and thus reduce the index size. But if I have to combine a query and a filter, that would be an issue. All my use cases are query + filter, plus aggregations over a year, etc.
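As a rough illustration of the query + filter + yearly aggregation I have in mind against the flattened index, the sketch below builds a search request body as a Python dict. The date field name `eventDate` and the `build_search_body` helper are hypothetical. The `cardinality` aggregation is the usual way to get a distinct count, but it is approximate, so it may not give an exact document count at this volume. Recent Elasticsearch versions use `calendar_interval` in `date_histogram`; older versions use `interval` instead.

```python
import json

def build_search_body(text, nested_attr1_value, date_field="eventDate"):
    """Build a request body: full-text query + filter + distinct count per year."""
    return {
        "size": 0,  # only aggregations are needed, not the hits themselves
        "query": {
            "bool": {
                # Full-text part on Attr2.
                "must": [{"match": {"Attr2": text}}],
                # Exact-match filter on a flattened nested attribute.
                "filter": [{"term": {"NestedType1.Attr1": nested_attr1_value}}],
            }
        },
        "aggs": {
            "per_year": {
                # date_field is an assumption; the mapping above shows no date field.
                "date_histogram": {"field": date_field, "calendar_interval": "year"},
                "aggs": {
                    # Approximate distinct count of the original documents.
                    "distinct_docs": {"cardinality": {"field": "myJSONKey"}}
                },
            }
        },
    }

body = build_search_body("some text", "x")
print(json.dumps(body, indent=2))
```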

My anticipated volume is 7,300 million entries per year with nested types. If I flatten the data, it may explode to 7,300 million * 15 entries, and I will end up doing a DISTINCT count to get an accurate document count.
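The arithmetic behind that estimate, plus a toy illustration of why a plain count over flattened rows is wrong, is sketched below. The factor of 15 is my own estimate, and the sample keys are made up.

```python
# Volume estimate from the figures above.
entries_per_year = 7_300_000_000   # 7,300 million nested-type documents
explosion_factor = 15              # estimated flat rows per original document
flattened_per_year = entries_per_year * explosion_factor
print(flattened_per_year)          # 109500000000, i.e. 109.5 billion rows

# Toy flattened rows: each original document appears several times.
flat_keys = ["doc-1", "doc-1", "doc-1", "doc-2", "doc-2", "doc-3"]

row_count = len(flat_keys)            # what a plain doc count would report
distinct_docs = len(set(flat_keys))   # what a DISTINCT count on the key reports
print(row_count, distinct_docs)       # 6 3
```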

Any thoughts on this would be helpful.
