NestedType - DISTINCT COUNT - Flattened Data - Performance


(Karthik Ramachandran) #1

We have a mapping in which many nested types are already present. I thought of removing the nested types by flattening, i.e. producing one entry per Actual * Nested combination. But this explodes the data volume. Moreover, to get a correct aggregation metric on search, we would need a distinct count on a specific attribute key. I have concerns about this approach, and it would be helpful if someone could clarify.

Case below:

{
myJSONKey:
Attr1:
Attr2: <>
....
,NestedType1 :{
[Array of objects having Attr1, Attr2...]
}
,NestedType2 :{
[Array of objects having Attr1, Attr2...]
}
}

Convert to:
{
myJSONKey:
Attr1:
Attr2: <>
NestedType1.Attr1: NOTE: This is not an array of Attr1 values. Each value will have its own index explicitly specified
NestedType1.Attr2:
NestedType2.Attr1:
NestedType2.Attr2:
}
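
A minimal sketch of the flattening described above, to make the explosion concrete. It assumes that flattening means one flat entry per combination of nested objects (a cross product across NestedType1 and NestedType2); the post does not state this precisely, so treat it as one possible reading. The function name flatten and the sample values are hypothetical.

```python
from itertools import product


def flatten(doc, nested_keys):
    """Yield one flat row per combination of nested objects (assumed cross product)."""
    base = {k: v for k, v in doc.items() if k not in nested_keys}
    # An empty or missing nested array contributes one empty object,
    # so the parent document still produces at least one row.
    groups = [doc.get(key) or [{}] for key in nested_keys]
    for combo in product(*groups):
        row = dict(base)
        for key, obj in zip(nested_keys, combo):
            for attr, value in obj.items():
                row[f"{key}.{attr}"] = value
        yield row


# Hypothetical sample data: two parent documents.
docs = [
    {
        "myJSONKey": "k1",
        "Attr1": "a",
        "NestedType1": [{"Attr1": 1}, {"Attr1": 2}],
        "NestedType2": [{"Attr1": "x"}, {"Attr1": "y"}, {"Attr1": "z"}],
    },
    {
        "myJSONKey": "k2",
        "Attr1": "b",
        "NestedType1": [{"Attr1": 3}],
        "NestedType2": [],
    },
]

rows = [row for doc in docs for row in flatten(doc, ["NestedType1", "NestedType2"])]
row_count = len(rows)                                     # 2*3 + 1*1 flat rows
distinct_docs = len({row["myJSONKey"] for row in rows})   # back to the parent count
print(row_count, distinct_docs)
```

The flat row count no longer equals the number of original documents, which is why any document count over flattened data has to be a distinct count on myJSONKey.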

I could also move Attr2 to a separate index for full-text search and thus reduce the index size. But if I have to combine a query and a filter, that would be an issue. All my use cases are query + filter, plus aggregations over a period such as a year.
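
For illustration, here is a rough sketch of the kind of single request this use case implies, assuming an Elasticsearch-style query DSL and that everything lives in one index. The field name "timestamp" and the helper build_search_body are hypothetical; Attr2 and myJSONKey are taken from the case above. Note that the cardinality aggregation in Elasticsearch is approximate, so it may not give an exact distinct count at this volume.

```python
import json


def build_search_body(text, start, end):
    """Build a request body combining full-text query, filter and aggregations."""
    return {
        "size": 0,  # only aggregations are needed, no hits
        "query": {
            "bool": {
                # Full-text part (scored).
                "must": [{"match": {"Attr2": text}}],
                # Filter part (not scored); "timestamp" is an assumed date field.
                "filter": [{"range": {"timestamp": {"gte": start, "lt": end}}}],
            }
        },
        "aggs": {
            "per_month": {
                # calendar_interval is the 7.x+ parameter name; older versions use "interval".
                "date_histogram": {"field": "timestamp", "calendar_interval": "month"},
                "aggs": {
                    # Distinct count of original documents within each bucket.
                    "distinct_docs": {"cardinality": {"field": "myJSONKey"}}
                },
            }
        },
    }


body = build_search_body("some text", "2016-01-01", "2017-01-01")
print(json.dumps(body, indent=2))
```

If Attr2 were moved to a separate index, the match clause and the filter clause could no longer be evaluated together in one request like this.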

My anticipated volume is 7300 million entries per year with nested types. If I flatten, it may explode to 7300 million * 15, and I will end up doing a DISTINCT count to get an accurate document count.
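
The arithmetic behind these numbers, as a quick sketch. It assumes the factor of 15 applies uniformly to every entry and that the load is spread evenly over a 365-day year; both are simplifications.

```python
ENTRIES_PER_YEAR = 7_300_000_000   # 7300 million entries with nested types
EXPLOSION_FACTOR = 15              # assumed average flattened rows per entry
DAYS_PER_YEAR = 365

flattened_per_year = ENTRIES_PER_YEAR * EXPLOSION_FACTOR
nested_per_day = ENTRIES_PER_YEAR // DAYS_PER_YEAR
flattened_per_day = flattened_per_year // DAYS_PER_YEAR

print(f"flattened per year: {flattened_per_year:,}")
print(f"nested per day:     {nested_per_day:,}")
print(f"flattened per day:  {flattened_per_day:,}")
```

That is roughly 109.5 billion flattened entries per year, or about 300 million per day instead of 20 million.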

Any thoughts on this would be helpful.

