All,
I need to run deep aggregations, i.e. more than 6 nested levels of terms aggregations on non_analyzed fields, combined with a search on analyzed content. I am asking the questions below because the usual recommendation is to avoid deep aggregations. I saw a comment that an issue around deep aggregations was fixed in 1.2. We are using a 2.x version now.
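To make the shape of such a request concrete, here is a minimal sketch in Python of a search body that combines a match query on an analyzed field with six levels of terms aggregations. The field names (body, f1 to f6) and the helper build_deep_terms are hypothetical and only for illustration; the query DSL keys (query, match, aggs, terms, field, size) are the standard ones.

```python
import json


def build_deep_terms(fields, size=10):
    """Nest one terms aggregation per field, with the first field outermost."""
    aggs = {}
    # Build from the innermost level outwards.
    for field in reversed(fields):
        level = {"terms": {"field": field, "size": size}}
        if aggs:
            level["aggs"] = aggs
        aggs = {"by_" + field: level}
    return aggs


# Hypothetical non_analyzed field names, for illustration only.
fields = ["f1", "f2", "f3", "f4", "f5", "f6"]

body = {
    "size": 0,  # return aggregations only, no hits
    "query": {"match": {"body": "some search text"}},  # hypothetical analyzed field
    "aggs": build_deep_terms(fields),
}

print(json.dumps(body, indent=2))
```

The number of buckets can grow multiplicatively with each level (up to size buckets per parent bucket), which is why the per-level size matters as much as the depth.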
Questions
- How deep can I go, i.e. how many levels? I don't think there is a hard limit, but I am not sure!
- What are the best practices for deep aggregations on nested content (type: nested in the mapping)? Below are some we already follow.
--- Aggregate at the parent level first, followed by the nested level
--- Avoid reverse_nested unless it is absolutely needed
--- Avoid nesting when we can aggregate without it, e.g. when only one level of the aggregation is on a nested field and the rest are on parent fields. In this case include_in_parent on the nested mapping lets us skip the nested aggregation at query time, since the relation between attributes within a nested object does not matter
--- Use breadth_first collect mode on high-cardinality fields, with an additional filter
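A rough sketch of the practices listed above, written as Python dicts for a 2.x-style mapping and request. The field names (customer_id, items, items.category) are hypothetical; nested, include_in_parent and collect_mode are existing mapping and aggregation options.

```python
import json

# Hypothetical layout: a high-cardinality parent field "customer_id" and a
# nested object "items" holding a non_analyzed field "category".
mapping = {
    "properties": {
        "customer_id": {"type": "string", "index": "not_analyzed"},
        "items": {
            "type": "nested",
            # Also index items.* as flattened fields on the parent document.
            "include_in_parent": True,
            "properties": {
                "category": {"type": "string", "index": "not_analyzed"},
            },
        },
    }
}

# Parent-level terms first (breadth_first on the high-cardinality field),
# then step into the nested documents.
aggs_with_nested = {
    "by_customer": {
        "terms": {
            "field": "customer_id",
            "size": 20,
            "collect_mode": "breadth_first",
        },
        "aggs": {
            "items": {
                "nested": {"path": "items"},
                "aggs": {
                    "by_category": {"terms": {"field": "items.category"}}
                },
            }
        },
    }
}

# Same breakdown without the nested step, relying on the flattened copy
# that include_in_parent adds to the parent document.
aggs_flat = {
    "by_customer": {
        "terms": {
            "field": "customer_id",
            "size": 20,
            "collect_mode": "breadth_first",
        },
        "aggs": {"by_category": {"terms": {"field": "items.category"}}},
    }
}

print(json.dumps({"size": 0, "aggs": aggs_flat}, indent=2))
```

Note that the two variants do not count the same thing: the flat one counts parent documents per category, while the nested one counts nested documents, so the flat form only fits when that difference is acceptable.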
Data size: 20 TB (representing 1 year), growing on a rolling 1-year window
Nodes: 30 data nodes and 4 client nodes
Some observations
- High JVM and CPU usage during aggregations.
We need this kind of usage because pivot tables over search + aggregations are a major request from the business.
We are also looking at alternatives such as Spark to see whether they can handle the aggregations. 90% of aggregation queries hit all nodes (for a replica set) because the query spans the whole year.
Thanks for your inputs