Deep aggregations

All,

I need to run aggregations at a deep level, i.e. nesting more than 6 terms aggregations (on non_analyzed fields) combined with some search on analyzed content. I am asking the questions below because the usual recommendation is to avoid deep aggregations. I saw a comment that an issue around deep aggregations was fixed in 1.2. We are using a 2.x version now.
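
To make the shape of the request concrete, here is a minimal sketch of the kind of search body in question: a match query on an analyzed field plus six terms aggregations nested inside one another. All field names and the search text are hypothetical placeholders, not our real mapping, and the helper `nested_terms` is only for illustration.

```python
import json


def nested_terms(fields, size=10):
    """Build terms aggregations nested inside one another, outermost field first."""
    agg = None
    # Walk from the innermost field outwards, wrapping what we have so far.
    for field in reversed(fields):
        level = {"terms": {"field": field, "size": size}}
        if agg is not None:
            level["aggs"] = agg
        agg = {"by_" + field: level}
    return agg


# Hypothetical non_analyzed fields, one per pivot level.
fields = ["region", "country", "category", "brand", "channel", "status"]

body = {
    "size": 0,  # only aggregation results are needed, no hits
    # Hypothetical analyzed field and search text.
    "query": {"match": {"description": "wireless headphones"}},
    "aggs": nested_terms(fields),
}

print(json.dumps(body, indent=2))
```

The resulting dict is what would be sent as the search request body; each extra level multiplies the number of buckets that can be created, which is where the concern about depth comes from.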

Queries

  1. How deep can I go, i.e. how many levels? I don't think there is a restriction, but I am not sure.
  2. What are the best practices for deep aggregations on nested content (type: nested in the mapping)? Below are some we already follow.
    --- Aggregate at the parent level first, followed by the nested level
    --- Avoid reverse_nested unless it is absolutely needed
    --- Avoid nesting when we can aggregate without it, e.g. when only one nested field is aggregated and the rest are parent fields. In that case include_in_parent on the nested mapping lets us skip the nested aggregation at query time, because the relationship between attributes within a nested object does not need to be considered
    --- Use breadth_first collection on high-cardinality fields, with an additional filter
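
As an illustration of the practices listed above, here is a minimal sketch of one request body that applies them together: a filter to narrow the document set, a parent-level terms aggregation with `collect_mode` set to `breadth_first`, and only then a nested aggregation. The field names, the nested path `line_items` and the 30-day range are hypothetical placeholders.

```python
import json

body = {
    "size": 0,  # only aggregation results are needed, no hits
    # Hypothetical analyzed field and search text.
    "query": {"match": {"description": "wireless headphones"}},
    "aggs": {
        # Additional filter so fewer documents reach the expensive levels.
        "recent_only": {
            "filter": {"range": {"order_date": {"gte": "now-30d"}}},
            "aggs": {
                # Parent-level, high-cardinality field: collect breadth first
                # so sub-aggregations run only for the surviving top buckets.
                "by_customer": {
                    "terms": {
                        "field": "customer_id",
                        "size": 20,
                        "collect_mode": "breadth_first",
                    },
                    "aggs": {
                        # Step into the nested documents only after the
                        # parent-level bucketing is done.
                        "items": {
                            "nested": {"path": "line_items"},
                            "aggs": {
                                "by_sku": {
                                    "terms": {"field": "line_items.sku", "size": 5}
                                }
                            },
                        }
                    },
                }
            },
        }
    },
}

print(json.dumps(body, indent=2))
```

No reverse_nested step appears here because nothing below the nested level needs to refer back to parent fields.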

Data size: 20 TB (representing 1 year), growing on a rolling 1-year window
Nodes: 30 data nodes and 4 client nodes

Some observations

  • High JVM and CPU usage during aggregations.

We need this kind of usage because a pivot table over search + aggregations is a major request from the business.

We are also looking at alternatives such as Spark to see whether they can handle these aggregations. 90% of aggregation queries hit all nodes (for a replica set) because the query spans the whole year.

Thanks for any input.