Traditional Group By With Aggregations & Pagination

Hi all,

We are looking to rewrite our current Elasticsearch implementation, as it's not fit for purpose.

We index each hotel as a single document, with all of the packages available for that hotel stored inside it as a nested collection. Each package has categories, prices and price dates nested inside it. When we aggregate on a package field such as category, the counts reflect the number of nested child documents (because they sit under a hotel) rather than counting each parent hotel document once.

For example:

Hotel "test" has 8 nested packages with the following categories:

Beach, Forest, Beach, Beach, Forest, Resort, Resort, Beach

But if we aggregate on this field we get:

Beach = 4
Forest = 2
Resort = 2

This is obviously correct at the nested-document level, but I want the aggregation to reflect the root document, so instead of the above our aggregation counts would be:

Beach = 1
Forest = 1
Resort = 1

because each category is counted only once per hotel.
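To make the two kinds of count concrete, here is a minimal plain-C# sketch (no Elasticsearch client involved) that reproduces the numbers above. The `Package` and `Hotel` records are hypothetical stand-ins for the real mapping. On the Elasticsearch side, the per-hotel count is what a `reverse_nested` sub-aggregation reports: placed under the `terms` buckets of a `nested` aggregation, its `doc_count` is the number of root documents in each bucket rather than the number of nested documents.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the indexed hotel and its nested packages.
public record Package(string Category, string Location);
public record Hotel(string Name, List<Package> Packages);

public static class FacetCounts
{
    // Mirrors nested -> terms: every package is counted, so one hotel
    // can add several to the same bucket.
    public static Dictionary<string, int> PerPackage(IEnumerable<Hotel> hotels) =>
        hotels.SelectMany(h => h.Packages)
              .GroupBy(p => p.Category)
              .ToDictionary(g => g.Key, g => g.Count());

    // Mirrors nested -> terms -> reverse_nested: each hotel adds at most 1
    // to each category bucket, however many packages it has in that category.
    public static Dictionary<string, int> PerHotel(IEnumerable<Hotel> hotels) =>
        hotels.SelectMany(h => h.Packages.Select(p => p.Category).Distinct())
              .GroupBy(category => category)
              .ToDictionary(g => g.Key, g => g.Count());
}
```

For hotel "test" above, `PerPackage` yields Beach = 4, Forest = 2, Resort = 2, while `PerHotel` yields 1 for each category.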

I know the returned document will still contain the full child collection, but we can filter that down to match our facets in .NET after the fact. Our main concern is getting the counts right: we could be returning 200 hotels to a user while the aggregation counts run into the thousands, which doesn't make sense.
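Trimming the returned package collection on the .NET side can be a simple LINQ filter. A minimal sketch with hypothetical `Package` and `Hotel` records, assuming that an empty selection for a facet means "no filter on that facet". Elasticsearch can also do this server-side: `inner_hits` on a `nested` query returns only the nested documents that matched, which may be worth comparing against client-side filtering.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the indexed hotel and its nested packages.
public record Package(string Category, string Location);
public record Hotel(string Name, List<Package> Packages);

public static class PackageFilter
{
    // Returns a copy of the hotel whose packages are limited to those matching
    // every selected facet; an empty selection leaves that facet unfiltered.
    public static Hotel KeepMatchingPackages(
        Hotel hotel,
        IReadOnlyCollection<string> categories,
        IReadOnlyCollection<string> locations)
    {
        var matching = hotel.Packages
            .Where(p => categories.Count == 0 || categories.Contains(p.Category))
            .Where(p => locations.Count == 0 || locations.Contains(p.Location))
            .ToList();

        // `with` copies the record, so the original hotel is left untouched.
        return hotel with { Packages = matching };
    }
}
```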

I've only used one facet above as an example, but there will be several. If a search specifies, say, a category and a location, each facet should count each parent (wrapper) document only once, so that the counts are based on root documents rather than on the number of children in the nested collection.
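Counting root documents extends to several facets. A minimal sketch, again with hypothetical types: each facet is computed only over packages that pass the other selected filters, and each hotel is counted at most once per facet value. In query terms this corresponds roughly to `nested` → `filter` → `terms` → `reverse_nested` for each facet; treat that mapping as an assumption to verify against your own data. Here the filter and the facet value must hold on the same package, which is also how a filter inside a `nested` aggregation behaves.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the indexed hotel and its nested packages.
public record Package(string Category, string Location);
public record Hotel(string Name, List<Package> Packages);

public static class RootFacets
{
    // For one facet, counts how many hotels have at least one package that
    // passes `packageFilter` and carries each facet value.
    public static Dictionary<string, int> Count(
        IEnumerable<Hotel> hotels,
        Func<Package, bool> packageFilter,
        Func<Package, string> facet) =>
        hotels.SelectMany(h => h.Packages
                               .Where(packageFilter)
                               .Select(facet)
                               .Distinct())
              .GroupBy(value => value)
              .ToDictionary(g => g.Key, g => g.Count());
}
```

For example, the category facet while "Spain" is the selected location would be `RootFacets.Count(hotels, p => p.Location == "Spain", p => p.Category)`, and the location facet with no other filter would be `RootFacets.Count(hotels, _ => true, p => p.Location)`.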

We did look at flattening the collections, so that instead of one document per hotel we would have eight: one for each hotel and package combination. However, in our UI this would result in searches returning multiple copies of the same hotel, when ideally we want to display one hotel record on the search page with all of the packages that match the search grouped under it.
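If the flattened model is revisited, the "one hotel per result" and "count hotels, not packages" requirements can be expressed over package-level documents. A minimal sketch with a hypothetical `PackageDoc` record: results are grouped back into hotels for display, and facets count distinct hotel IDs. In Elasticsearch, field collapsing (`collapse` on a keyword hotel ID field, optionally with `inner_hits` to fetch the grouped packages) and a `cardinality` aggregation on that field are the closest built-in counterparts. Collapsing affects only the returned hits, not aggregations, which is why the distinct count is still needed, and `cardinality` is approximate for high-cardinality fields. Grouping client-side, as below, only sees the hits actually fetched, so on its own it does not give stable hotel-level pagination.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical flattened document: one per hotel/package combination.
public record PackageDoc(string HotelId, string HotelName, string Category);

public static class FlattenedResults
{
    // Groups matching package documents back into one entry per hotel,
    // keeping hotels in the order they first appear in the hits.
    public static List<(string HotelId, List<PackageDoc> Packages)> GroupByHotel(
        IEnumerable<PackageDoc> hits) =>
        hits.GroupBy(d => d.HotelId)
            .Select(g => (HotelId: g.Key, Packages: g.ToList()))
            .ToList();

    // Facet counts over flattened documents: distinct hotels per category,
    // so several packages from one hotel still count once.
    public static Dictionary<string, int> DistinctHotelsPerCategory(
        IEnumerable<PackageDoc> hits) =>
        hits.GroupBy(d => d.Category)
            .ToDictionary(g => g.Key, g => g.Select(d => d.HotelId).Distinct().Count());
}
```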

Does anyone have tips on how to achieve this? As I understand it, what SQL calls GROUP BY is done with aggregations in Elasticsearch, but we are already using aggregations for facet/filter counts, and I don't see how that would work if we also used an aggregation to group packages by hotel.

Happy to provide more information. I know this is a tricky one, but it seems like a huge limitation of Elasticsearch at the moment.

For example, using the new .NET client I am doing the following:

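```csharp
using System.Collections.Generic;
using Elastic.Clients.Elasticsearch;
using Elastic.Clients.Elasticsearch.Aggregations;

// Terms aggregation on a field of the nested "products" documents.
Dictionary<string, Aggregation> lowerLevelAggregations = new Dictionary<string, Aggregation>();
lowerLevelAggregations.Add("rounds", Aggregation.Terms(new TermsAggregation()
{
    Field = "products.rounds",
    Size = 500
}));

// Nested aggregation: its sub-aggregations run over the nested documents,
// so the "rounds" buckets count packages rather than hotels.
var topLevelAggregation = Aggregation.Nested(new NestedAggregation()
{
    Path = "products"
});

topLevelAggregation.Aggregations = lowerLevelAggregations;

var searchRequest = new SearchRequest("poc-test-venues-index-gbp")
{
    From = 0,
    Size = 10,
    Aggregations = new Dictionary<string, Aggregation>()
    {
        { "top-level", topLevelAggregation }
    }
};
```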

This produces the aggregations based on the nested documents but I want them based on the root document.

ChatGPT isn't very helpful and keeps suggesting NEST-based implementations, and NEST is being retired. After some prompting it came up with the following, but my IDE flags it as invalid code:

```csharp
// Likely reason this does not compile: Aggregations is called here like a fluent
// method, whereas in the snippet above it is assigned as a property
// (topLevelAggregation.Aggregations = ...).
lowerLevelAggregations.Add("rounds", Aggregation.Terms(new TermsAggregation()
    {
        Field = "products.rounds",
        Size = 500
    })
    .Aggregations(new Dictionary<string, Aggregation>
    {
        { 
            "parent_doc_count", Aggregation.ReverseNested(new ReverseNestedAggregation()
                .Aggregations(new Dictionary<string, Aggregation>
                {
                    {
                        "unique_doc_count", Aggregation.Cardinality(new CardinalityAggregation
                        {
                            Field = "parent_doc_id" // Replace this with your unique parent field
                        })
                    }
                }))
        }
    })
);
```

I've tried to consult the elastic documentation for the new client but it's very lacking.

Could you provide a full reproduction script as described in About the Elasticsearch category? It will help us better understand what you are doing. Please try to keep the example as simple as possible.

A full reproduction script is something anyone can copy and paste in Kibana dev console, click on the run button to reproduce your use case. It will help readers to understand, reproduce and if needed fix your problem. It will also most likely help to get a faster answer.

Have a look at the Elastic Stack and Solutions Help · Forums and Slack | Elastic page. It also contains a lot of useful information on how to ask for help.

I'm not using the Kibana dev console; I am using the new .NET client, which is not well documented, so providing Kibana queries would not reflect what I am trying to achieve. I've updated the original post with my .NET implementation.