Get Aggregate results grouped by nested field value(s)

Vignesh_Theyagarajan · June 7, 2022, 10:47am

Sample input json:

Email1 - Paragraph1:

{
    "ParagraphId": "Para1",
    "EmailId": "Email1",
    "Content": "The city is Chennai",
    "Tags": [
        {
            "TagId": "Tag1",
            "Name": "City",
            "Values": [
                {
                    "Value": "Chennai"
                }
            ]
        }
    ]
}

Email1 - Paragraph2:

{
    "ParagraphId": "Para2",
    "EmailId": "Email1",
    "Content": "It has around 10000 people living in the city.",
    "Tags": [
        {
            "TagId": "Tag2",
            "Name": "Population",
            "Values": [
                {
                    "Value": "10000"
                }
            ]
        }
    ]
}

Email2 - Paragraph1:

{
    "ParagraphId": "Para1",
    "EmailId": "Email2",
    "Content": "The city is Bengaluru",
    "Tags": [
        {
            "TagId": "Tag1",
            "Name": "City",
            "Values": [
                {
                    "Value": "Bengaluru"
                }
            ]
        }
    ]
}

Email2 - Paragraph2:

{
    "ParagraphId": "Para2",
    "EmailId": "Email2",
    "Content": "The city’s population is about 5000.",
    "Tags": [
        {
            "TagId": "Tag2",
            "Name": "Population",
            "Values": [
                {
                    "Value": "5000"
                }
            ]
        }
    ]
}

Problem statement:

Email - can have multiple paragraphs

Paragraph - can have multiple Tags

Tag - can have multiple Values
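
To make the hierarchy concrete, here is a minimal sketch of the same structure as Python dataclasses. The class names (`TagValue`, `Tag`, `Paragraph`) and the `from_json` helper are hypothetical names chosen for illustration; only the JSON field names come from the samples above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TagValue:
    value: str  # mirrors "Value" in the sample JSON


@dataclass
class Tag:
    tag_id: str  # "TagId"
    name: str    # "Name", e.g. "City" or "Population"
    values: List[TagValue] = field(default_factory=list)


@dataclass
class Paragraph:
    paragraph_id: str  # "ParagraphId"
    email_id: str      # "EmailId" (an Email is the set of Paragraphs sharing this id)
    content: str       # "Content"
    tags: List[Tag] = field(default_factory=list)


def from_json(doc: dict) -> Paragraph:
    """Build a Paragraph from one document shaped like the samples."""
    return Paragraph(
        paragraph_id=doc["ParagraphId"],
        email_id=doc["EmailId"],
        content=doc["Content"],
        tags=[
            Tag(
                tag_id=t["TagId"],
                name=t["Name"],
                values=[TagValue(v["Value"]) for v in t.get("Values", [])],
            )
            for t in doc.get("Tags", [])
        ],
    )
```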

Query:
Get the total population of all the cities grouped by city name

Current solution:
We’re using aggregations in Elasticsearch (2 queries) to solve this.
First, we identify the Paragraphs in the Emails that have the Tag name “City” linked to them.
In this case, the 1st Paragraph from both Emails is matched, and the query returns the list of EmailIds under each City bucket.
Then, based on the EmailIds retrieved under each bucket, we check for the Paragraphs that have the “Population” Tag linked to them. For those Population paragraphs, the SUM aggregate function is applied.
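
The two steps can be sketched in plain Python over the four sample documents. This is an in-memory illustration of the logic only, not the actual Elasticsearch queries; the function names are hypothetical, and the Population values are assumed to always be integer strings.

```python
from collections import defaultdict


def doc(pid, eid, tag_id, name, value):
    """Compact constructor for a sample paragraph (Content omitted)."""
    return {"ParagraphId": pid, "EmailId": eid,
            "Tags": [{"TagId": tag_id, "Name": name, "Values": [{"Value": value}]}]}


PARAGRAPHS = [
    doc("Para1", "Email1", "Tag1", "City", "Chennai"),
    doc("Para2", "Email1", "Tag2", "Population", "10000"),
    doc("Para1", "Email2", "Tag1", "City", "Bengaluru"),
    doc("Para2", "Email2", "Tag2", "Population", "5000"),
]


def tag_values(paragraph, tag_name):
    """Yield every value of the given tag name on one paragraph."""
    for tag in paragraph.get("Tags", []):
        if tag.get("Name") == tag_name:
            for v in tag.get("Values", []):
                yield v["Value"]


def emails_by_city(paragraphs):
    """Step 1: bucket EmailIds under each City value."""
    buckets = defaultdict(set)
    for p in paragraphs:
        for city in tag_values(p, "City"):
            buckets[city].add(p["EmailId"])
    return buckets


def population_by_city(paragraphs):
    """Step 2: for each City bucket, sum the Population values of its emails."""
    totals = {}
    for city, email_ids in emails_by_city(paragraphs).items():
        # One full scan per bucket: the cost grows with the number of cities.
        totals[city] = sum(
            int(value)
            for p in paragraphs
            if p["EmailId"] in email_ids
            for value in tag_values(p, "Population")
        )
    return totals


print(population_by_city(PARAGRAPHS))  # {'Chennai': 10000, 'Bengaluru': 5000}
```

Note that the second step does a separate pass for every bucket produced by the first step.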

Complexity:
The time taken for the queries here depends on the number of Cities present in the Paragraphs and the number of aggregate functions performed.
Query execution time increases as the bucket size grows.

Questions:

Is there any way we can optimise the querying part?
Is any schema change needed in the way we’re saving the data here?
If there is no way to achieve this in ES, are there any alternative databases?
It is more challenging to construct a query with multiple group-by fields (Country and City tags). Is there an easy way to handle this?
If we return the list of values under each bucket (instead of the SUM aggregation here), ES seems to have a limit of 10000 values per bucket. Is there a solution for this as well?
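
To illustrate what the schema and multiple group-by questions are getting at, here is a rough sketch of one possible direction: pivot all tags of an email into a single flat record, so the group-by keys and the metric sit on the same record and can be grouped in one pass. This is an assumption for discussion, not a confirmed Elasticsearch approach. The function names are hypothetical, and the "India" Country value is made up for the example since the sample data has no Country tag.

```python
from collections import defaultdict


def pivot_by_email(paragraphs):
    """Merge all tags of an email into one record: {EmailId: {tag name: [values]}}."""
    records = defaultdict(lambda: defaultdict(list))
    for p in paragraphs:
        for tag in p.get("Tags", []):
            for v in tag.get("Values", []):
                records[p["EmailId"]][tag["Name"]].append(v["Value"])
    return records


def sum_grouped(records, group_fields, metric_field):
    """Sum metric_field per combination of group_fields in a single pass."""
    totals = defaultdict(int)
    for tags in records.values():
        # Skip emails that lack any of the group-by tags.
        if not all(tags.get(f) for f in group_fields):
            continue
        # Simplification: only the first value of each group-by tag is used.
        key = tuple(tags[f][0] for f in group_fields)
        totals[key] += sum(int(x) for x in tags.get(metric_field, []))
    return dict(totals)


def para(email_id, name, value):
    """Compact sample paragraph carrying one tag with one value."""
    return {"EmailId": email_id, "Tags": [{"Name": name, "Values": [{"Value": value}]}]}


sample = [
    para("Email1", "Country", "India"),  # hypothetical Country tag value
    para("Email1", "City", "Chennai"),
    para("Email1", "Population", "10000"),
]

print(sum_grouped(pivot_by_email(sample), ["Country", "City"], "Population"))
# {('India', 'Chennai'): 10000}
```

Whether such a pivot is practical depends on how often tags change, since every tag update would then need to rewrite the per-email record.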

system · July 5, 2022, 10:47am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
