Group by id base of sum of range of value

I had some data as below:

[
  {
    "PAY_DATE": "2019-10-24",
    "STATE": "Utah",
    "id": "1",
    "SALARY": 6045,
    "UUID": "a879492b-b402-40bd-8f5d-afc34d66d152"
  },
  {
    "PAY_DATE": "2021-01-17",
    "STATE": "Illinois",
    "id": "2",
    "SALARY": 7316,
    "UUID": "76492fec-bbe1-443c-8ab5-6b150c48b83e"
  },
  {
    "PAY_DATE": "2020-08-24",
    "STATE": "Indiana",
    "id": "3",
    "SALARY": 2191,
    "UUID": "70877253-b251-4247-9adb-2af1838388a4"
  },
  {
    "PAY_DATE": "2004-03-07",
    "STATE": "New Mexico",
    "id": "4",
    "SALARY": 6289,
    "UUID": "fefdce4b-2da6-498d-ab11-6366f6742105"
  },
  {
    "PAY_DATE": "2021-01-17",
    "STATE": "New York",
    "id": "2",
    "SALARY": 1736,
    "UUID": "aeb8191a-aee1-43e6-bfa2-5cad060d0cf5"
  }
]

And i want the following result:

For 0-3000
  [
    {
      "id":3
    }
  ]


For 3000-8000
  [
    {
      "id":1
    },
    {
      "id":4
    }
  ]

For 8000+
  [
    {
      "id":2
    }
  ]
// due to addition of (7316 + 1736)

if we use terms aggregation then this is feasible. But for large data set it is taking time.

Please suggest if this is feasible using ES query without terms aggregations.

Thank you inadvance.

Hi,

You can use Range aggregation and top-hits aggregation together.

I have added the provided data to an index, named dummy

GET dummy/_search

Result:

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 5,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "dummy",
        "_id": "UG6704kBSGAOPcx9NeY3",
        "_score": 1,
        "_source": {
          "PAY_DATE": "2019-10-24",
          "STATE": "Utah",
          "id": "1",
          "SALARY": 6045,
          "UUID": "a879492b-b402-40bd-8f5d-afc34d66d152"
        }
      },
      {
        "_index": "dummy",
        "_id": "WG6704kBSGAOPcx9OOaq",
        "_score": 1,
        "_source": {
          "PAY_DATE": "2021-01-17",
          "STATE": "Illinois",
          "id": "2",
          "SALARY": 7316,
          "UUID": "76492fec-bbe1-443c-8ab5-6b150c48b83e"
        }
      },
      {
        "_index": "dummy",
        "_id": "WW6704kBSGAOPcx9P-bN",
        "_score": 1,
        "_source": {
          "PAY_DATE": "2020-08-24",
          "STATE": "Indiana",
          "id": "3",
          "SALARY": 2191,
          "UUID": "70877253-b251-4247-9adb-2af1838388a4"
        }
      },
      {
        "_index": "dummy",
        "_id": "Wm6704kBSGAOPcx9SeaC",
        "_score": 1,
        "_source": {
          "PAY_DATE": "2004-03-07",
          "STATE": "New Mexico",
          "id": "4",
          "SALARY": 6289,
          "UUID": "fefdce4b-2da6-498d-ab11-6366f6742105"
        }
      },
      {
        "_index": "dummy",
        "_id": "W26704kBSGAOPcx9TuYr",
        "_score": 1,
        "_source": {
          "PAY_DATE": "2021-01-17",
          "STATE": "New York",
          "id": "2",
          "SALARY": 1736,
          "UUID": "aeb8191a-aee1-43e6-bfa2-5cad060d0cf5"
        }
      }
    ]
  }
}

Use the following aggregations

GET dummy/_search
{
  "size": 0,
  "aggs": {
    "salary_ranges": {
      "range": {
        "field": "SALARY",
        "ranges": [
          {
            "from": 0,
            "to": 3000
          },
          {
            "from": 3000,
            "to": 8000
          },
          {
            "from": 8000
          }
        ]
      },
      "aggs": {
        "ids": {
          "top_hits": {
            "size": 100,
            "_source": false,
            "fields": ["id"]
          }
        }
      }
    }
  }
}

Result:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 5,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "salary_ranges": {
      "buckets": [
        {
          "key": "0.0-3000.0",
          "from": 0,
          "to": 3000,
          "doc_count": 2,
          "ids": {
            "hits": {
              "total": {
                "value": 2,
                "relation": "eq"
              },
              "max_score": 1,
              "hits": [
                {
                  "_index": "dummy",
                  "_id": "WW6704kBSGAOPcx9P-bN",
                  "_score": 1,
                  "fields": {
                    "id": [
                      "3"
                    ]
                  }
                },
                {
                  "_index": "dummy",
                  "_id": "W26704kBSGAOPcx9TuYr",
                  "_score": 1,
                  "fields": {
                    "id": [
                      "2"
                    ]
                  }
                }
              ]
            }
          }
        },
        {
          "key": "3000.0-8000.0",
          "from": 3000,
          "to": 8000,
          "doc_count": 3,
          "ids": {
            "hits": {
              "total": {
                "value": 3,
                "relation": "eq"
              },
              "max_score": 1,
              "hits": [
                {
                  "_index": "dummy",
                  "_id": "UG6704kBSGAOPcx9NeY3",
                  "_score": 1,
                  "fields": {
                    "id": [
                      "1"
                    ]
                  }
                },
                {
                  "_index": "dummy",
                  "_id": "WG6704kBSGAOPcx9OOaq",
                  "_score": 1,
                  "fields": {
                    "id": [
                      "2"
                    ]
                  }
                },
                {
                  "_index": "dummy",
                  "_id": "Wm6704kBSGAOPcx9SeaC",
                  "_score": 1,
                  "fields": {
                    "id": [
                      "4"
                    ]
                  }
                }
              ]
            }
          }
        },
        {
          "key": "8000.0-*",
          "from": 8000,
          "doc_count": 0,
          "ids": {
            "hits": {
              "total": {
                "value": 0,
                "relation": "eq"
              },
              "max_score": null,
              "hits": []
            }
          }
        }
      ]
    }
  }
}

But I want id=2 to be in 8000+ range not in 3000 - 8000.

In the documents u gave, there are 2 docs with id=2, one with salary=7316 and other with salary=1736. So when grouping based on salary ranges, the first document with id=2&salary=7316 will go within 3000-8000 range and other document will to 0-3000 range, as expected.

Are u using some other logic? Do u want to group docs based on some other field?

Or is it like say for id=2, the sum of both will be salary=7316+1736=9052, which is greater than 8000 and hence should be in 8000+ range?

Yes I want to group docs by id and sum the salary and then splitting into the ranges.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.