Aggregating object keys to achieve a sum

hugomeloo · February 3, 2022, 1:40pm

I am trying to make an aggregation in which I can sum a number from a specific key of an object. As an example:

curl -XPUT "localhost:9200/products/_mapping" -H 'Content-Type: application/json' -d'
{
  "properties": {
    "opinions": {
      "properties": {
        "topic": { "type": "keyword" },
        "count": { "type": "integer" }
      }
    }
  }
}
'
curl -X POST "localhost:9200/products/doc/_bulk" -H 'Content-Type: application/json' -d'
{"index":{"_id":1}}
{"opinions": [{"topic": "room", "count":2},{"topic":"kitchen","count":1}]}
{"index":{"_id":2}}
{"opinions":[{"topic": "room", "count":1},{"topic":"restroom","count":1}]}
'

I wanted to have a count of
room: 3
kitchen: 1
restroom: 1

I am looking for an aggregation in which I can subaggregate summing the count field. I tried the example below, but I am summing the count field without the topic scope:

curl -X POST "localhost:9200/products/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "opinion": {
      "terms": { "field": "opinions.topic" },
      "aggs": {
        "total": {
          "sum": { "field": "opinions.count" }
        }
      }
    }
  }
}'

If it makes things easier, I could also have an object like:

{ "room": 2, "restroom": 1 }

but I have 50 keys like that and I think the mapping is not that flexible.

I am using this object as a way to speedup the process, but I am flexible to change this structure.

linkerc · February 3, 2022, 10:23pm

I think you should reorganize your document structure. It's not intuitive the way you have it.
What you might want is to flatten out the document to be something like this:
{
"topic":"kitchen",
"count":1,
"xyz": "John" // a field with uniq id to tie multiple documents together
}

In your above example first document ("_id":1), you would have 2 documents
{ "topic":"room","count":2, "xyz":"customer1" },
{ "topic":"kitchen","count":1, "xyz":"customer1" }

for document ("_id":2), you would also have 2 documents
{ "topic":"room","count":1, "xyz":"customer2" },
{ "topic":"restroom","count":1, "xyz":"customer2" }

This way, you can eliminate the array list. By searching for the new field "xyz" you get the array list equivalent.

The benefit of flattening your document structure is to make aggregation a lot easier.

hugomeloo · February 3, 2022, 10:53pm

Hi @linkerc, thank your for your answer.

Actually, my document is now something like

{ "id":123, "topics": ["room", "kitchen", "room"], ... }

But I am not interested in the customer info when I am aggregating. I only want to know the count of the keyword repetitions inside the aggregation. So, if I have 1000 documents in one aggregation, I would count the number of times "room" appeared, also "kitchen" and 50 other keys.
I have a possible solution here but it's taking a looong time and I am trying to achieve a faster aggregation.

linkerc · February 3, 2022, 11:27pm

I think your "possible solution" is better. This kind of aggregation should be very fast.
How long is long for you?
Another possible speed up is index sorting. It sorts the documents based on the fields so it could skip files during search/aggregation.

Tomo_M · February 4, 2022, 4:55pm

{
  "properties": {
    "opinions": {
      "properties": {
        "topic": { "type": "keyword" },
        "count": { "type": "integer" }
      }
    }
  }
}

{"opinions": [{"topic": "room", "count":2},{"topic":"kitchen","count":1}]}a

Let me point out one problem of this mapping. You have to use nested fields for "opinions" to keep topic and count linked. Arrays of object is flattened internally.

And also aggregation query should be changed accordingly.

One alternative way is use the first aggregation of the previous topic and use transform to do the aggregation backgroud periodicaly.

hugomeloo · February 8, 2022, 1:10pm

Yep, I understood the solution would be through the nested field, which was not clear in the documentation.. Thanks!

This solution works for me.

curl -XDELETE "localhost:9200/products"
curl -XPUT "localhost:9200/products"
curl -XPUT "localhost:9200/products/_mapping" -H 'Content-Type: application/json' -d'
{
  "properties": {
    "opinions": {
      "type": "nested",
      "properties": {
        "topic": {"type": "keyword"},
        "count": {"type": "long"}
      },
      "include_in_parent": true
    }
  }
}'

curl -X POST "localhost:9200/products/_bulk" -H 'Content-Type: application/json' -d'
{"index":{"_id":1}}
{"opinions":[{"topic": "room", "count": 2}, {"topic": "kitchen", "count": 1}]}
{"index":{"_id":2}}
{"opinions":[{"topic": "room", "count": 1}, {"topic": "restroom", "count": 1}]}
'

sleep 2
curl -X POST "localhost:9200/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "opinions": {
      "nested": {"path": "opinions"},
      "aggs": {
        "per_topic": {
          "terms": {"field": "opinions.topic"},
          "aggs": {
            "counts": {
              "sum": {"field": "opinions.count"}
            }
          }
        }
      }
    }
  }
}
'

system · March 8, 2022, 1:10pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Counting each key of an array inside aggregation Elasticsearch	5	2902	March 3, 2022
Aggregating on an Array of Objects Elasticsearch	3	5871	July 5, 2017
Add values of one key of all objects only where other keys have the required values and group by another value Elasticsearch	1	408	July 29, 2019
Aggregation on fields within nested objects in elasticsearch Elasticsearch	6	6126	July 11, 2018
Aggregating array of objects Elasticsearch	3	2103	January 11, 2019

Aggregating object keys to achieve a sum

Related topics