Aggregating object keys to achieve a sum

I am trying to make an aggregation in which I can sum a number from a specific key of an object. As an example:

curl -XPUT "localhost:9200/products/_mapping" -H 'Content-Type: application/json' -d'
{
  "properties": {
    "opinions": {
      "properties": {
        "topic": { "type": "keyword" },
        "count": { "type": "integer" }
      }
    }
  }
}
'
curl -X POST "localhost:9200/products/doc/_bulk" -H 'Content-Type: application/json' -d'
{"index":{"_id":1}}
{"opinions": [{"topic": "room", "count":2},{"topic":"kitchen","count":1}]}
{"index":{"_id":2}}
{"opinions":[{"topic": "room", "count":1},{"topic":"restroom","count":1}]}
'

I wanted to have a count of
room: 3
kitchen: 1
restroom: 1

I am looking for an aggregation in which I can subaggregate summing the count field. I tried the example below, but I am summing the count field without the topic scope:

curl -X POST "localhost:9200/products/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "opinion": {
      "terms": { "field": "opinions.topic" },
      "aggs": {
        "total": {
          "sum": { "field": "opinions.count" }
        }
      }
    }
  }
}'

If it makes things easier, I could also have an object like:

{ "room": 2, "restroom": 1 }

but I have 50 keys like that and I think the mapping is not that flexible.

I am using this object as a way to speedup the process, but I am flexible to change this structure.

I think you should reorganize your document structure. It's not intuitive the way you have it.
What you might want is to flatten out the document to be something like this:
{
"topic":"kitchen",
"count":1,
"xyz": "John" // a field with uniq id to tie multiple documents together
}

In your above example first document ("_id":1), you would have 2 documents
{ "topic":"room","count":2, "xyz":"customer1" },
{ "topic":"kitchen","count":1, "xyz":"customer1" }

for document ("_id":2), you would also have 2 documents
{ "topic":"room","count":1, "xyz":"customer2" },
{ "topic":"restroom","count":1, "xyz":"customer2" }

This way, you can eliminate the array list. By searching for the new field "xyz" you get the array list equivalent.

The benefit of flattening your document structure is to make aggregation a lot easier.

Hi @linkerc, thank your for your answer.

Actually, my document is now something like

{ "id":123, "topics": ["room", "kitchen", "room"], ... }

But I am not interested in the customer info when I am aggregating. I only want to know the count of the keyword repetitions inside the aggregation. So, if I have 1000 documents in one aggregation, I would count the number of times "room" appeared, also "kitchen" and 50 other keys.
I have a possible solution here but it's taking a looong time and I am trying to achieve a faster aggregation.

I think your "possible solution" is better. This kind of aggregation should be very fast.
How long is long for you?
Another possible speed up is index sorting. It sorts the documents based on the fields so it could skip files during search/aggregation.

{
  "properties": {
    "opinions": {
      "properties": {
        "topic": { "type": "keyword" },
        "count": { "type": "integer" }
      }
    }
  }
}

{"opinions": [{"topic": "room", "count":2},{"topic":"kitchen","count":1}]}a

Let me point out one problem of this mapping. You have to use nested fields for "opinions" to keep topic and count linked. Arrays of object is flattened internally.

And also aggregation query should be changed accordingly.

One alternative way is use the first aggregation of the previous topic and use transform to do the aggregation backgroud periodicaly.

Yep, I understood the solution would be through the nested field, which was not clear in the documentation.. Thanks!

This solution works for me.

curl -XDELETE "localhost:9200/products"
curl -XPUT "localhost:9200/products"
curl -XPUT "localhost:9200/products/_mapping" -H 'Content-Type: application/json' -d'
{
  "properties": {
    "opinions": {
      "type": "nested",
      "properties": {
        "topic": {"type": "keyword"},
        "count": {"type": "long"}
      },
      "include_in_parent": true
    }
  }
}'

curl -X POST "localhost:9200/products/_bulk" -H 'Content-Type: application/json' -d'
{"index":{"_id":1}}
{"opinions":[{"topic": "room", "count": 2}, {"topic": "kitchen", "count": 1}]}
{"index":{"_id":2}}
{"opinions":[{"topic": "room", "count": 1}, {"topic": "restroom", "count": 1}]}
'

sleep 2
curl -X POST "localhost:9200/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "opinions": {
      "nested": {"path": "opinions"},
      "aggs": {
        "per_topic": {
          "terms": {"field": "opinions.topic"},
          "aggs": {
            "counts": {
              "sum": {"field": "opinions.count"}
            }
          }
        }
      }
    }
  }
}
'
1 Like

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.