Aggregate counting/sum query

ninesalt · January 20, 2019, 5:46pm

I'm trying to figure out how to sum up many different counts in an elasticsearch index. A document in the index looks like this:

{
  '_source': {
    'my_field': 'Robert and Alex went with Robert to play in London and then Robert went to London',
    'ner': {
      'persons': {
        'Alex': 1,
        'Robert': 3
      },
      'organizations': {},
      'dates': {},
      'locations': {
        'London': 2
      }
    }
  }
}

How can I sum up all the different words in location, dates and persons in the index? For example if another document had 2 occurrences of Alex, I'd get Alex: 3, Robert: 3, ..

abdon · January 22, 2019, 1:48pm

If you have a limited number of persons/organizations/dates/locations, and you know exactly what the values of those entities are going to be, then you can use a sum aggregation to sum the counts for each value individually:

GET my_index/_search
{
  "size": 0,
  "aggs": {
    "sum_alexes": {
      "sum": {
        "field": "ner.persons.Alex"
      }
    },
    "sum_roberts": {
      "sum": {
        "field": "ner.persons.Robert"
      }
    }
  }
}

It's probably not the case that you know all the values beforehand though? If you're dealing with a potentially large number of different values, and you do not know the values beforehand, then it's going to be a bit more work. I would suggest you restructure your documents using nested types.

For example, just taking persons into account, you could create your index with a mapping like this:

PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "ner": {
          "properties": {
            "persons": {
              "type": "nested",
              "properties": {
                "name": {
                  "type": "keyword"
                },
                "count": {
                  "type": "integer"
                }
              }
            }
          }
        }
      }
    }
  }
}

Now you will need to index your data providing an array of objects that each contain the "name" and "count" fields:

PUT my_index/_doc/1
{
  "my_field": "Robert and Alex went with Robert to play in London and then Robert went to London",
  "ner": {
    "persons": [
      {
        "name": "Alex",
        "count": 1
      },
      {
        "name": "Robert",
        "count": 3
      }
    ]
  }
}

PUT my_index/_doc/2
{
  "my_field": "Alex foo bar Alex",
  "ner": {
    "persons": [
      {
        "name": "Alex",
        "count": 2
      }
    ]
  }
}

You can now use a nested aggregation in combination with a terms aggregation and a sum aggregation to get a list of the for example 10 names with the highest total count:

GET my_index/_search
{
  "size": 0,
  "aggs": {
    "persons": {
      "nested": {
        "path": "ner.persons"
      },
      "aggs": {
        "top_names": {
          "terms": {
            "field": "ner.persons.name",
            "size": 10,
            "order": {
              "total_sum_count": "desc"
            }
          },
          "aggs": {
            "total_sum_count": {
              "sum": {
                "field": "ner.persons.count"
              }
            }
          }
        }
      }
    }
  }
}

ninesalt · January 22, 2019, 3:47pm

This is great. Thank you. However I chose to go with a different approach for this. I noticed that in the Discover tab in Kibana, it shows you the most used words in a field so I changed the structure to be like this:

{
 persons: ['Alex', 'Alex', 'Robert']
}

I realize this is less efficient than you're suggestion but your suggestion is not working too well with Kibana.

abdon · January 22, 2019, 8:47pm

Yeah, you're right - working with nested types is not really supported by Kibana.

I don't know what you're actually doing in Kibana, but there is one thing to be aware of with your approach. Even though you have added the word Alex twice to that persons field, if you run a terms aggregation on that field, the term Alex will only be counted once. The terms aggregation counts documents that contain the term - not individual occurrences of the term.

ninesalt · February 3, 2019, 4:33pm

Sorry if this is a little unrelated to my original question, but I'm trying to figure out how to aggregate documents that follow a structure similar to my last comment. I'm trying to count the individual occurrences of the words (in other words, I'm trying to do what you stated in your last sentence), similar to what Kibana does automatically:

Also, I don't understand why Kibana doesn't show a visualize button for these fields.

system · March 3, 2019, 4:33pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
How to query an elasticsearch aggregation with a term and sum on different nested objects? Elasticsearch	1	787	July 5, 2017
Sum aggregation on "counter like" object field Elasticsearch	4	403	April 27, 2020
Need help with Sum Elasticsearch	4	303	July 6, 2017
Use nested doc_count in terms aggregation Elasticsearch	2	4999	April 24, 2017
Group Values with Aggregation Elasticsearch	8	1247	October 21, 2019

Aggregate counting/sum query

Related Topics