Aggregate counting/sum query

If you have a limited number of persons/organizations/dates/locations, and you know exactly what the values of those entities are going to be, then you can use a sum aggregation to sum the counts for each value individually:

GET my_index/_search
{
  "size": 0,
  "aggs": {
    "sum_alexes": {
      "sum": {
        "field": "ner.persons.Alex"
      }
    },
    "sum_roberts": {
      "sum": {
        "field": "ner.persons.Robert"
      }
    }
  }
}

It's probably not the case that you know all the values beforehand though? If you're dealing with a potentially large number of different values, and you do not know the values beforehand, then it's going to be a bit more work. I would suggest you restructure your documents using nested types.

For example, just taking persons into account, you could create your index with a mapping like this:

PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "ner": {
          "properties": {
            "persons": {
              "type": "nested",
              "properties": {
                "name": {
                  "type": "keyword"
                },
                "count": {
                  "type": "integer"
                }
              }
            }
          }
        }
      }
    }
  }
}

Now you will need to index your data providing an array of objects that each contain the "name" and "count" fields:

PUT my_index/_doc/1
{
  "my_field": "Robert and Alex went with Robert to play in London and then Robert went to London",
  "ner": {
    "persons": [
      {
        "name": "Alex",
        "count": 1
      },
      {
        "name": "Robert",
        "count": 3
      }
    ]
  }
}

PUT my_index/_doc/2
{
  "my_field": "Alex foo bar Alex",
  "ner": {
    "persons": [
      {
        "name": "Alex",
        "count": 2
      }
    ]
  }
}

You can now use a nested aggregation in combination with a terms aggregation and a sum aggregation to get a list of the for example 10 names with the highest total count:

GET my_index/_search
{
  "size": 0,
  "aggs": {
    "persons": {
      "nested": {
        "path": "ner.persons"
      },
      "aggs": {
        "top_names": {
          "terms": {
            "field": "ner.persons.name",
            "size": 10,
            "order": {
              "total_sum_count": "desc"
            }
          },
          "aggs": {
            "total_sum_count": {
              "sum": {
                "field": "ner.persons.count"
              }
            }
          }
        }
      }
    }
  }
}
2 Likes