If you have a limited number of persons/organizations/dates/locations, and you know exactly what the values of those entities are going to be, then you can use a sum
aggregation to sum the counts for each value individually:
GET my_index/_search
{
"size": 0,
"aggs": {
"sum_alexes": {
"sum": {
"field": "ner.persons.Alex"
}
},
"sum_roberts": {
"sum": {
"field": "ner.persons.Robert"
}
}
}
}
It's probably not the case that you know all the values beforehand though? If you're dealing with a potentially large number of different values, and you do not know the values beforehand, then it's going to be a bit more work. I would suggest you restructure your documents using nested types.
For example, just taking person
s into account, you could create your index with a mapping like this:
PUT my_index
{
"mappings": {
"_doc": {
"properties": {
"ner": {
"properties": {
"persons": {
"type": "nested",
"properties": {
"name": {
"type": "keyword"
},
"count": {
"type": "integer"
}
}
}
}
}
}
}
}
}
Now you will need to index your data providing an array of objects that each contain the "name" and "count" fields:
PUT my_index/_doc/1
{
"my_field": "Robert and Alex went with Robert to play in London and then Robert went to London",
"ner": {
"persons": [
{
"name": "Alex",
"count": 1
},
{
"name": "Robert",
"count": 3
}
]
}
}
PUT my_index/_doc/2
{
"my_field": "Alex foo bar Alex",
"ner": {
"persons": [
{
"name": "Alex",
"count": 2
}
]
}
}
You can now use a nested aggregation in combination with a terms aggregation and a sum aggregation to get a list of the for example 10 names with the highest total count:
GET my_index/_search
{
"size": 0,
"aggs": {
"persons": {
"nested": {
"path": "ner.persons"
},
"aggs": {
"top_names": {
"terms": {
"field": "ner.persons.name",
"size": 10,
"order": {
"total_sum_count": "desc"
}
},
"aggs": {
"total_sum_count": {
"sum": {
"field": "ner.persons.count"
}
}
}
}
}
}
}
}