Sum distinct numeric values in object datatype

geppo · December 21, 2016, 6:12pm

Hello,
I'm now indexing some newspaper documents. I have created some object fields in my index that collect all the entities founded in the text field and its frequencies in the same text field, E.G.:

          "people": [
      {
         "count": 2,
         "value": "Ermanno"
      },
      {
         "count": 2,
         "value": "Anna Finocchiaro"
      },
      {
         "count": 2,
         "value": "Roberto Calderoli"
      },
      {
         "count": 2,
         "value": "Silvio Berlusconi"
      },
      {
         "count": 2,
         "value": "Denis Verdini"
      },
      {
         "count": 2,
         "value": "Paolo Romani"
      },
      {
         "count": 2,
         "value": "Juncker"
      },
      {
         "count": 2,
         "value": "Federica Mogherini"
      },
      {
         "count": 4,
         "value": "Angela Merkel"
      },
      {
         "count": 2,
         "value": "Matteo Renzi"
      },
      {
         "count": 2,
         "value": "Junker"
      },
      {
         "count": 2,
         "value": "Beppe Grillo"
      },
      {
         "count": 4,
         "value": "Giancarlo Galan"
      },
      {
         "count": 2,
         "value": "Myrta Merlino"
      },
      {
         "count": 2,
         "value": "Yara Gambirasio"
      },
      {
         "count": 2,
         "value": "Francesco Dettori"
      },
      {
         "count": 2,
         "value": "John Kerry"
      },
      {
         "count": 2,
         "value": "Obama"
      },
      {
         "count": 2,
         "value": "Putin"
      },
      {
         "count": 2,
         "value": "Kuchma"
      },
      {
         "count": 6,
         "value": "Prandelli"
      },
      {
         "count": 2,
         "value": "Cesare"
      },
      {
         "count": 2,
         "value": "Chiellini"
      },
      {
         "count": 2,
         "value": "Pirlo"
      },
      {
         "count": 2,
         "value": "Balotelli"
      }
   ]

the mapping settings are these ones:

          "people":{
              "properties": {
                          "count": {
                                      "type": "integer",
                                      "doc_values": true,
                                      "index": true
        },
                        "value":{  
                                     "type": "text",
                                     "analyzer": "namedentities_analyzer",
                                    "fielddata": true
        }
                }
          }

Now I would like to make a query to retrieve all the people entity found in all the newspapers of the last two months and order them by the sums of their frequencies count. Something like this:

value: Bergoglio, sum_of_count: 256, doc_count:200
value: Berlusconi, sum_of_count: 239, doc_count: 180,
etc....
How i can do that? I have to change my data structure?

xavierfacq · December 21, 2016, 9:58pm

Hi,

In order to use aggregations, I think you'll have to convert your field "value" to not_analyzed ?
After, you should try to make some test queries and play with Aggregations.

bye,
Xavier

geppo · December 22, 2016, 10:41am

HI, I have changed my index mapping according your suggestion, but unfortunately the result doesn't change. I have tried to produce the desired output concatenating two aggregation, the first one a term aggregation on location.value and for the second one I have tried sum aggregation, cardinality, and value_count, but I can't understand the results:

{
"_source": {
"includes": [ "organization.value", "organization.count" ],
"excludes": [ "text","vectterms", "sentence.sentence_text"]
},
"size": 0,
"aggs": {
"group_by_org": {
"terms": {
"field": "organization.value"
} ,
"aggs": {
"count_sum": {
"cardinality": {
"field": "organization.count"
}}}}}}

with cardinality aggregation, I have this strange output, where sometimes the value of the second aggregation is less than the value of doc_count - what is this value?:

"aggregations": {
"group_by_org": {
"doc_count_error_upper_bound": 2,
"sum_other_doc_count": 91,
"buckets": [
{
"key": "Consiglio comunale",
"doc_count": 7,
"count_sum": {
"value": 6
}
},
{
"key": "AA",
"doc_count": 5,
"count_sum": {
"value": 6
}
},
{
"key": "Cassazione",
"doc_count": 5,
"count_sum": {
"value": 6
}
},
{
"key": "Corte dei conti",
"doc_count": 5,
"count_sum": {
"value": 6
}
},
{
"key": "Metroweb",
"doc_count": 5,
"count_sum": {
"value": 6
}
},
{
"key": "Milan",
"doc_count": 5,
"count_sum": {
"value": 6
}
},
{
"key": "Minardi",
"doc_count": 5,
"count_sum": {
"value": 6
}

the other two aggregations tried, sum and value_count, return to me a very high value of this aggregation - In the same way of the previous query, replacing cardinality with sum and value_count, I think this is the number of all the organization entity in all document where appears that entity, but I can't understand well:

output of sum aggregation - I have only 9 documents in my index, and each entity return less than 10 times per document :

"buckets": [
{
"key": "Consiglio comunale",
"doc_count": 7,
"count_sum": {
"value": 332
}
},
{
"key": "AA",
"doc_count": 5,
"count_sum": {
"value": 301
}
},
{
"key": "Cassazione",
"doc_count": 5,
"count_sum": {
"value": 301
}
},
{
"key": "Corte dei conti",
"doc_count": 5,
"count_sum": {
"value": 301

output of value_count:

"buckets": [
{
"key": "Consiglio comunale",
"doc_count": 7,
"count_sum": {
"value": 105
}
},
{
"key": "AA",
"doc_count": 5,
"count_sum": {
"value": 80
}
},
{
"key": "Cassazione",
"doc_count": 5,
"count_sum": {
"value": 80
}
},
{
"key": "Corte dei conti",
"doc_count": 5,
"count_sum": {
"value": 80
}

Can someone help me?

system · January 19, 2017, 10:41am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Terms Aggregations Elasticsearch	11	670	July 6, 2017
Aggregating object keys to achieve a sum Elasticsearch	6	1281	March 8, 2022
Aggregate counting/sum query Elasticsearch	5	3638	March 3, 2019
Merge doc_count aggregation and sum Elasticsearch	1	833	April 6, 2018
Sum aggregation on "counter like" object field Elasticsearch	4	418	April 27, 2020

Sum distinct numeric values in object datatype

Related topics