Sum aggregation on "counter like" object field

fhdz · March 24, 2020, 4:33pm

Hi there,

I find myself with 'counter like' fields, e.g. for counting how often my docs occur in different domains.
(When a doc is used in the context of a given domain, that domain's counter is incremented.)
I have an object field "domains" which looks like this:

...
"domains": {
    "domain A": 17,
    "domain B": 8,
    "domain C": 3,
    ...
}
...

Is there a way to make a "sum" aggregation that would compute the 'overall counter' for each of those keys?

(cc @Hammad_Ali_Khan who seems to have had a similar issue a while back but got no response.)

spinscale · March 24, 2020, 5:00pm

Hey,

a couple of things here. First, this is not the best data modelling practice, as you would run into a mapping explosion (you probably want to check the docs for this) when you have a lot of different domains. You may want to have a document per domain instead, with a field like count in all those documents, and then use a filter to sum things up.
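
A minimal sketch of that document-per-domain layout, written as plain Python dicts so it runs without a cluster. The field names doc_id, domain and count are hypothetical, and the term filter assumes domain is mapped as a keyword field:

```python
def to_counter_docs(doc_id, domains):
    """Split one {"domain name": count} object into one small doc per domain."""
    return [
        {"doc_id": doc_id, "domain": name, "count": count}
        for name, count in domains.items()
    ]


def domain_total_query(domain):
    """Search body: filter on one domain, then sum its count field."""
    return {
        "size": 0,  # only the aggregation result is needed, not the hits
        "query": {"bool": {"filter": [{"term": {"domain": domain}}]}},
        "aggs": {"total": {"sum": {"field": "count"}}},
    }


counter_docs = to_counter_docs("doc-1", {"domain A": 17, "domain B": 8})
body = domain_total_query("domain A")
```

With this layout a new domain is just a new value in the domain field, so the mapping does not grow with the number of domains.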

Right now, you could use a sum aggregation for each domain. If you would like to have a counter for all the values (in your example 17 + 8 + 3), then one way would be to have a sums field in the domain and add up those values at index time, and then run a sum aggregation on the sums field.
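
As a rough sketch of the "one sum aggregation per domain" option, the search body could be built like this. The helper name and the aggregation names are made up for illustration, and the list of domain names is assumed to be known up front:

```python
def per_domain_sums(domain_names):
    """Build a search body with one sum aggregation per key of 'domains'."""
    aggs = {}
    for name in domain_names:
        # Aggregation name is illustrative; spaces are replaced for readability.
        agg_name = name.replace(" ", "_")
        aggs[agg_name] = {"sum": {"field": f"domains.{name}"}}
    return {"size": 0, "aggs": aggs}


body = per_domain_sums(["domain A", "domain B", "domain C"])
```

The resulting dict can be sent as the body of a _search request; each aggregation in the response then carries the overall counter for one domain.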

If speed is of no concern, you could use a script in the sum aggregation that sums up all the fields (though looking up all the different fields might take some work, I am not sure off the top of my head). See https://www.elastic.co/guide/en/elasticsearch/reference/7.6/search-aggregations-metrics-sum-aggregation.html#_script_11
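
A sketch of what such a scripted sum might look like, assuming the counters are mapped as numeric fields with doc values and that the domain names are passed in explicitly as script params. The Painless source below has not been run against a cluster, so treat it as a starting point rather than a verified script:

```python
# Painless source: add up whichever of the listed fields exist on the doc.
PAINLESS_SUM = """
long total = 0;
for (def f : params.fields) {
  if (doc.containsKey(f) && doc[f].size() > 0) {
    total += doc[f].value;
  }
}
return total;
"""


def scripted_total(domain_names):
    """Search body with a single scripted sum over all domain counters."""
    fields = [f"domains.{name}" for name in domain_names]
    script = {"lang": "painless", "source": PAINLESS_SUM, "params": {"fields": fields}}
    return {"size": 0, "aggs": {"overall": {"sum": {"script": script}}}}


body = scripted_total(["domain A", "domain B", "domain C"])
```

The script runs once per matching document, which is why this option only makes sense when speed is not a concern.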

hope this helps as a start.

--Alex

fhdz · March 24, 2020, 5:15pm

Thanks Alex for your swift reply!
I think I should have given more details about the use case.

The idea is to store a lot of small documents for downstream NLP applications. Each document would have a few fields like "client", "language", "origin", etc. as well as the core text data.
These fields would make it possible to retrieve data and build corpora based on different filters on the different fields.

One of the fields is then related to the "domains", which can take a relatively fixed number of values (let's say two dozen).
There may be some new ones along the way but it should really not explode.
Finally, these domains are not mutually exclusive, hence the need to count for each doc where it stands.

So, as I would have few domains, but tens of millions of documents, I don't really see the benefit of inverting the structure. Not sure if any other structure would be interesting.

Anyway, as the domains are supposed to be pretty fixed, I can always do the aggregation per domain. Just thought it would be nice to have an existing entry point for such cases!

( + it would probably be easier to integrate in some Kibana visualisations)

spinscale · March 30, 2020, 1:58pm

OK, don't worry about the mapping explosion then.

Still, having a total field summing up all the values makes sense to me.
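
A minimal sketch of that total field, computed on the client before indexing. The field name domains_total is hypothetical, and since the counters get incremented over time, the total would have to be updated together with them:

```python
def with_total(doc):
    """Return a copy of doc with the sum of all domain counters added."""
    enriched = dict(doc)
    enriched["domains_total"] = sum(doc.get("domains", {}).values())
    return enriched


# Search body for the overall counter across all matching documents.
TOTAL_QUERY = {
    "size": 0,
    "aggs": {"overall": {"sum": {"field": "domains_total"}}},
}

example = with_total({"domains": {"domain A": 17, "domain B": 8, "domain C": 3}})
```

For the example from the first post this stores 17 + 8 + 3 = 28 in domains_total, and a plain sum aggregation on that one field then gives the overall counter without any scripting.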

Hope this helps!

--Alex

system · April 27, 2020, 1:58pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
