I find myself with "counter-like" fields, e.g. for counting the occurrences of my docs in different domains.
(When a doc is used in the context of a given domain, that domain's counter is incremented.)
I have an object field "domains" which looks like this:
A couple of things here. First, this is not the best data modelling practice: with a lot of different domains you would run into a classic mapping explosion (you probably want to check the docs on this). You may want to have one document per domain instead, each with a single field like count, and then use a filter to sum things up across those documents.
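To make the document-per-domain idea concrete, here is a minimal sketch in Python that only builds plain dicts, with no client library involved. The domain names, the field names `doc_id`, `domain` and `count`, and the helper names are hypothetical, and the `term` filter assumes `domain` is mapped as a keyword field.

```python
def explode_by_domain(doc_id, domains):
    """Turn one doc's per-domain counters into one small document per domain."""
    return [
        {"doc_id": doc_id, "domain": name, "count": count}
        for name, count in domains.items()
    ]


def domain_total_query(domain):
    """Search body: filter on one domain, then sum its count field."""
    return {
        "size": 0,  # only the aggregation result is needed, not the hits
        "query": {"bool": {"filter": [{"term": {"domain": domain}}]}},
        "aggs": {"total": {"sum": {"field": "count"}}},
    }


# Hypothetical domain names; the counts are the ones from the example.
rows = explode_by_domain("doc-1", {"medical": 17, "legal": 8, "finance": 3})
body = domain_total_query("medical")
```

Dropping the `query` part would sum `count` over every domain with the same single-field mapping, so new domains never add fields.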
Right now, you could use a sum aggregation for each domain. If you would like a counter across all the values (in your example 17 + 8 + 3), one way would be to add a sums field to the domains object, add up those values at index time, and then run a sum aggregation on that sums field.
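As a rough sketch of both options with the current structure, the Python below computes the sums field before indexing and builds the two aggregation bodies as plain dicts. The domain names and the helper names are made up for illustration, and `domains.sums` is just one possible name for the sums field.

```python
def with_sums(doc):
    """Return a copy of doc with domains.sums precomputed at index time."""
    domains = dict(doc.get("domains", {}))
    domains.pop("sums", None)  # never count a stale total twice
    domains["sums"] = sum(domains.values())
    return {**doc, "domains": domains}


def sum_aggs(fields):
    """Search body with one sum aggregation per field."""
    return {
        "size": 0,  # aggregation results only
        "aggs": {f.replace(".", "_"): {"sum": {"field": f}} for f in fields},
    }


# Hypothetical domain names; the counts are the ones from the example.
doc = with_sums({"domains": {"medical": 17, "legal": 8, "finance": 3}})
per_domain = sum_aggs(["domains.medical", "domains.legal", "domains.finance"])
overall = sum_aggs(["domains.sums"])
```

The per-domain body needs the list of domains up front, whereas the overall body stays the same when a new domain appears, as long as the sums field is recomputed whenever a counter changes.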
Thanks Alex for your swift reply!
I think I should have given more details about the use case.
The idea is to store a lot of small documents for downstream NLP applications. Each document would have a few fields like "client", "language", "origin", etc. as well as the core text data.
These fields would make it possible to retrieve data and build corpora based on different filters on the different fields.
One of the fields is related to the "domains", which can take a relatively fixed number of values (let's say two dozen).
There may be some new ones along the way, but the number should really not explode.
Finally, these domains are not mutually exclusive, hence the need to count, for each doc, how often it is used in each domain.
So, as I would have few domains but tens of millions of documents, I don't really see the benefit of inverting the structure. I am not sure whether any other structure would be of interest.
Anyway, as the domains are supposed to be pretty fixed, I can always do the aggregation per domain. I just thought it would be nice to have an existing entry point for such cases!
(It would probably also be easier to integrate into some Kibana visualisations.)