I want to get a total count per topic, like this:
room: 3
kitchen: 1
restroom: 1
I am looking for an aggregation with a sub-aggregation that sums the count field per topic. I tried the example below, but it sums the count field across all topics instead of within each topic:
I think you should reorganize your document structure. It's not intuitive the way you have it.
What you might want is to flatten out the document to be something like this:
{
  "topic": "kitchen",
  "count": 1,
  "xyz": "John" // a field with a unique id to tie multiple documents together
}
For the first document in your example above ("_id":1), you would have 2 documents:
{ "topic":"room","count":2, "xyz":"customer1" },
{ "topic":"kitchen","count":1, "xyz":"customer1" }
For the second document ("_id":2), you would also have 2 documents:
{ "topic":"room","count":1, "xyz":"customer2" },
{ "topic":"restroom","count":1, "xyz":"customer2" }
This way, you can eliminate the array. Searching on the new field "xyz" gives you back the equivalent of the original array.
The benefit of flattening your document structure is that it makes aggregation a lot easier.
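To make the flattening concrete, here is a minimal Python sketch. It assumes the original documents carry an "opinions" array of topic/count objects plus some per-customer id; the field name "customer" and the sample values are made up for illustration, so adjust them to your real data.

```python
from collections import Counter

# Hypothetical source documents: each has an "opinions" array of
# {"topic", "count"} objects. The "customer" field name is an assumption.
docs = [
    {"_id": 1, "customer": "customer1",
     "opinions": [{"topic": "room", "count": 2}, {"topic": "kitchen", "count": 1}]},
    {"_id": 2, "customer": "customer2",
     "opinions": [{"topic": "room", "count": 1}, {"topic": "restroom", "count": 1}]},
]


def flatten(doc):
    """Yield one flat document per entry in the "opinions" array."""
    for opinion in doc["opinions"]:
        yield {
            "topic": opinion["topic"],
            "count": opinion["count"],
            "xyz": doc["customer"],  # ties the flat documents back together
        }


flat_docs = [flat for doc in docs for flat in flatten(doc)]

# With flat documents, the per-topic total is a plain group-by-and-sum.
totals = Counter()
for flat in flat_docs:
    totals[flat["topic"]] += flat["count"]

print(dict(totals))  # {'room': 3, 'kitchen': 1, 'restroom': 1}
```

On the Elasticsearch side the same group-by-and-sum would be a terms aggregation on "topic" with a sum sub-aggregation on "count"; the Python loop only mirrors that logic locally.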
But I am not interested in the customer info when I am aggregating. I only want to know the count of the keyword repetitions inside the aggregation. So, if I have 1000 documents in one aggregation, I would count the number of times "room" appeared, also "kitchen" and 50 other keys.
I have a possible solution here but it's taking a looong time and I am trying to achieve a faster aggregation.
I think your "possible solution" is better. This kind of aggregation should be very fast.
How long is long for you?
Another possible speed-up is index sorting. It stores the documents in each segment sorted by the fields you choose, which can let some searches and aggregations stop early or skip documents they don't need.
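For reference, index sorting is configured in the index settings at creation time. The sketch below builds such a request body as a Python dict; it assumes the flattened document shape with a keyword "topic" field, and the choice of sort field is only an example. Whether it helps your aggregation is something you would have to measure.

```python
import json

# Index creation body with index sorting enabled.
# Assumption: flattened documents with a keyword "topic" field.
index_body = {
    "settings": {
        "index": {
            "sort.field": "topic",  # field to sort each segment by
            "sort.order": "asc",
        }
    },
    "mappings": {
        "properties": {
            "topic": {"type": "keyword"},
            "count": {"type": "integer"},
        }
    },
}

# This JSON would be sent as the body of the create-index request.
print(json.dumps(index_body, indent=2))
```

Index sort settings have to be set when the index is created, so trying this means reindexing into a new index.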
Let me point out one problem with this mapping. You have to use the nested field type for "opinions" to keep each topic linked to its count, because arrays of objects are flattened internally.
The aggregation query should also be changed accordingly.
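As a rough sketch of what that could look like, the Python dicts below hold a nested mapping for "opinions" and a matching nested aggregation that sums "count" per topic. The sub-field names "topic" and "count", their types, the aggregation names, and the terms size are assumptions based on this thread, not your actual mapping.

```python
import json

# Mapping: "opinions" as a nested field so each topic stays linked to its count.
mapping_body = {
    "mappings": {
        "properties": {
            "opinions": {
                "type": "nested",
                "properties": {
                    "topic": {"type": "keyword"},
                    "count": {"type": "integer"},
                },
            }
        }
    }
}

# Search body: step into the nested documents, bucket by topic,
# then sum the count inside each topic bucket.
search_body = {
    "size": 0,  # only the aggregation result is needed, not the hits
    "aggs": {
        "opinions": {
            "nested": {"path": "opinions"},
            "aggs": {
                "by_topic": {
                    "terms": {"field": "opinions.topic", "size": 100},
                    "aggs": {
                        "total_count": {"sum": {"field": "opinions.count"}}
                    },
                }
            },
        }
    },
}

print(json.dumps(search_body, indent=2))
```

Each bucket under "by_topic" would then carry the topic as its key and the summed count under "total_count".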
One alternative is to use the first aggregation from the previous topic and use a transform to run the aggregation periodically in the background.