Hi!
I am using a terms aggregation to get the 10 best terms that match a query.
The problem is that my query matches a lot of documents, so both the number
of distinct terms and the number of documents per bucket are very large.
An example response:
{
  "aggregations" : {
    ... // filters
    "not_exact" : {
      "doc_count" : 2257428,
      "text" : {
        "buckets" : [ {
          "key" : "abb",
          "doc_count" : 135686
        }, {
          "key" : "ansprache",
          "doc_count" : 118570
        }, {
          "key" : "aus",
          "doc_count" : 106023
        }, {
          "key" : "auf",
          "doc_count" : 74338
        }, {
          "key" : "archiv",
          "doc_count" : 54315
        }, {
          "key" : "außen",
          "doc_count" : 52444
        }, {
          "key" : "am",
          "doc_count" : 52178
        }, {
          "key" : "ab",
          "doc_count" : 45723
        }, {
          "key" : "an",
          "doc_count" : 44656
        }, {
          "key" : "athen",
          "doc_count" : 32070
        } ]
      },
      ...
}
I am not interested in the exact document counts, and I would even be
willing to sacrifice precision to speed up the query (which currently takes
6 seconds). So my question is: is there a way to tell the terms aggregation
to stop counting at a certain limit? Suppose, for instance, that I could set
this limit to 50,000. I might get the top buckets in the wrong order, but I
could live with that, and I suppose Elasticsearch would take less time.
I would even be happy with a limit of 10,000 that leaves me with different
keys, because as the user types more, counts above 10,000 become less and
less likely.
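To make the idea concrete, here is a minimal sketch in plain Python of the
behaviour I have in mind. It only illustrates the idea and is not something
I know Elasticsearch to offer: the function name capped_top_terms and its
parameters are made up for this example, and the documents are simply lists
of terms.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def capped_top_terms(
    docs: Iterable[Iterable[str]], limit: int, size: int = 10
) -> List[Tuple[str, int]]:
    """Count each term once per document, but never past `limit`.

    Hypothetical helper: scanning stops as soon as `size` buckets have
    reached the limit, so the remaining documents are never visited.
    """
    counts: Counter = Counter()
    saturated = set()  # terms whose bucket has hit the limit
    for terms in docs:
        for term in set(terms):  # a document counts once per term
            if counts[term] < limit:
                counts[term] += 1
                if counts[term] == limit:
                    saturated.add(term)
        if len(saturated) >= size:
            break  # enough full buckets, stop early
    return counts.most_common(size)


if __name__ == "__main__":
    sample = [["aus", "auf"], ["aus"], ["aus", "am"], ["auf"]]
    print(capped_top_terms(sample, limit=2, size=1))
```

Buckets that reach the limit all report the same count, so their relative
order is arbitrary. That is the loss of precision I would accept.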
And if that is possible, the next question would be: is there a way to make
this limit depend on the total number of matches (2,257,428 in this case)?
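As a rough illustration of what "depending on the total" could mean, the
limit might be a fraction of the total matches with a lower bound. The name
bucket_limit, the 2% fraction and the floor of 1000 are arbitrary
assumptions for this sketch, not Elasticsearch settings.

```python
def bucket_limit(total_matches: int, fraction: float = 0.02, floor: int = 1000) -> int:
    """Hypothetical rule: cap each bucket at a fraction of the total hits.

    `fraction` and `floor` are illustrative values, chosen only so that
    small result sets still get a usable limit.
    """
    return max(floor, int(total_matches * fraction))


if __name__ == "__main__":
    # Total taken from the "not_exact" doc_count in the example response.
    print(bucket_limit(2257428))
```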
For those who are interested: the background of this request is an
autocompletion feature that matches against different fields, using ngram
or edge_ngram depending on the field type, and returns the best matches. In
the example above, the user types "a" and gets those results. If you are
thinking about caching results, that is generally not possible: the query
is constrained by document types and fields, the data changes, and each
user can have a different view of the data (so a user may not be able to
see any documents containing "athen", in which case that match would be
incorrect).