Ah, interesting. Thanks for the response! Okay, that makes sense. I guess I just never noticed it outside of those aggregations, because few of the others make sense to run against full-text. Fair enough.
Is there a good work-around for this sort of thing? Is there another sort of aggregation that would help avoid this, or does FieldData have to be computed at the start, regardless of the aggs hierarchy? I'm just thinking out loud--and I haven't tried this--but I'm wondering, for example, whether a filter aggregation with even something as rudimentary as an ids query would let me run it.
My business case here is just to find the most commonly used words in this field, for documents that match a given query. I'd like to throw significant_terms at it, since I think that would actually be more useful than just a terms aggregation, but either way, I need a result that tells me "'bicycle' shows up in 20% of these documents, 'air' shows up in 15%, 'dirt' shows up in 2%, 'the' is in 99%," etc. Accuracy is important as always, but it's not overwhelmingly critical; just being able to communicate "a lot of these documents talk about 'bicycle'" would be a good start.
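To make the idea concrete, here is a rough, untested sketch of the request body I have in mind, plus the arithmetic for turning bucket counts into percentages. The field name "body", the aggregation names "subset" and "top_words", and the helper names are all made up for illustration. Whether nesting under a filter aggregation actually avoids loading fielddata for the whole field is exactly what I don't know.

```python
import json


def build_body(query, field, ids, agg_type="significant_terms", size=20):
    """Build a search body that nests a term-style aggregation under a filter aggregation.

    `field`, "subset" and "top_words" are hypothetical names chosen for this sketch.
    """
    return {
        "size": 0,  # only the aggregation result is wanted, not the hits
        "query": query,
        "aggs": {
            "subset": {
                "filter": {"ids": {"values": ids}},
                "aggs": {
                    "top_words": {agg_type: {"field": field, "size": size}}
                },
            }
        },
    }


def bucket_percentages(subset_result):
    """Convert each bucket's doc_count into a percentage of the filtered documents."""
    total = subset_result["doc_count"]
    if total == 0:
        return {}
    buckets = subset_result["top_words"]["buckets"]
    return {b["key"]: 100.0 * b["doc_count"] / total for b in buckets}


if __name__ == "__main__":
    body = build_body({"match_all": {}}, "body", ["1", "2", "3"])
    print(json.dumps(body, indent=2))
```

Swapping agg_type to "terms" gives the plain most-common-words version; the percentage helper assumes the usual response shape, where the filter aggregation reports a doc_count and each bucket carries a key and a doc_count.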
Can you think of another way to get that data out?
I could just scroll through all the query's results and manually tokenize them application-side, but clearly that's not an awesome solution, since Elasticsearch already has that processing done. That brings me to the other question I posted, which would help me do this, but again, not an awesome solution, since I'd be performing an aggregation application-side--lots of wasted bandwidth and time there.
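For reference, the application-side fallback would look roughly like the sketch below. It assumes the field text of every matching document has already been pulled down (for example by scrolling), and it uses a crude lowercase regex tokenizer that will not match whatever analyzer the index actually uses, so the numbers would only be approximate.

```python
import re
from collections import Counter

# Crude stand-in for the index analyzer: lowercase runs of letters and digits.
TOKEN_RE = re.compile(r"[a-z0-9]+")


def document_frequencies(texts):
    """Return {word: percentage of documents containing that word}."""
    counts = Counter()
    total = 0
    for text in texts:
        total += 1
        # Use a set so each word counts at most once per document.
        counts.update(set(TOKEN_RE.findall(text.lower())))
    if total == 0:
        return {}
    return {word: 100.0 * n / total for word, n in counts.items()}


if __name__ == "__main__":
    docs = ["The bicycle is red", "the air", "The dirt and the bicycle", "the"]
    freqs = document_frequencies(docs)
    for word, pct in sorted(freqs.items(), key=lambda kv: -kv[1]):
        print(f"{word}: {pct:.0f}%")
```

This counts documents rather than total occurrences, which matches the "'bicycle' shows up in 20% of these documents" kind of answer I'm after, but every matching document has to cross the wire to get there.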