I have a 5-node CentOS 7 cluster. I have a proof-of-concept advanced search GUI (.NET/NEST) with a critical requirement: filling a list box with the existing values (at least 1,000 values) of any one of the 20 or 30 fields in the index.
I used the terms aggregation, and it worked great when we had up to 25 million documents, with the added benefit of listing the counts as well. Performance, however, has slowed significantly as I've grown the index to 250 million documents (and I have another index growing to 2.5 billion now).
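For illustration, a request of this kind looks roughly like the following. This is a minimal sketch in Python, not the actual NEST code; the field name "color", the aggregation name "unique_values", and both helper functions are placeholders made up for this example.

```python
# Sketch only: "color" and "unique_values" are placeholder names, and these
# helper functions are hypothetical, not part of NEST or any client library.

def build_terms_request(field, max_values=1000):
    """Build a search body that asks only for a terms aggregation on `field`."""
    return {
        "size": 0,  # return no hits; only the aggregation result is needed
        "aggs": {
            "unique_values": {
                # "size" here caps how many buckets (distinct values) come back
                "terms": {"field": field, "size": max_values}
            }
        },
    }


def values_and_counts(response, agg_name="unique_values"):
    """Pull (value, document count) pairs out of a terms aggregation response."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [(bucket["key"], bucket["doc_count"]) for bucket in buckets]


def values_only(response, agg_name="unique_values"):
    """Keep just the values, which is all the list box actually needs."""
    return [value for value, _count in values_and_counts(response, agg_name)]
```

The body from build_terms_request("color") would be sent to the index's _search endpoint. Note that values_only just discards the counts on the client side; the counting work has still been done by the cluster.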
It seems like obvious low-hanging fruit to me that an inverted index architecture should be able to return the unique values for a field very quickly, but every search I've done on the web points only to the terms aggregation, which also computes counts and is not all that fast.
I even found a response that showed "select distinct(color) from someindex", which is NOT the same as a terms aggregation: that SQL returns only the list of unique values, not the values and their counts, which would be "select color, count(*) from someindex group by color".
Thanks, Joe R.