I have an index that contains more than 1 million documents. I am using a query with the "collapse" parameter to group a field and get its unique values. Do I get ALL unique values from this field that might match the query, or could some be missing if there are more than 10,000 possible hits for the query due to the default limit of 10,000? How could I make sure to get all possible unique values?
I am new to ES, so I hope my question makes sense.
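"collapse" only collapses the search results, so it is affected by the "size" of the query: a single response contains at most "size" collapsed hits, and any unique values beyond that are not returned.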
First, you can paginate through the collapsed results with search_after (using a PIT if necessary).
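A minimal sketch of that pagination loop in Python, under a few assumptions: "search" is a hypothetical callable that sends a request body to the _search endpoint and returns the parsed JSON response, and the collapse field is a keyword (or numeric) field. As far as I know, search_after together with collapse only works when you sort on the same field you collapse on, with no secondary sort, so the sketch does exactly that. The PIT is left out for brevity.

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical transport: takes a search body, returns the parsed response.
SearchFn = Callable[[Dict[str, Any]], Dict[str, Any]]


def collect_collapsed_values(search: SearchFn, field: str,
                             query: Optional[Dict[str, Any]] = None,
                             page_size: int = 1000) -> List[Any]:
    """Page through collapsed hits with search_after and gather the values."""
    values: List[Any] = []
    search_after: Optional[List[Any]] = None
    while True:
        body: Dict[str, Any] = {
            "size": page_size,
            "query": query or {"match_all": {}},
            "collapse": {"field": field},
            # Sort on the collapse field only, so search_after can be used.
            "sort": [{field: "asc"}],
            "_source": False,
        }
        if search_after is not None:
            body["search_after"] = search_after
        hits = search(body)["hits"]["hits"]
        if not hits:
            return values  # no more pages
        # Each hit's sort value is the collapsed field value for that group.
        values.extend(hit["sort"][0] for hit in hits)
        search_after = hits[-1]["sort"]
```

Here "search" can be a thin wrapper around whatever client or HTTP call you already use. The loop stops at the first empty page, so it always issues one extra request at the end.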
A better option may be a terms aggregation. See the Size section of the docs for how to retrieve all unique terms. Make sure the response has "sum_other_doc_count": 0, which confirms that no matching documents fall outside the returned buckets.
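The terms aggregation check could look roughly like this in Python. Again this is only a sketch: "search" is a hypothetical callable that posts the body to _search and returns the parsed response, "unique_values" is an arbitrary aggregation name, and the field is assumed to be aggregatable (for example a keyword field).

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical transport: takes a search body, returns the parsed response.
SearchFn = Callable[[Dict[str, Any]], Dict[str, Any]]


def unique_terms(search: SearchFn, field: str, size: int = 10000,
                 query: Optional[Dict[str, Any]] = None
                 ) -> Tuple[List[Any], bool]:
    """Return (bucket keys, complete) for a terms aggregation on `field`."""
    body = {
        "size": 0,  # no hits needed, only the aggregation
        "query": query or {"match_all": {}},
        "aggs": {
            "unique_values": {"terms": {"field": field, "size": size}}
        },
    }
    agg = search(body)["aggregations"]["unique_values"]
    values = [bucket["key"] for bucket in agg["buckets"]]
    # 0 means no matching document ended up outside the returned buckets.
    complete = agg["sum_other_doc_count"] == 0
    return values, complete
```

If "complete" comes back False, the list is truncated and "size" needs to be raised (or another approach used); cluster-level bucket limits may cap how far it can go.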