Generate Aggregation List for Large Index

I am trying to get all unique values in a given field using the following terms aggregation. It returns a "can't communicate with server" error, but I know that is not the actual issue, because the query works if I drop the size value to 2000000:

{
    "size": 0,
    "aggs": {
        "names": {
            "terms": {
                "field": "name",
                "size": 4000000
            }
        }
    }
} > myResults.txt

I am using Elastic Cloud with 16GB of RAM and 384GB of disk space, and I believe the cluster just can't handle the larger number of results. Is there any way to get all 4 million unique values that I need for post-processing? Any help that anyone could provide would be appreciated. Thanks.

Kevin

In the next release (5.2) we have support for partitioning terms into an arbitrary number of sets and working with one set at a time. See https://www.elastic.co/guide/en/elasticsearch/reference/5.x/search-aggregations-bucket-terms-aggregation.html#_filtering_values_with_partitions
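For illustration, here is a rough sketch of how the request bodies for that partitioned approach might be generated, one body per partition. It assumes the `include` option with `partition` and `num_partitions` as described on the linked page. The helper name `partition_request`, the choice of 20 partitions, and the per-partition size of 250000 are made up for this example and would need tuning for a real index.

```python
import json


def partition_request(field, partition, num_partitions, size):
    """Build a terms-aggregation body limited to one partition of the terms.

    Hypothetical helper: it only builds the JSON body and sends nothing.
    """
    return {
        "size": 0,
        "aggs": {
            "names": {
                "terms": {
                    "field": field,
                    # Only terms that hash into this partition are returned.
                    "include": {
                        "partition": partition,
                        "num_partitions": num_partitions,
                    },
                    # Should comfortably exceed the expected terms per partition.
                    "size": size,
                }
            }
        },
    }


# Assumed numbers: roughly 4 million terms spread over 20 partitions is
# about 200000 terms each, so 250000 leaves some headroom.
NUM_PARTITIONS = 20
bodies = [partition_request("name", p, NUM_PARTITIONS, 250000)
          for p in range(NUM_PARTITIONS)]

# Show the body for the first partition.
print(json.dumps(bodies[0], indent=2))
```

Each body would be sent as its own search request, and the buckets from all partitions combined afterwards.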

Thanks for the info. In the interim, is there a way I can do separate queries and just merge the results? Maybe add a filter query to just do names starting with A-L and then another starting with M-Z? Not sure how to use REGEX in an ES query though. Please let me know if this would be doable. Thanks.

Kevin

This problem is solved. I was in fact able to use a regexp filter and run two separate queries to do what I needed. Here is what the latter query looks like:

{
    "size": 0,
    "aggs" : {
        "names" : {
            "filter" : { "regexp": { "name": "[m-zM-Z].*" } },
            "aggs" : {
                "filteredNames" : { "terms": {"field" : "name", "size": 2000000} }
            }
        }
    }
}
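For the post-processing step, merging the two result sets could look roughly like the sketch below. It assumes each saved response has the usual shape for a filter aggregation wrapping a terms aggregation (`aggregations` > `names` > `filteredNames` > `buckets`), and that the first query used a matching pattern such as `[a-lA-L].*`, which is not shown above. The helper names and the sample responses are made up for illustration. Note that names starting with a digit or punctuation would match neither pattern, so a third query may be needed to catch those.

```python
def bucket_keys(response, outer="names", inner="filteredNames"):
    """Return the set of term keys from one filtered terms-aggregation response."""
    buckets = response["aggregations"][outer][inner]["buckets"]
    return {bucket["key"] for bucket in buckets}


def merge_unique(responses):
    """Union the term keys from several responses into one sorted list."""
    merged = set()
    for response in responses:
        merged |= bucket_keys(response)
    return sorted(merged)


# Made-up sample responses standing in for the two saved result files.
first_half = {"aggregations": {"names": {"doc_count": 4, "filteredNames": {
    "buckets": [{"key": "alice", "doc_count": 3}, {"key": "Bob", "doc_count": 1}]}}}}
second_half = {"aggregations": {"names": {"doc_count": 3, "filteredNames": {
    "buckets": [{"key": "mallory", "doc_count": 2}, {"key": "Zed", "doc_count": 1}]}}}}

all_names = merge_unique([first_half, second_half])
print(len(all_names), "unique names")
```

Using a set means a name that somehow appears in both responses is only counted once.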

Just wanted to close the loop on this issue. Thanks.

Kevin
