Generate Aggregation List for Large Index


(Kevin M.) #1

I am trying to get all unique values in a given field using the following terms aggregation. It returns a "can't communicate with server" error, but I know that is not the actual issue, because if I drop the size value to 2000000 it works:

{
    "size": 0,
    "aggs": {
        "names": {
            "terms": {
                "field": "name",
                "size": 4000000
            }
        }
    }
} > myResults.txt

I am using Elastic Cloud with 16GB of RAM and 384GB of disk space, and I believe the cluster just can't handle the larger number of results. Is there any way to get all 4 million unique values that I need for post-processing? Any help would be appreciated. Thanks.

Kevin


(Mark Harwood) #2

In the next release (5.2) we have support for partitioning terms into an arbitrary number of sets and working with one set at a time. See https://www.elastic.co/guide/en/elasticsearch/reference/5.x/search-aggregations-bucket-terms-aggregation.html#_filtering_values_with_partitions
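
For illustration, here is a minimal Python sketch of what the per-partition request bodies could look like, based on the "include" option with "partition" and "num_partitions" described on the linked page. The helper name, the partition count of 20, and the per-partition size are assumptions chosen for the example, not recommendations; the sketch only builds the JSON bodies and does not contact a cluster.

```python
import json


def partitioned_terms_bodies(field, num_partitions, size_per_partition):
    """Yield one search request body per partition of the field's terms.

    Hypothetical helper: each body asks the terms aggregation for only
    the terms that fall into one partition.
    """
    for partition in range(num_partitions):
        yield {
            "size": 0,  # no hits needed, only the aggregation
            "aggs": {
                "names": {
                    "terms": {
                        "field": field,
                        "include": {
                            "partition": partition,
                            "num_partitions": num_partitions,
                        },
                        "size": size_per_partition,
                    }
                }
            },
        }


# Assumed numbers: 20 partitions, each with generous headroom for
# roughly 4 million unique values split across them.
bodies = list(partitioned_terms_bodies("name", 20, 250000))
print(json.dumps(bodies[0], indent=2))
```

Each body would be sent as its own search request, and the bucket keys from all responses combined afterwards. The size per partition should be larger than the expected number of terms in any one partition, otherwise some terms may be cut off.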


(Kevin M.) #3

Thanks for the info. In the interim, is there a way I can run separate queries and just merge the results? Maybe add a filter query to match only names starting with A-L, and then another for names starting with M-Z? I'm not sure how to use a regex in an ES query, though. Please let me know if this would be doable. Thanks.

Kevin


(Kevin M.) #4

This problem is solved. I was in fact able to use a regexp filter and run two separate queries to do what I needed. Here is what the second query looks like:

{
    "size": 0,
    "aggs" : {
        "names" : {
            "filter" : { "regexp": { "name": "[m-zM-Z].*" } },
            "aggs" : {
                "filteredNames" : { "terms": {"field" : "name", "size": 2000000} }
            }
        }
    }
}
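
For anyone post-processing the two result sets, here is a rough Python sketch of the merge step. It assumes the first query used the pattern "[a-lA-L].*" (that query is not shown in this thread) and that each response has the usual shape for a filter aggregation with a nested terms aggregation, with buckets under aggregations.names.filteredNames. The function names and the sample responses are made up for the example.

```python
def regexp_filtered_body(pattern, field="name", size=2000000):
    """Build a request body like the one above for a given regexp pattern."""
    return {
        "size": 0,
        "aggs": {
            "names": {
                "filter": {"regexp": {field: pattern}},
                "aggs": {
                    "filteredNames": {"terms": {"field": field, "size": size}}
                },
            }
        },
    }


def merge_unique_keys(responses):
    """Collect the distinct bucket keys from several search responses."""
    unique = set()
    for response in responses:
        buckets = response["aggregations"]["names"]["filteredNames"]["buckets"]
        unique.update(bucket["key"] for bucket in buckets)
    return sorted(unique)


# The first pattern is an assumption; only the second appears in the thread.
request_bodies = [regexp_filtered_body(p) for p in ("[a-lA-L].*", "[m-zM-Z].*")]

# Hand-written stand-ins for the two responses, for demonstration only.
fake_responses = [
    {"aggregations": {"names": {"doc_count": 3, "filteredNames": {"buckets": [
        {"key": "alice", "doc_count": 2}, {"key": "bob", "doc_count": 1}]}}}},
    {"aggregations": {"names": {"doc_count": 1, "filteredNames": {"buckets": [
        {"key": "mallory", "doc_count": 1}]}}}},
]
merged = merge_unique_keys(fake_responses)
```

Note that names starting with a digit or any other non-letter character match neither pattern, so a further query would be needed if the field contains such values.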

Just wanted to close the loop on this issue. Thanks.

Kevin

