Aggregation with filters causing OutOfMemoryError


(Kufi) #1

Hello

We currently have a problem with one of our queries, but only in specific circumstances. The query runs without problems as long as the number of filters is not very large. That is fine for most of our queries, but one specific query generates around 3000 filter buckets, which causes an OutOfMemoryError. Lower bucket counts are fine and also perform fast enough on the number of documents we currently store.

Is there a way to prevent this?

The query looks like this:

{
    "query": {
        "filtered": {
            "filter": {
                "and": [
                    {
                        "nested": {
                            "path": "tags",
                            "filter": {
                                ...
                            }
                        }
                    },
                    {
                        "term": {
                            "field": value
                        }
                    }
                ]
            }
        }
    },
    "aggs": {
        "distribution": {
            "filters": {
                "filters": {
                    "filter_one": {
                        "nested": {
                            "path": "tags",
                            "filter": {
                                ...
                            }
                        }
                    },
                    ...more filters here, with the same structure as above. If too many filter buckets are present, the query crashes
                }
            },
            "aggs": {
                "timeline": {
                    "nested": {
                        "path": "nested"
                    },
                    "aggs": {
                        "timeline_filter": {
                            "filter": {
                                ...
                            },
                            "aggs": {
                                "timeline_histogram": {
                                    "date_histogram": {
                                        "field": "nested.created_at",
                                        "interval": "day"
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}

(Mark Walkom) #2

Adding more heap or reducing your bucket count is about it, unfortunately.


(Kufi) #3

Ok. Good to know, because I ran out of ideas besides the "more power" route.

I'm thinking about splitting the buckets up into multiple requests and patching the results together afterwards, as the end results themselves are quite easy to combine.
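
A rough sketch of what I have in mind, in Python. The names `chunk_filters`, `run_in_batches`, the `search` callable and the batch size of 500 are all placeholders of mine, not anything from Elasticsearch; `search` stands for whatever function sends a request body to the cluster and returns the parsed JSON response. It assumes the filters are named, so that the buckets come back as an object keyed by filter name.

```python
from itertools import islice
from typing import Any, Callable, Dict, Iterator


def chunk_filters(filters: Dict[str, Any], size: int) -> Iterator[Dict[str, Any]]:
    """Yield successive sub-dicts holding at most `size` named filters."""
    if size < 1:
        raise ValueError("size must be at least 1")
    items = iter(filters.items())
    while True:
        batch = dict(islice(items, size))
        if not batch:
            return
        yield batch


def run_in_batches(
    query: Dict[str, Any],
    filters: Dict[str, Any],
    sub_aggs: Dict[str, Any],
    search: Callable[[Dict[str, Any]], Dict[str, Any]],
    batch_size: int = 500,  # arbitrary guess, tune against the available heap
) -> Dict[str, Any]:
    """Run one request per batch of filters and merge the buckets by name."""
    merged: Dict[str, Any] = {}
    for batch in chunk_filters(filters, batch_size):
        body = {
            "query": query,
            "aggs": {
                "distribution": {
                    "filters": {"filters": batch},
                    "aggs": sub_aggs,
                }
            },
        }
        response = search(body)
        # Filter names are unique across batches, so a plain update is enough.
        merged.update(response["aggregations"]["distribution"]["buckets"])
    return merged
```

Each request then only builds the buckets for its own batch, so the peak memory per request should depend on the batch size rather than on the full 3000 filters. The cost is that the main query is executed once per batch.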


(Mark Walkom) #4

That's definitely another viable option if you can.

