Elasticsearch aggregation OOM


(Abhijith Reddy) #1

I am trying to debug an OOM issue that happens when we try to run some expensive aggregations.
The aggregation looks like this

{
  "aggs": {
    "name": {
      "terms": {
        "field": "name",
        "min_doc_count": 2,
        "size": 100,
        "collect_mode": "breadth_first"
      },
      "aggs": {
        "distinct_groups": {
          "cardinality": { "field": "group_id" }
        },
        "default_groups": {
          "filter": {
            "term": { "group_id": 0 }
          }
        }
      }
    }
  },
  "query": {
    "filtered": {
      "filter": {
        "term": { "id": 123 }
      }
    }
  },
  "size": 0
}

name is a string field (not analyzed) and group_id is an integer. Both fields can have a large cardinality.
This is the exception I see in the logs

Caused by: QueryPhaseExecutionException[Query Failed [Failed to execute main query]]; nested: CircuitBreakingException[[request] Data too large, data for [<reused_arrays>] would be larger than limit of [12769768243/11.8gb]];
        at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:409)
        at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:113)
        at org.elasticsearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:372)
        at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:385)
        at org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:368)
        at org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:365)
        at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:33)
        at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:77)
        at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.doRun(MessageChannelHandler.java:293)
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: CircuitBreakingException[[request] Data too large, data for [<reused_arrays>] would be larger than limit of [12769768243/11.8gb]]
        at org.elasticsearch.common.breaker.ChildMemoryCircuitBreaker.circuitBreak(ChildMemoryCircuitBreaker.java:97)
        at org.elasticsearch.common.breaker.ChildMemoryCircuitBreaker.addEstimateBytesAndMaybeBreak(ChildMemoryCircuitBreaker.java:147)
        at org.elasticsearch.common.util.BigArrays.adjustBreaker(BigArrays.java:396)
        at org.elasticsearch.common.util.BigArrays.validate(BigArrays.java:433)
        at org.elasticsearch.common.util.BigArrays.newByteArray(BigArrays.java:458)
        at org.elasticsearch.common.util.BigArrays.resize(BigArrays.java:475)
        at org.elasticsearch.common.util.BigArrays.grow(BigArrays.java:489)
        at org.elasticsearch.search.aggregations.metrics.cardinality.HyperLogLogPlusPlus.ensureCapacity(HyperLogLogPlusPlus.java:197)
        at org.elasticsearch.search.aggregations.metrics.cardinality.HyperLogLogPlusPlus.collect(HyperLogLogPlusPlus.java:230)
        at org.elasticsearch.search.aggregations.metrics.cardinality.CardinalityAggregator$DirectCollector.collect(CardinalityAggregator.java:203)
        at org.elasticsearch.search.aggregations.LeafBucketCollector$3.collect(LeafBucketCollector.java:73)
        at org.elasticsearch.search.aggregations.bucket.BucketsAggregator.collectExistingBucket(BucketsAggregator.java:80)
        at org.elasticsearch.search.aggregations.bucket.terms.GlobalOrdinalsStringTermsAggregator$2.collect(GlobalOrdinalsStringTermsAggregator.java:130)
        at org.elasticsearch.search.aggregations.LeafBucketCollector.collect(LeafBucketCollector.java:88)
        at org.apache.lucene.search.MultiCollector$MultiLeafCollector.collect(MultiCollector.java:174)
        at org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll(Weight.java:221)
        at org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:172)
        at org.apache.lucene.search.BulkScorer.score(BulkScorer.java:39)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:821)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:535)
        at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:384)
        ... 12 more

I am trying to understand how Elasticsearch runs this aggregation under the hood. I am surprised that it fails in the inner cardinality aggregation, since based on the documentation here https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-cardinality-aggregation.html#_pre_computed_hashes it's supposed to take a fixed amount of memory and should be fast for integer fields.
I am running ES 2.4 with 32 GB of heap.
Any insights into this would be appreciated.

EDIT: This post here, Out Of Memory error on cardinality aggregation, seems to describe the exact same issue.
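For context, the current usage and limit of each circuit breaker (including the [request] breaker that tripped here) can be inspected with the node stats API; this is only a read-only check, not a fix:

GET _nodes/stats/breaker

The limit_size_in_bytes reported under the request breaker should match the limit shown in the exception above.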


(Mark Harwood) #2

At first glance at that stack trace, it looks like it isn't using breadth_first collect mode, and it's unclear why not.
Can you supply the mapping and a minimal example of a doc?


(Abhijith Reddy) #3

@Mark_Harwood Thanks for taking a look
Here is what the mapping looks like with irrelevant fields removed:

{
  "items" : {
    "mappings" : {
      "item" : {
        "_all" : { "enabled" : false },
        "_source" : { "enabled" : false },
        "properties" : {
          "availability" : {
            "type" : "string",
            "index" : "not_analyzed"
          },
          "name" : {
            "type" : "string",
            "index" : "not_analyzed"
          },
          "group_id" : {
            "type" : "long"
          },       
          "id" : {
            "type" : "long"
          }
        }
      }
    }
  }
}

And doc is pretty simple too

{
  "availability": "in stock",
  "name": "Nike Shoes",
  "id": 123,
  "group_id": 45
}

I am also seeing fielddata being used for the name field, which is strange since doc values are enabled by default in 2.x. Even a simple terms aggregation causes fielddata to be consumed.


(Mark Harwood) #4

Thanks. I just tried that mapping/doc/query and it definitely uses breadth_first on the current master branch.
Are you sure your example query is related to that stack trace? Can you supply the full JSON of the response you get back from that query?


(Abhijith Reddy) #5

@Mark_Harwood Unfortunately we time out after 10 seconds, so we are not seeing anything in the response. I will try again with a longer timeout to see if I get anything.
Do you mind sharing how you verified it?
Also, do you happen to know why fielddata is getting consumed for the name field?


(Abhijith Reddy) #6

Actually, I looked at the slow log and found that the collect mode wasn't applied for one of the queries, which explains why it went OOM. Thanks a lot for your help.


(Mark Harwood) #7

As an ES developer I am normally running Elasticsearch inside an IDE, so I can use a debugger.

No. I'd ask you to verify that this is true too. Maybe you have a template that is controlling the mapping definition?


(Abhijith Reddy) #8

I've re-tested this on a separate cluster by deleting everything and recreating the mapping, and it still seems to be consuming fielddata. We have a fixed set of indexes and don't use templates either. Any ideas on how I can debug this further?

Thanks


(Mark Harwood) #9

Can you post the results of the fielddata stats [1] and the results of a get-mapping call [2]?

[1] https://www.elastic.co/guide/en/elasticsearch/reference/2.4/cat-fielddata.html
[2] https://www.elastic.co/guide/en/elasticsearch/reference/2.4/indices-get-mapping.html


(Abhijith Reddy) #10

The mapping is exactly the same as above.
Output from the _cat API:

id                     host                ip                  node     total    name availability
MKS0ahJvSlC1uc_LVUWHAA fd2e:2a3b:cb3b::/48 fd2e:2a3b:cb3b::/48 node-1  10.9kb   5.7kb        5.2kb
hIH8z9KZT4uauQNiov2VDw fd7d:3919:b24::/48  fd2e:2a3b:cb3b::/48 node-2 1023.9kb 961.6kb      62.3kb
3FPigb-yQFKOrX51fcMTIg fd05:f1d:4bec::/48  fd2e:2a3b:cb3b::/48 node-3  13.6kb   7.4kb        6.2kb
cs9UMoOVQP63Ew22Qvq3Wg fda4:9435:64b2::/48 fd2e:2a3b:cb3b::/48 node-4   1.1mb     1mb       64.1kb
zEK41P2uSQahJ2tEkd5Gfw fdf9:9a5d:5be4::/48 fd2e:2a3b:cb3b::/48 node-5  31.2kb  20.7kb       10.5kb
_YUIxEKRRdm3pM8syX5H-g fdc1:c57d:f25c::/48 fd2e:2a3b:cb3b::/48 node-6  24.3kb  13.9kb       10.3kb

Output from _stats/fielddata:

{
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "failed" : 0
  },
  "_all" : {
    "primaries" : {
      "fielddata" : {
        "memory_size_in_bytes" : 1217336,
        "evictions" : 0,
        "fields" : {
          "availability" : {
            "memory_size_in_bytes" : 82032
          },
          "name" : {
            "memory_size_in_bytes" : 1135304
          }
        }
      }
    },
    "total" : {
      "fielddata" : {
        "memory_size_in_bytes" : 2255536,
        "evictions" : 0,
        "fields" : {
          "name" : {
            "memory_size_in_bytes" : 2108144
          },
          "availability" : {
            "memory_size_in_bytes" : 147392
          }
        }
      }
    }
  },
  "indices" : { ... }
}

(Mark Harwood) #11

OK, this minimal reproduction of your environment on ES 2.4.1 produces zero fielddata usage. Can you repeat it in your environment?

DELETE test
PUT test
{
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "item": {
      "_all": { "enabled": false },
      "_source": { "enabled": false },
      "properties": {
        "availability": {
          "type": "string",
          "index": "not_analyzed"
        },
        "name": {
          "type": "string",
          "index": "not_analyzed"
        },
        "group_id": {
          "type": "long"
        },
        "id": {
          "type": "long"
        }
      }
    }
  }
}
POST test/item/1
{
  "availability": "in stock",
  "name": "Nike Shoes",
  "id": 123,
  "group_id": 45
}
GET test/_search
{
  "aggs": {
    "name": {
      "terms": {
        "field": "name",
        "min_doc_count": 1,
        "size": 100,
        "collect_mode": "breadth_first"
      },
      "aggs": {
        "distinct_groups": {
          "cardinality": { "field": "group_id" }
        },
        "default_groups": {
          "filter": {
            "term": { "group_id": 0 }
          }
        }
      }
    }
  },
  "query": {
    "bool": {
      "filter": {
        "term": { "id": 123 }
      }
    }
  },
  "size": 0
}
GET _stats/fielddata

(Abhijith Reddy) #12

@Mark_Harwood Appreciate your help! I replicated the above setup exactly, and the _cat/fielddata API reported no fielddata being consumed. However, when I indexed a few more documents (7 more in this case), I see one node showing 424b. Although it shouldn't make a difference, it's worth mentioning that I am currently running version 2.4.0.

  "version" : {
    "number" : "2.4.0",
    "build_hash" : "ce9f0c7394dee074091dd1bc4e9469251181fc55",
    "build_timestamp" : "2016-08-29T09:14:17Z",
    "build_snapshot" : false,
    "lucene_version" : "5.5.2"
  }

(Mark Harwood) #13

Hmm. I got a similar thing. Repeated additions saw the fielddata size drop back to zero, presumably as a result of merge operations.
It may be an oddity of the reporting logic.

Presumably on your large cluster you're not seeing very large (as in GBs) fielddata usage?
This should give you doc-values sizes: GET /test/_stats/segments


(Abhijith Reddy) #14

Unfortunately we are. Whenever we issue any aggregation in production, it takes up to ~2GB on some nodes. It does drop back considerably after a few minutes.
Also when I update each field in the mapping to have

 "fielddata": {
     "format": "disabled" 
 }

the aggregation fails with the error message

"Field data loading is forbidden on [name]"

This seems to suggest to me that fielddata is actually being used.

_stats/segments does show small values for doc_values_memory_in_bytes, which is probably a sign that doc values are also being used.


(Abhijith Reddy) #15

Do you mind trying again with fielddata disabled on your setup, to see if you get the same exception? I just tried it on the test index and it failed with the same exception.


(Mark Harwood) #16

Yep, I get that too. Hmm.

However, I am seeing docvalue segment files being created on disk, per this: http://stackoverflow.com/a/30528023/1408078


(Martijn Van Groningen) #17

I think the reason you see fielddata being used, and the search request failing when you disable fielddata, is that the terms aggregation uses global ordinals [1], which are kept in the fielddata cache. However, global ordinals are very well compressed and should therefore be small. How many unique values are in the name field?

Can you try setting the execution_hint to map on the terms aggregation and check whether you can execute the search request without running into the error that fielddata loading has been disabled? This is just to check whether what I'm saying is correct. In production, an execution mode that uses global ordinals generally results in better performance.

1: https://www.elastic.co/guide/en/elasticsearch/guide/2.x/preload-fielddata.html#global-ordinals
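For anyone trying this, the check would look something like the following (the index name items is taken from the mapping earlier in the thread, and only execution_hint is new relative to the original aggregation; map collects terms per request instead of using global ordinals):

GET items/_search
{
  "size": 0,
  "aggs": {
    "name": {
      "terms": {
        "field": "name",
        "size": 100,
        "execution_hint": "map"
      }
    }
  }
}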


(Abhijith Reddy) #18

@mvg Thanks! We have multiple indexes, and for a given index there can be tens of millions of unique values.
You were right: when I set the execution_hint, it doesn't consume any fielddata.


(Martijn Van Groningen) #19

That may be why fielddata usage is so high in your case. On its own this shouldn't be an issue. Do you know how many shards you execute the search request against? It may be beneficial to reduce the number of shards, which can reduce the size global ordinals take up in the fielddata cache across your cluster. Via the cat shards API you can check how many shards exist for the indices you query and work out the per-shard global ordinals usage in the fielddata cache (assuming that name is the only field you're using now).
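Assuming the index is still called items as in the mapping above, something like this lists its shards and their sizes:

GET _cat/shards/items?v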

If your data is time-based, then in 5.0 there is a new shrink API [1] that makes reducing the number of shards trivial (without reindexing).

1: https://www.elastic.co/guide/en/elasticsearch/reference/5.1/indices-shrink-index.html#_monitoring_the_shrink_process
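As a rough sketch of that 5.0 workflow (index names here are placeholders): the source index first needs a copy of every shard relocated to a single node and a write block applied, and can then be shrunk:

PUT /source_index/_settings
{
  "settings": {
    "index.routing.allocation.require._name": "shrink_node_name",
    "index.blocks.write": true
  }
}

POST /source_index/_shrink/target_index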


(Abhijith Reddy) #20

Thanks a lot for the details. I have a few follow-up questions:

  1. Does the per-field memory usage returned by _cat/fielddata include global ordinals?
  2. Since doc values are disk-based, what does doc_values_memory_in_bytes measure?
  3. Is there a way to measure the space consumed by global ordinals?
  4. I also see conflicting things in the documentation about fielddata and aggregations: [1] seems to suggest that fielddata won't be loaded, but [2] says otherwise. Is it possible to eagerly load global ordinals without triggering a fielddata load?

[1] https://www.elastic.co/guide/en/elasticsearch/guide/2.x/preload-fielddata.html#CO224-1
[2] https://www.elastic.co/guide/en/elasticsearch/guide/2.x/preload-fielddata.html#CO223-1