Query Failures from "Data Too Large" Errors and High Memory Pressure


(Kelly Cheng) #1

We recently spun up a cluster on Found Elasticsearch and have been seeing a high number of errors. After running our index for a while, we noticed two things:

  1. Our memory pressure is essentially pegged at 99% on two of our three nodes:

    • node1: 99%
    • node2: 99%
    • node3: 19%
  2. Almost all queries now fail with an error saying a field's data is too large:

ElasticsearchException[org.elasticsearch.common.breaker.CircuitBreakingException: [FIELDDATA] Data too large, data for [name.name_sortable] would be larger than limit of [2508954009/2.3gb]]; nested: UncheckedExecutionException[org.elasticsearch.common.breaker.CircuitBreakingException: [FIELDDATA] Data too large, data for [name.name_sortable] would be larger than limit of [2508954009/2.3gb]]; nested: CircuitBreakingException[[FIELDDATA] Data too large, data for [name.name_sortable] would be larger than limit of [2508954009/2.3gb]]; }
{[AB082J7RSqG2KV7Zu3zaSg][ali-production][3]: RemoteTransportException[[instance-0000000001][inet[/172.17.0.7:19326]][indices:data/read/search[phase/query]]]; nested: QueryPhaseExecutionException[[ali-production][3]: query[filtered(filtered(+(phone_numbers.phone_numbers_prefix:cb phone_numbers.phone_numbers_literal:CB name:cb email_domain:CB))->+cache(company_id:\b\u0000\u0000=\u001C) +NotFilter(cache(is_deleted:T)) +cache(_type:organization))->cache(org.elasticsearch.index.search.nested.NonNestedDocsFilter@eed9ba48)],from[0],size[3],sort[<custom:\"name.name_sortable\": org.elasticsearch.index.fielddata.fieldcomparator.BytesRefFieldComparatorSource@75cbfa>!,<custom:\"entity_id\": org.elasticsearch.index.fielddata.fieldcomparator.LongValuesComparatorSource@449c175f>!]: Query Failed [Failed to execute main query]]; nested: ElasticsearchException[org.elasticsearch.common.breaker.CircuitBreakingException: [FIELDDATA] Data too large, data for [name.name_sortable] would be larger than limit of [2508954009/2.3gb]]; nested: UncheckedExecutionException[org.elasticsearch.common.breaker.CircuitBreakingException: [FIELDDATA] Data too large, data for [name.name_sortable] would be larger than limit of [2508954009/2.3gb]]; nested: CircuitBreakingException[[FIELDDATA] Data too large, data for [name.name_sortable] would be larger than limit of [2508954009/2.3gb]]; }
{[AB082J7RSqG2KV7Zu3zaSg][ali-production][4]: RemoteTransportException[[instance-0000000001][inet[/172.17.0.7:19326]][indices:data/read/search[phase/query]]]; nested: QueryPhaseExecutionException[[ali-production][4]: query[filtered(filtered(+(phone_numbers.phone_numbers_prefix:cb phone_numbers.phone_numbers_literal:CB name:cb email_domain:CB))->+cache(company_id:\b\u0000\u0000=\u001C) +NotFilter(cache(is_deleted:T)) +cache(_type:organization))->cache(org.elasticsearch.index.search.nested.NonNestedDocsFilter@eed9ba48)],from[0],size[3],sort[<custom:"name.name_sortable": org.elasticsearch.index.fielddata.fieldcomparator.BytesRefFieldComparatorSource@1dbcc2c0>!,<custom:"entity_id": org.elasticsearch.index.fielddata.fieldcomparator.LongValuesComparatorSource@22b3649>!]: Query Failed [Failed to execute main query]]; nested: ElasticsearchException[org.elasticsearch.common.breaker.CircuitBreakingException: [FIELDDATA] Data too large, data for [entity_id] would be larger than limit of [2508954009/2.3gb]]; nested: UncheckedExecutionException[org.elasticsearch.common.breaker.CircuitBreakingException: [FIELDDATA] Data too large, data for [entity_id] would be larger than limit of [2508954009/2.3gb]]; nested: CircuitBreakingException[[FIELDDATA] Data too large, data for [entity_id] would be larger than limit of [2508954009/2.3gb]]; }]","status":500}
/app/vendor/bundle/ruby/2.1.0/gems/elasticsearch-transport-1.0.12/lib/elasticsearch/transport/transport/base.rb:135:in `__raise_transport_error'
/app/vendor/bundle/ruby/2.1.0/gems/elasticsearch-transport-1.0.12/lib/elasticsearch/transport/transport/base.rb:227:in `perform_request'
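Because the error text is long and repetitive, a short sketch like the following can pull out which fields appear in the breaker messages and what limit they hit. The regex and the helper name `tripped_fields` are illustrative assumptions based only on the message format shown in the log; the sample string is an excerpt of that log. It also converts the byte limit to GiB, which is consistent with the "2.3gb" in the message if Elasticsearch prints sizes in 1024-based units.

```python
import re

# Matches breaker messages of the form seen in the log, e.g.
# "data for [entity_id] would be larger than limit of [2508954009/2.3gb]"
BREAKER_RE = re.compile(
    r"data for \[([^\]]+)\] would be larger than limit of \[(\d+)/([^\]]+)\]"
)


def tripped_fields(error_text):
    """Return {field: limit_in_bytes} for each field named in a breaker message."""
    fields = {}
    for field, limit_bytes, _human_readable in BREAKER_RE.findall(error_text):
        fields[field] = int(limit_bytes)
    return fields


# Excerpt of the error text from the post (middle portion elided).
sample = (
    "CircuitBreakingException[[FIELDDATA] Data too large, data for [name.name_sortable] "
    "would be larger than limit of [2508954009/2.3gb]]; ... "
    "CircuitBreakingException[[FIELDDATA] Data too large, data for [entity_id] "
    "would be larger than limit of [2508954009/2.3gb]]"
)

for field, limit in tripped_fields(sample).items():
    # 1 GiB = 2**30 bytes
    print(f"{field}: limit {limit} bytes = {limit / 2**30:.2f} GiB")
```

Run against the excerpt, this reports both `name.name_sortable` and `entity_id` against the same limit of 2508954009 bytes (about 2.34 GiB), which suggests a single shared limit rather than a per-field one.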

We don't believe this setup has a remarkably large data set, since we run a parallel cluster hosted elsewhere, but we can't figure out how to solve this issue. The "entity_id" field mentioned in the error is an integer, so I'm not sure how its data could grow larger than 2.3 GB.
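One possible reading of the entity_id puzzle, offered as an assumption rather than a diagnosis: the fielddata circuit breaker may compare the node's running fielddata total (all fields already loaded plus the new field's estimate) against the limit, so the field named in the message is simply the one being loaded when that total would cross it. The toy model below illustrates that idea. The class `FielddataBreakerModel`, the `CircuitBreakingError` exception, and the per-field byte sizes are all hypothetical; this is not Elasticsearch's implementation.

```python
class CircuitBreakingError(Exception):
    """Hypothetical stand-in for a breaker trip in this toy model."""


class FielddataBreakerModel:
    """Toy model of a fielddata circuit breaker (hypothetical, not Elasticsearch code).

    Assumption: the breaker tracks estimated fielddata bytes already loaded on the
    node for ALL fields and rejects a new load if the running total plus the new
    field's estimate would exceed the limit.
    """

    def __init__(self, limit_bytes):
        self.limit_bytes = limit_bytes
        self.used_bytes = 0

    def load(self, field, estimated_bytes):
        new_total = self.used_bytes + estimated_bytes
        if new_total > self.limit_bytes:
            # The message names the field being loaded, not the one using the memory.
            raise CircuitBreakingError(
                f"Data too large, data for [{field}] would be larger than "
                f"limit of [{self.limit_bytes}]"
            )
        self.used_bytes = new_total


LIMIT = 2_508_954_009  # limit reported in the error in the post

breaker = FielddataBreakerModel(LIMIT)
# Hypothetical size: a large string sort field uses most of the budget...
breaker.load("name.name_sortable", 2_500_000_000)
# ...so a small numeric field now pushes the running total over the limit.
try:
    breaker.load("entity_id", 10_000_000)
except CircuitBreakingError as err:
    print(err)
```

Under this model, the error blames `entity_id` even though `name.name_sortable` holds nearly all of the memory, which would match the pattern in the log.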

The stats for our index don't seem particularly interesting, but I can provide them if they would offer any clues. Any help would be appreciated.

