Automated queries to .kibana_task_manager cause CircuitBreakerException

We have a 7.4.2 Elasticsearch cluster with Kibana. We keep getting the following messages in our log files.

{
  "type": "server",
  "timestamp": "2020-04-14T00:00:10,274Z",
  "level": "DEBUG",
  "component": "o.e.a.s.TransportSearchAction",
  "cluster.name": "OUR_CLUSTER_NAME",
  "node.name": "es-coordinating-node1",
  "message": "[.kibana_task_manager_2][0], node[Cg2AmralQPCN9Szrg07bIw], [R], s[STARTED], a[id=zxo_7v7yRVmKYrdF69zgpw]: Failed to execute [SearchRequest{searchType=QUERY_THEN_FETCH, indices=[.kibana_task_manager], indicesOptions=IndicesOptions[ignore_unavailable=true, allow_no_indices=true, expand_wildcards_open=true, expand_wildcards_closed=false, allow_aliases_to_multiple_indices=true, forbid_closed_indices=true, ignore_aliases=false, ignore_throttled=true], types=[], routing='null', preference='null', requestCache=null, scroll=null, maxConcurrentShardRequests=0, batchedReduceSize=512, preFilterShardSize=128, allowPartialSearchResults=true, localClusterAlias=null, getOrCreateAbsoluteStartMillis=-1, ccsMinimizeRoundtrips=true, source={\"query\":{\"bool\":{\"must\":[{\"term\":{\"type\":{\"value\":\"task\",\"boost\":1.0}}},{\"bool\":{\"filter\":[{\"term\":{\"_id\":{\"value\":\"task:Maps-maps_telemetry\",\"boost\":1.0}}}],\"adjust_pure_negative\":true,\"boost\":1.0}}],\"adjust_pure_negative\":true,\"boost\":1.0}},\"sort\":[{\"task.runAt\":{\"order\":\"asc\"}},{\"_id\":{\"order\":\"desc\"}}]}}]",
  "cluster.uuid": "BYsSqcOITleJZYD1yOhbGw",
  "node.id": "YCCsp7goTYeeQuFerwmaJA",
  "stacktrace": [
    "org.elasticsearch.transport.RemoteTransportException: [es-data-node1][10.0.1.170:9300][indices:data/read/search[phase/query]]",
    "Caused by: org.elasticsearch.search.query.QueryPhaseExecutionException: Query Failed [Failed to execute main query]",
    "at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:305) ~[elasticsearch-7.4.2.jar:7.4.2]",
    "at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:113) ~[elasticsearch-7.4.2.jar:7.4.2]",
    "at org.elasticsearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:335) ~[elasticsearch-7.4.2.jar:7.4.2]",
    "at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:355) ~[elasticsearch-7.4.2.jar:7.4.2]",
    "at org.elasticsearch.search.SearchService.lambda$executeQueryPhase$1(SearchService.java:340) ~[elasticsearch-7.4.2.jar:7.4.2]",
    "at org.elasticsearch.action.ActionListener.lambda$map$2(ActionListener.java:145) ~[elasticsearch-7.4.2.jar:7.4.2]",
    "at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:62) ~[elasticsearch-7.4.2.jar:7.4.2]",
    "at org.elasticsearch.search.SearchService.lambda$rewriteShardRequest$7(SearchService.java:1043) ~[elasticsearch-7.4.2.jar:7.4.2]",
    "at org.elasticsearch.action.ActionRunnable$1.doRun(ActionRunnable.java:45) ~[elasticsearch-7.4.2.jar:7.4.2]",
    "at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.4.2.jar:7.4.2]",
    "at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:44) ~[elasticsearch-7.4.2.jar:7.4.2]",
    "at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:773) ~[elasticsearch-7.4.2.jar:7.4.2]",
    "at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.4.2.jar:7.4.2]",
    "at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]",
    "at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]",
    "at java.lang.Thread.run(Thread.java:830) [?:?]",
    "Caused by: org.elasticsearch.ElasticsearchException: java.util.concurrent.ExecutionException: CircuitBreakingException[[fielddata] Data too large, data for [_id] would be [12819847882/11.9gb], which is larger than the limit of [11139022848/10.3gb]]",
    ...
    Trace omitted for post character limit
  ]
}
{
  "type": "server",
  "timestamp": "2020-04-14T00:00:10,359Z",
  "level": "WARN",
  "component": "r.suppressed",
  "cluster.name": "OUR_CLUSTER_NAME",
  "node.name": "es-coordinating-node1",
  "message": "path: /.kibana_task_manager/_search, params: {ignore_unavailable=true, index=.kibana_task_manager}",
  "cluster.uuid": "BYsSqcOITleJZYD1yOhbGw",
  "node.id": "YCCsp7goTYeeQuFerwmaJA",
  "stacktrace": [
    "org.elasticsearch.action.search.SearchPhaseExecutionException: all shards failed",
    "at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:314) [elasticsearch-7.4.2.jar:7.4.2]",
    ...
    Trace omitted for post character limit
  ]
}
{
  "type": "server",
  "timestamp": "2020-04-14T00:00:10,357Z",
  "level": "DEBUG",
  "component": "o.e.a.s.TransportSearchAction",
  "cluster.name": "OUR_CLUSTER_NAME",
  "node.name": "es-coordinating-node1",
  "message": "All shards failed for phase: [query]",
  "cluster.uuid": "BYsSqcOITleJZYD1yOhbGw",
  "node.id": "YCCsp7goTYeeQuFerwmaJA",
  "stacktrace": [
    "org.elasticsearch.common.breaker.CircuitBreakingException: [fielddata] Data too large, data for [_id] would be [12819847882/11.9gb], which is larger than the limit of [11139022848/10.3gb]",
    "at org.elasticsearch.common.breaker.ChildMemoryCircuitBreaker.circuitBreak(ChildMemoryCircuitBreaker.java:98) ~[elasticsearch-7.4.2.jar:7.4.2]",
    ...
    Trace omitted for post character limit
  ]
}

And these messages are from our Kibana logs:

{"type":"log","@timestamp":"2020-04-28T13:13:36Z","tags":["warning","stats-collection"],"pid":25382,"message":"Unable to fetch data from maps collector"}
{
  "type": "error",
  "@timestamp": "2020-04-28T13:13:36Z",
  "tags": [
    "warning",
    "stats-collection"
  ],
  "pid": 25382,
  "level": "error",
  "error": {
    "message": "[circuit_breaking_exception] [fielddata] Data too large, data for [_id] would be [11140489384/10.3gb], which is larger than the limit of [11139022848/10.3gb], with { bytes_wanted=11140489384 & bytes_limit=11139022848 & durability=\"PERMANENT\" }",
    "name": "Error",
    "stack": "[circuit_breaking_exception] [fielddata] Data too large, data for [_id] would be [11140489384/10.3gb], which is larger than the limit of [11139022848/10.3gb], with { bytes_wanted=11140489384 & bytes_limit=11139022848 & durability=\"PERMANENT\" } :: {\"path\":\"/.kibana_task_manager/_search\",\"query\":{\"ignore_unavailable\":true},\"body\":\"{\\\"sort\\\":[{\\\"task.runAt\\\":\\\"asc\\\"},{\\\"_id\\\":\\\"desc\\\"}],\\\"query\\\":{\\\"bool\\\":{\\\"must\\\":[{\\\"term\\\":{\\\"type\\\":\\\"task\\\"}},{\\\"bool\\\":{\\\"filter\\\":{\\\"term\\\":{\\\"_id\\\":\\\"task:oss_telemetry-vis_telemetry\\\"}}}}]}}}\",\"statusCode\":500,\"response\":\"{\\\"error\\\":{\\\"root_cause\\\":[{\\\"type\\\":\\\"circuit_breaking_exception\\\",\\\"reason\\\":\\\"[fielddata] Data too large, data for [_id] would be [11140489384/10.3gb], which is larger than the limit of [11139022848/10.3gb]\\\",\\\"bytes_wanted\\\":11140489384,\\\"bytes_limit\\\":11139022848,\\\"durability\\\":\\\"PERMANENT\\\"}],\\\"type\\\":\\\"search_phase_execution_exception\\\",\\\"reason\\\":\\\"all shards failed\\\",\\\"phase\\\":\\\"query\\\",\\\"grouped\\\":true,\\\"failed_shards\\\":[{\\\"shard\\\":0,\\\"index\\\":\\\".kibana_task_manager_2\\\",\\\"node\\\":\\\"ujE6xO4IRfWNNkR4fUF6Wg\\\",\\\"reason\\\":{\\\"type\\\":\\\"exception\\\",\\\"reason\\\":\\\"java.util.concurrent.ExecutionException: CircuitBreakingException[[fielddata] Data too large, data for [_id] would be [11140489384/10.3gb], which is larger than the limit of [11139022848/10.3gb]]\\\",\\\"caused_by\\\":{\\\"type\\\":\\\"execution_exception\\\",\\\"reason\\\":\\\"execution_exception: CircuitBreakingException[[fielddata] Data too large, data for [_id] would be [11140489384/10.3gb], which is larger than the limit of [11139022848/10.3gb]]\\\",\\\"caused_by\\\":{\\\"type\\\":\\\"circuit_breaking_exception\\\",\\\"reason\\\":\\\"[fielddata] Data too large, data for [_id] would be [11140489384/10.3gb], which is larger than the limit of [11139022848/10.3gb]\\\",\\\"bytes_wanted\\\":11140489384,\\\"bytes_limit\\\":11139022848,\\\"durability\\\":\\\"PERMANENT\\\"}}}}],\\\"caused_by\\\":{\\\"type\\\":\\\"circuit_breaking_exception\\\",\\\"reason\\\":\\\"[fielddata] Data too large, data for [_id] would be [11140489384/10.3gb], which is larger than the limit of [11139022848/10.3gb]\\\",\\\"bytes_wanted\\\":11140489384,\\\"bytes_limit\\\":11139022848,\\\"durability\\\":\\\"PERMANENT\\\"}},\\\"status\\\":500}\"}\n    at respond (/usr/share/kibana/node_modules/elasticsearch/src/lib/transport.js:349:15)\n    at checkRespForFailure (/usr/share/kibana/node_modules/elasticsearch/src/lib/transport.js:306:7)\n    at HttpConnector.<anonymous> (/usr/share/kibana/node_modules/elasticsearch/src/lib/connectors/http.js:173:7)\n    at IncomingMessage.wrapper (/usr/share/kibana/node_modules/elasticsearch/node_modules/lodash/lodash.js:4929:19)\n    at IncomingMessage.emit (events.js:194:15)\n    at endReadableNT (_stream_readable.js:1103:12)\n    at process._tickCallback (internal/process/next_tick.js:63:19)"
  },
  "message": "[circuit_breaking_exception] [fielddata] Data too large, data for [_id] would be [11140489384/10.3gb], which is larger than the limit of [11139022848/10.3gb], with { bytes_wanted=11140489384 & bytes_limit=11139022848 & durability=\"PERMANENT\" }"
}

The messages started coming in on April 14th (we only noticed recently because the node was running low on disk space), but we didn't take any special action on the 14th that would explain the error. The cluster ingests data continuously, and all of those processes have been running smoothly. The cluster itself is healthy, with all statuses green.

Stopping Kibana made the errors stop appearing in our Elasticsearch logs (and in the Kibana logs too, naturally). Restarting Kibana freed up a lot of disk space, but didn't fix the errors.

From what I can tell from the logs, "something" (an internal ES process, I think - o.e.a.s.TransportSearchAction?) queries .kibana_task_manager with a query like

GET /.kibana_task_manager/_search
{
  "sort": [
        {
          "task.runAt": "asc"
        },
        {
          "_id": "desc"
        }
  ],
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "type": "task"
          }
        },
        {
          "bool": {
            "filter": {
              "term": {
                "_id": "task:oss_telemetry-vis_telemetry"
              }
            }
          }
        }
      ]
    }
  }
}

to find "task:oss_telemetry-vis_telemetry", and then this query causes a CircuitBreakerException. (with the following reasons, when I run the query manually:
either

java.util.concurrent.ExecutionException: CircuitBreakingException[[fielddata] Data too large, data for [_id] would be [12674571573/11.8gb], which is larger than the limit of [11139022848/10.3gb]]

or

java.util.concurrent.ExecutionException: CircuitBreakingException[[fielddata] Data too large, data for [_id] would be [11140489384/10.3gb], which is larger than the limit of [11139022848/10.3gb]]

The exception goes away when I run the query without the sort on "_id", but I have no idea why that sort would cause an issue - .kibana_task_manager only has 2 small documents. Besides, we aren't the ones triggering the query, so we don't know of a way to change it.
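For reference, the variant I ran manually that does not trip the breaker is the same query with the "_id" sort removed:

GET /.kibana_task_manager/_search
{
  "sort": [
    {
      "task.runAt": "asc"
    }
  ],
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "type": "task"
          }
        },
        {
          "bool": {
            "filter": {
              "term": {
                "_id": "task:oss_telemetry-vis_telemetry"
              }
            }
          }
        }
      ]
    }
  }
}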

My general question is: how do I stop these errors from occurring? I figure we could raise the circuit breaker limit, but that seems like the wrong approach - this query should not come anywhere near the limit, and those limits exist for good reasons. My more specific questions, which I think might help, are: why is the sort causing a CircuitBreakerException, and what is issuing the query?

Hello,

The error message is:

java.util.concurrent.ExecutionException: CircuitBreakingException[[fielddata] Data too large, data for [_id] would be [11140489384/10.3gb], which is larger than the limit of [11139022848/10.3gb]]

It means the JVM heap of the node is using about 10GB just to hold the global ordinals generated by sorting on _id.

That memory shows up in the fielddata stats, but what is actually stored there is global ordinals.

More details are in our documentation

We're moving towards disabling fielddata for _id. It will be disabled by default in 8.0.
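If you want to see how much of the fielddata breaker each node is currently using, the node stats breaker endpoint is a quick check (look at the breakers.fielddata section, in particular estimated_size vs. limit_size and the tripped counter):

GET _nodes/stats/breaker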

Can you please run the following commands and share the output?

  • GET _cat/fielddata?v
  • GET .kibana_task_manager/_stats/fielddata

Once you've grabbed the output, you can get the cluster functional again by running:

POST */_cache/clear?fielddata=true&fields=_id

Do not sort on _id. Please rely on a numeric field for sorting if possible.
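For example (just a sketch - sortable_id is a placeholder field that you would populate with a copy of the document ID at index time), sorting on a dedicated keyword or numeric field uses doc values on disk instead of building fielddata on the heap:

PUT my-index
{
  "mappings": {
    "properties": {
      "sortable_id": {
        "type": "keyword"
      }
    }
  }
}

GET my-index/_search
{
  "sort": [
    {
      "sortable_id": "desc"
    }
  ]
}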

We are aware that Telemetry was sorting on _id, but I doubt that alone generated 10GB of fielddata. It is almost certainly coming from other indices.

We've improved this, and the change will be released in 7.7.

The error you see on the task manager query is just the "last drop" - the breaker was already almost full from fielddata built elsewhere.

Thanks so much for all the info. So basically what happened is that sorting on _id on other indices filled up this "fielddata" cache? And to prevent this in the future we need to never sort on _id?

Here's the output of those commands:

GET _cat/fielddata?v

(field names changed for privacy)

id                     host       ip         node           field                       size
ta5GGev6SgaMHquv5wMJ5w 10.0.1.98  10.0.1.98  es-data-node9  FIELD_A                    3.4gb
ta5GGev6SgaMHquv5wMJ5w 10.0.1.98  10.0.1.98  es-data-node9  kibana_stats.kibana.uuid   1.1kb
ta5GGev6SgaMHquv5wMJ5w 10.0.1.98  10.0.1.98  es-data-node9  kibana_stats.kibana.status 1.1kb
ta5GGev6SgaMHquv5wMJ5w 10.0.1.98  10.0.1.98  es-data-node9  type                         1kb
ta5GGev6SgaMHquv5wMJ5w 10.0.1.98  10.0.1.98  es-data-node9  FIELD_B                    8.2gb
S8btXaAAR0C4pPY_nc0PCw 10.0.1.138 10.0.1.138 es-data-node7  FIELD_B                    8.2gb
S8btXaAAR0C4pPY_nc0PCw 10.0.1.138 10.0.1.138 es-data-node7  FIELD_A                    3.6gb
S8btXaAAR0C4pPY_nc0PCw 10.0.1.138 10.0.1.138 es-data-node7  jobtype                     376b
S8btXaAAR0C4pPY_nc0PCw 10.0.1.138 10.0.1.138 es-data-node7  meta.layout.keyword         376b
S8btXaAAR0C4pPY_nc0PCw 10.0.1.138 10.0.1.138 es-data-node7  status                      376b
S8btXaAAR0C4pPY_nc0PCw 10.0.1.138 10.0.1.138 es-data-node7  meta.objectType.keyword     376b
sJRMSzfrTvu6I3Abou9mwg 10.0.1.63  10.0.1.63  es-data-node8  meta.objectType.keyword    1.7kb
sJRMSzfrTvu6I3Abou9mwg 10.0.1.63  10.0.1.63  es-data-node8  kibana_stats.kibana.status    0b
sJRMSzfrTvu6I3Abou9mwg 10.0.1.63  10.0.1.63  es-data-node8  jobtype                    1.7kb
sJRMSzfrTvu6I3Abou9mwg 10.0.1.63  10.0.1.63  es-data-node8  log.level                  1.1kb
sJRMSzfrTvu6I3Abou9mwg 10.0.1.63  10.0.1.63  es-data-node8  event.dataset              1.6kb
sJRMSzfrTvu6I3Abou9mwg 10.0.1.63  10.0.1.63  es-data-node8  FIELD_A                    2.9gb
sJRMSzfrTvu6I3Abou9mwg 10.0.1.63  10.0.1.63  es-data-node8  FIELD_B                      7gb
sJRMSzfrTvu6I3Abou9mwg 10.0.1.63  10.0.1.63  es-data-node8  status                     1.7kb
sJRMSzfrTvu6I3Abou9mwg 10.0.1.63  10.0.1.63  es-data-node8  meta.layout.keyword        1.7kb
sJRMSzfrTvu6I3Abou9mwg 10.0.1.63  10.0.1.63  es-data-node8  kibana_stats.kibana.uuid      0b
u15vdP2lSA-UsLtY0ainKg 10.0.1.58  10.0.1.58  es-data-node-6 meta.layout.keyword         728b
u15vdP2lSA-UsLtY0ainKg 10.0.1.58  10.0.1.58  es-data-node-6 jobtype                     728b
u15vdP2lSA-UsLtY0ainKg 10.0.1.58  10.0.1.58  es-data-node-6 source_node.uuid            992b
u15vdP2lSA-UsLtY0ainKg 10.0.1.58  10.0.1.58  es-data-node-6 status                      728b
u15vdP2lSA-UsLtY0ainKg 10.0.1.58  10.0.1.58  es-data-node-6 FIELD_A                    2.7gb
u15vdP2lSA-UsLtY0ainKg 10.0.1.58  10.0.1.58  es-data-node-6 kibana_stats.kibana.status  400b
u15vdP2lSA-UsLtY0ainKg 10.0.1.58  10.0.1.58  es-data-node-6 kibana_stats.kibana.uuid    400b
u15vdP2lSA-UsLtY0ainKg 10.0.1.58  10.0.1.58  es-data-node-6 source_node.name            992b
u15vdP2lSA-UsLtY0ainKg 10.0.1.58  10.0.1.58  es-data-node-6 shard.state                1.3kb
u15vdP2lSA-UsLtY0ainKg 10.0.1.58  10.0.1.58  es-data-node-6 FIELD_B                    6.9gb
u15vdP2lSA-UsLtY0ainKg 10.0.1.58  10.0.1.58  es-data-node-6 shard.node                  376b
u15vdP2lSA-UsLtY0ainKg 10.0.1.58  10.0.1.58  es-data-node-6 meta.objectType.keyword     728b
u15vdP2lSA-UsLtY0ainKg 10.0.1.58  10.0.1.58  es-data-node-6 shard.index                3.4kb
_wvYwE2VSNmwN3nLmWhz7w 10.0.1.153 10.0.1.153 es-data-node5  meta.layout.keyword         376b
_wvYwE2VSNmwN3nLmWhz7w 10.0.1.153 10.0.1.153 es-data-node5  event.dataset              1.6kb
_wvYwE2VSNmwN3nLmWhz7w 10.0.1.153 10.0.1.153 es-data-node5  meta.objectType.keyword     376b
_wvYwE2VSNmwN3nLmWhz7w 10.0.1.153 10.0.1.153 es-data-node5  log.level                  1.1kb
_wvYwE2VSNmwN3nLmWhz7w 10.0.1.153 10.0.1.153 es-data-node5  status                      376b
_wvYwE2VSNmwN3nLmWhz7w 10.0.1.153 10.0.1.153 es-data-node5  FIELD_A                      3gb
_wvYwE2VSNmwN3nLmWhz7w 10.0.1.153 10.0.1.153 es-data-node5  FIELD_B                    8.2gb
_wvYwE2VSNmwN3nLmWhz7w 10.0.1.153 10.0.1.153 es-data-node5  jobtype                     376b
8PQuDU-MTTWvt7X99zQDjA 10.0.1.156 10.0.1.156 es-data-node3  kibana_stats.kibana.status 1.5kb
8PQuDU-MTTWvt7X99zQDjA 10.0.1.156 10.0.1.156 es-data-node3  source_node.uuid            544b
8PQuDU-MTTWvt7X99zQDjA 10.0.1.156 10.0.1.156 es-data-node3  shard.index                 592b
8PQuDU-MTTWvt7X99zQDjA 10.0.1.156 10.0.1.156 es-data-node3  shard.node                  400b
8PQuDU-MTTWvt7X99zQDjA 10.0.1.156 10.0.1.156 es-data-node3  shard.state                 400b
8PQuDU-MTTWvt7X99zQDjA 10.0.1.156 10.0.1.156 es-data-node3  source_node.name            544b
8PQuDU-MTTWvt7X99zQDjA 10.0.1.156 10.0.1.156 es-data-node3  FIELD_A                    2.9gb
8PQuDU-MTTWvt7X99zQDjA 10.0.1.156 10.0.1.156 es-data-node3  kibana_stats.kibana.uuid   1.5kb
8PQuDU-MTTWvt7X99zQDjA 10.0.1.156 10.0.1.156 es-data-node3  FIELD_B                    7.7gb
jtLEhVFVRqmHEN1u6PFzyQ 10.0.1.28  10.0.1.28  es-data-node10 meta.layout.keyword         352b
jtLEhVFVRqmHEN1u6PFzyQ 10.0.1.28  10.0.1.28  es-data-node10 FIELD_B                    6.9gb
jtLEhVFVRqmHEN1u6PFzyQ 10.0.1.28  10.0.1.28  es-data-node10 status                      352b
jtLEhVFVRqmHEN1u6PFzyQ 10.0.1.28  10.0.1.28  es-data-node10 FIELD_A                    3.2gb
jtLEhVFVRqmHEN1u6PFzyQ 10.0.1.28  10.0.1.28  es-data-node10 meta.objectType.keyword     352b
jtLEhVFVRqmHEN1u6PFzyQ 10.0.1.28  10.0.1.28  es-data-node10 jobtype                     352b
Cg2AmralQPCN9Szrg07bIw 10.0.1.170 10.0.1.170 es-data-node1  _id                           0b
Cg2AmralQPCN9Szrg07bIw 10.0.1.170 10.0.1.170 es-data-node1  jobtype                     752b
Cg2AmralQPCN9Szrg07bIw 10.0.1.170 10.0.1.170 es-data-node1  FIELD_A                    3.3gb
Cg2AmralQPCN9Szrg07bIw 10.0.1.170 10.0.1.170 es-data-node1  FIELD_B                    8.1gb
Cg2AmralQPCN9Szrg07bIw 10.0.1.170 10.0.1.170 es-data-node1  meta.layout.keyword         752b
Cg2AmralQPCN9Szrg07bIw 10.0.1.170 10.0.1.170 es-data-node1  status                      752b
Cg2AmralQPCN9Szrg07bIw 10.0.1.170 10.0.1.170 es-data-node1  meta.objectType.keyword     752b
7wZvyulPSW-0qz9fUPRuhw 10.0.1.160 10.0.1.160 es-data-node2  status                     1.4kb
7wZvyulPSW-0qz9fUPRuhw 10.0.1.160 10.0.1.160 es-data-node2  meta.objectType.keyword    1.4kb
7wZvyulPSW-0qz9fUPRuhw 10.0.1.160 10.0.1.160 es-data-node2  source_node.name            448b
7wZvyulPSW-0qz9fUPRuhw 10.0.1.160 10.0.1.160 es-data-node2  kibana_stats.kibana.uuid    376b
7wZvyulPSW-0qz9fUPRuhw 10.0.1.160 10.0.1.160 es-data-node2  meta.layout.keyword        1.4kb
7wZvyulPSW-0qz9fUPRuhw 10.0.1.160 10.0.1.160 es-data-node2  shard.state                 376b
7wZvyulPSW-0qz9fUPRuhw 10.0.1.160 10.0.1.160 es-data-node2  FIELD_B                    8.4gb
7wZvyulPSW-0qz9fUPRuhw 10.0.1.160 10.0.1.160 es-data-node2  kibana_stats.kibana.status  376b
7wZvyulPSW-0qz9fUPRuhw 10.0.1.160 10.0.1.160 es-data-node2  FIELD_A                    3.5gb
7wZvyulPSW-0qz9fUPRuhw 10.0.1.160 10.0.1.160 es-data-node2  shard.node                  376b
7wZvyulPSW-0qz9fUPRuhw 10.0.1.160 10.0.1.160 es-data-node2  source_node.uuid            448b
7wZvyulPSW-0qz9fUPRuhw 10.0.1.160 10.0.1.160 es-data-node2  shard.index                 568b
7wZvyulPSW-0qz9fUPRuhw 10.0.1.160 10.0.1.160 es-data-node2  jobtype                    1.4kb
jEquM5YxQN2f6ANXy-LtCA 10.0.1.187 10.0.1.187 es-data-node11 FIELD_A                    2.5gb
jEquM5YxQN2f6ANXy-LtCA 10.0.1.187 10.0.1.187 es-data-node11 shard.index                4.2kb
jEquM5YxQN2f6ANXy-LtCA 10.0.1.187 10.0.1.187 es-data-node11 jobtype                     776b
jEquM5YxQN2f6ANXy-LtCA 10.0.1.187 10.0.1.187 es-data-node11 FIELD_B                    6.9gb
jEquM5YxQN2f6ANXy-LtCA 10.0.1.187 10.0.1.187 es-data-node11 status                      776b
jEquM5YxQN2f6ANXy-LtCA 10.0.1.187 10.0.1.187 es-data-node11 meta.objectType.keyword     776b
jEquM5YxQN2f6ANXy-LtCA 10.0.1.187 10.0.1.187 es-data-node11 shard.node                  776b
jEquM5YxQN2f6ANXy-LtCA 10.0.1.187 10.0.1.187 es-data-node11 meta.layout.keyword         776b
jEquM5YxQN2f6ANXy-LtCA 10.0.1.187 10.0.1.187 es-data-node11 kibana_stats.kibana.status  376b
jEquM5YxQN2f6ANXy-LtCA 10.0.1.187 10.0.1.187 es-data-node11 shard.state                1.7kb
jEquM5YxQN2f6ANXy-LtCA 10.0.1.187 10.0.1.187 es-data-node11 kibana_stats.kibana.uuid    376b
jEquM5YxQN2f6ANXy-LtCA 10.0.1.187 10.0.1.187 es-data-node11 source_node.name           1.2kb
jEquM5YxQN2f6ANXy-LtCA 10.0.1.187 10.0.1.187 es-data-node11 source_node.uuid           1.2kb
ujE6xO4IRfWNNkR4fUF6Wg 10.0.1.118 10.0.1.118 es-data-node4  kibana_stats.kibana.uuid      0b
ujE6xO4IRfWNNkR4fUF6Wg 10.0.1.118 10.0.1.118 es-data-node4  _id                           0b
ujE6xO4IRfWNNkR4fUF6Wg 10.0.1.118 10.0.1.118 es-data-node4  kibana_stats.kibana.status    0b
ujE6xO4IRfWNNkR4fUF6Wg 10.0.1.118 10.0.1.118 es-data-node4  type                         1kb
ujE6xO4IRfWNNkR4fUF6Wg 10.0.1.118 10.0.1.118 es-data-node4  FIELD_B                    6.9gb
ujE6xO4IRfWNNkR4fUF6Wg 10.0.1.118 10.0.1.118 es-data-node4  FIELD_A                    3.1gb

and

GET .kibana_task_manager/_stats/fielddata
{
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "failed" : 0
  },
  "_all" : {
    "primaries" : {
      "fielddata" : {
        "memory_size_in_bytes" : 0,
        "evictions" : 0
      }
    },
    "total" : {
      "fielddata" : {
        "memory_size_in_bytes" : 0,
        "evictions" : 0
      }
    }
  },
  "indices" : {
    ".kibana_task_manager_2" : {
      "uuid" : "ppXaGokwRk6_rQF3m214Cg",
      "primaries" : {
        "fielddata" : {
          "memory_size_in_bytes" : 0,
          "evictions" : 0
        }
      },
      "total" : {
        "fielddata" : {
          "memory_size_in_bytes" : 0,
          "evictions" : 0
        }
      }
    }
  }
}

You can impose a limit on the fielddata cache, but it will add overhead because you will probably end up with cache thrashing (entries being evicted and rebuilt over and over).
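If you do decide to cap it, the limit is a static node setting in elasticsearch.yml (the 20% below is only an illustrative value; it has to be set on every data node and needs a restart):

indices.fielddata.cache.size: 20%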

Is there a numeric field you can use to sort your documents instead?

I didn't think we were using _id to sort documents, but I'm not completely sure of that. If that's the issue I can definitely look into fixing that.

I ran

POST */_cache/clear?fielddata=true&fields=_id

but that didn't stop the errors from coming in. Do I need to check which fields' fielddata is taking up the space and clear those from the cache? Do I need to clear the caches for FIELD_A and FIELD_B?

You are correct, @toneyalexander.

I initially supposed there was some sorting done on _id, but actually the major contributors are FIELD_A and FIELD_B.

You have to clear the cache for the fields that are using fielddata (you can even call POST */_cache/clear?fielddata=true to clear all fields at once).

There is for sure some terms aggregation or sorting being done on FIELD_A and FIELD_B, which forces Elasticsearch to build global ordinals for them.
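If you need to keep a terms aggregation on one of them, a possible workaround (only a sketch, using the placeholder FIELD_A from the stats above) is to hint the aggregation to collect terms per request instead of through global ordinals; this only pays off when the query matches a small number of documents:

GET some-index/_search
{
  "size": 0,
  "aggs": {
    "by_field_a": {
      "terms": {
        "field": "FIELD_A",
        "execution_hint": "map"
      }
    }
  }
}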

Great, thanks - do you know how we can keep the aggregations/sorts from building so many global ordinals? FIELD_A and FIELD_B are types of ID strings, so there is a large number of distinct values shared between multiple documents.

Yes, they are high-cardinality fields.

If they are unique values, there is no reason to perform a terms aggregation on them.

If those IDs can be indexed as numeric values, or hashed to a numeric value, that would be ideal, as numeric fields are not affected by global ordinals.
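A sketch of that idea (the field names are placeholders, and the hash itself would be computed by whatever indexes the documents): add a numeric companion field next to the keyword field and sort/aggregate on the numeric one instead.

PUT some-index
{
  "mappings": {
    "properties": {
      "FIELD_A": {
        "type": "keyword"
      },
      "FIELD_A_hash": {
        "type": "long"
      }
    }
  }
}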

They're not fully unique, but it should be possible for us to hash them to numerical values. I cleared the cache for one of the fields and that stopped the errors. Thanks so much for all of the help!

