Circuit Breaking Exception

Hello,

We have been using Elasticsearch for a while and currently have a few hundred indices with about 120 million documents in each index.

For some days now we have often been seeing warnings in the Elasticsearch logs:

org.elasticsearch.common.breaker.CircuitBreakingException: [parent] Data too large, data for [<http_request>] would be…

and we can't search our indices.
The cat API (specifically `_cat/nodes?h=heap*&v`) says the heap is almost full.
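For reference, this is how we check the heap and the circuit breakers (a sketch assuming the default `localhost:9200` endpoint; adjust host and security options to your setup):

```shell
# Show current heap usage per node
curl -s 'localhost:9200/_cat/nodes?h=name,heap.current,heap.percent,heap.max&v'

# Inspect circuit breaker limits and estimated usage; the "parent"
# breaker is the one named in the CircuitBreakingException above
curl -s 'localhost:9200/_nodes/stats/breaker?pretty'
```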

Currently Elasticsearch runs on one machine, and we are a little over the "magic 32GB" Java heap size for Elasticsearch.

All indices must remain searchable, so we can't close any of them.

What can we do?

Which version of Elasticsearch are you using?

What is the full output of the cluster stats API?

Having the heap set above 32GB in size might actually result in less usable space than running with a smaller heap, as you may no longer benefit from compressed object pointers. It may therefore make sense to shrink the heap in order to get more effective space.
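As a sketch, keeping the heap just below the compressed-oops cutoff would look something like this in `config/jvm.options` (the exact cutoff is JVM-dependent, typically somewhere around 31-32GB, so the value here is an illustrative assumption):

```
# config/jvm.options -- set min and max heap equal,
# below the compressed-oops cutoff
-Xms30g
-Xmx30g
```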

Thank you for the reply.
We are using Elasticsearch 6.8.13.

I know that with a heap over 32GB we no longer benefit from compressed pointers, but it is a temporary measure to keep Elasticsearch working and collecting new data.

The cluster consists of one node.
For now we have 665 indices with about 2664 shards (4 shards per index). We added more heap to keep the cluster 'green'.
Which part of the cluster stats API output exactly would you like?

I would like to see the full output.


That's bad. We recommend <600 shards per node.

Here is our cluster stats API output:

{
  "_nodes" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "cluster_name" : "Cluster01",
  "cluster_uuid" : "vAlue-cHANged21201918",
  "timestamp" : 1609174432345,
  "status" : "green",
  "indices" : {
    "count" : 666,
    "shards" : {
      "total" : 2664,
      "primaries" : 2664,
      "replication" : 0.0,
      "index" : {
        "shards" : {
          "min" : 4,
          "max" : 4,
          "avg" : 4.0
        },
        "primaries" : {
          "min" : 4,
          "max" : 4,
          "avg" : 4.0
        },
        "replication" : {
          "min" : 0.0,
          "max" : 0.0,
          "avg" : 0.0
        }
      }
    },
    "docs" : {
      "count" : 75203627952,
      "deleted" : 0
    },
    "store" : {
      "size_in_bytes" : 18473241627256
    },
    "fielddata" : {
      "memory_size_in_bytes" : 0,
      "evictions" : 0
    },
    "query_cache" : {
      "memory_size_in_bytes" : 1329521236,
      "total_count" : 24414,
      "hit_count" : 338,
      "miss_count" : 24076,
      "cache_size" : 1052,
      "cache_count" : 1277,
      "evictions" : 225
    },
    "completion" : {
      "size_in_bytes" : 0
    },
    "segments" : {
      "count" : 2727,
      "memory_in_bytes" : 37491386829,
      "terms_memory_in_bytes" : 28924025608,
      "stored_fields_memory_in_bytes" : 4348146464,
      "term_vectors_memory_in_bytes" : 0,
      "norms_memory_in_bytes" : 349056,
      "points_memory_in_bytes" : 4193403681,
      "doc_values_memory_in_bytes" : 25462020,
      "index_writer_memory_in_bytes" : 0,
      "version_map_memory_in_bytes" : 0,
      "fixed_bit_set_memory_in_bytes" : 0,
      "max_unsafe_auto_id_timestamp" : -1,
      "file_sizes" : { }
    }
  },
  "nodes" : {
    "count" : {
      "total" : 1,
      "data" : 1,
      "coordinating_only" : 0,
      "master" : 1,
      "ingest" : 1
    },
    "versions" : [
      "6.8.13"
    ],
    "os" : {
      "available_processors" : 24,
      "allocated_processors" : 24,
      "names" : [
        {
          "name" : "Linux",
          "count" : 1
        }
      ],
      "pretty_names" : [
        {
          "pretty_name" : "Ubuntu 18.04.5 LTS",
          "count" : 1
        }
      ],
      "mem" : {
        "total_in_bytes" : 126740238336,
        "free_in_bytes" : 3115945984,
        "used_in_bytes" : 123624292352,
        "free_percent" : 2,
        "used_percent" : 98
      }
    },
    "process" : {
      "cpu" : {
        "percent" : 4
      },
      "open_file_descriptors" : {
        "min" : 22493,
        "max" : 22493,
        "avg" : 22493
      }
    },
    "jvm" : {
      "max_uptime_in_millis" : 953405537,
      "versions" : [
        {
          "version" : "1.8.0_275",
          "vm_name" : "OpenJDK 64-Bit Server VM",
          "vm_version" : "25.275-b01",
          "vm_vendor" : "Private Build",
          "count" : 1
        }
      ],
      "mem" : {
        "heap_used_in_bytes" : 45221787064,
        "heap_max_in_bytes" : 64267485184
      },
      "threads" : 292
    },
    "fs" : {
      "total_in_bytes" : 21901650849792,
      "free_in_bytes" : 3425590816768,
      "available_in_bytes" : 3425574039552
    },
    "plugins" : [ ],
    "network_types" : {
      "transport_types" : {
        "security4" : 1
      },
      "http_types" : {
        "security4" : 1
      }
    }
  }
}

My previous response still applies: that is too many shards for a single node.

So you suggest adding more nodes?

We are also considering reducing the number of shards per node, for existing and new indices.
We are thinking about two options:

  1. Currently we have 4 shards per index, each about 15GB in size. We would reduce this to 2 shards per index.
  2. We use Elasticsearch to store log data, and a new index is created every day. We would change the index creation policy to one new index every two days. Our retention policy is 2 years (732 days).

Both options halve the number of shards per node, and both keep us from exceeding 50GB per shard.

Which option is better in your opinion?
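For completeness, option 1 would only apply to newly created indices, for example via a legacy index template (the template name and the `logs-*` pattern are illustrative assumptions, not our real naming):

```shell
# Give all future daily log indices 2 primary shards, no replicas
curl -s -X PUT 'localhost:9200/_template/logs-shards' \
  -H 'Content-Type: application/json' -d '
{
  "index_patterns": ["logs-*"],
  "settings": {
    "number_of_shards": 2,
    "number_of_replicas": 0
  }
}'
```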

++

Or use ILM.

I would use ILM.

Sorry, but I don't understand what "++" means.
Do you mean merging both options? That doesn't look good, because it would give us shards over 50GB.
Currently our average index size is 60GB (4 shards of 15GB each). If we reduce the shard count as in option 1, we will have 2 shards of 30GB each. If we also change the index creation policy to one new index every two days (as in option 2), we will halve the number of indices on the cluster, but the average index size will be 120GB, which divided by 2 shards gives 60GB per shard.

We would like to know which option (fewer shards per index or fewer indices per cluster) is better from a RAM usage and storage perspective.

Could you tell me what you would use ILM for?

Ahh sorry, I meant yes, that's a good idea.

Ultimately both options amount to the same thing, because they result in fewer shards.

https://www.elastic.co/guide/en/elasticsearch/reference/current/index-lifecycle-management.html might be a better link to understand what ILM does.
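As a rough sketch, an ILM policy on 6.8 could size indices by rollover instead of by calendar day and enforce the retention period, something like the following (the policy name is illustrative, and the thresholds just mirror the numbers discussed above):

```shell
# Roll over to a new index at ~50GB or after 2 days, whichever comes
# first, and delete indices older than the 2-year (732-day) retention
curl -s -X PUT 'localhost:9200/_ilm/policy/logs-policy' \
  -H 'Content-Type: application/json' -d '
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_size": "50gb", "max_age": "2d" }
        }
      },
      "delete": {
        "min_age": "732d",
        "actions": { "delete": {} }
      }
    }
  }
}'
```

With rollover, shard sizing stops depending on how much data arrives per day, which is the main advantage over a fixed index-per-N-days policy.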