Filebeat doesn't send the logs to Elastic Cloud (circuit_breaking_exception)

Hi!

I don't know why I get this error (Filebeat agents running in a Kubernetes cluster). We increased the size of the cluster so it would have more capacity, and changed the indices from 1 shard and 1 replica to 2 shards and 1 replica expecting that to be more optimal, but now it works worse than before.

This is the error I see in the Filebeat pod:

2020-07-29T12:37:35.671Z	ERROR	[elasticsearch]	elasticsearch/client.go:223	failed to perform any bulk index operations: 429 Too Many Requests: {"error":{"root_cause":[{"type":"circuit_breaking_exception","reason":"[parent] Data too large, data for [<http_request>] would be [2041914692/1.9gb], which is larger than the limit of [2040109465/1.8gb], real usage: [2041909248/1.9gb], new bytes reserved: [5444/5.3kb], usages [request=0/0b, fielddata=180233/176kb, in_flight_requests=5444/5.3kb, accounting=27779116/26.4mb]","bytes_wanted":2041914692,"bytes_limit":2040109465,"durability":"PERMANENT"}],"type":"circuit_breaking_exception","reason":"[parent] Data too large, data for [<http_request>] would be [2041914692/1.9gb], which is larger than the limit of [2040109465/1.8gb], real usage: [2041909248/1.9gb], new bytes reserved: [5444/5.3kb], usages [request=0/0b, fielddata=180233/176kb, in_flight_requests=5444/5.3kb, accounting=27779116/26.4mb]","bytes_wanted":2041914692,"bytes_limit":2040109465,"durability":"PERMANENT"},"status":429}

Any suggestion?

Thank you very much

Hi

Is there any reason why this happens? :frowning:

Thank you

How many Beats are indexing into the cluster? How many indices and shards are you actively indexing into? How much data do you have in the cluster?

If you could provide the full output of the cluster stats API we would get a better idea about the state of the cluster.
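For reference, the output can be retrieved with the cluster stats API. A minimal example, assuming an Elastic Cloud deployment (the endpoint and credentials below are placeholders, not real values):

```shell
# Hypothetical endpoint and user; substitute your own deployment's values.
curl -u elastic:<password> \
  "https://<your-deployment-endpoint>:9243/_cluster/stats?human&pretty"
```

The `human` flag renders byte counts in readable units, which makes the heap and store figures easier to discuss here.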

Does this help?

As I said, I have 2 shards and 1 replica for almost all my indices.

Version: 7.8.0

Nodes: 3
Disk Available
67.07%
162.3 GB / 242.0 GB

JVM Heap
62.49%
2.9 GB / 4.6 GB

Indices: 76
Documents: 54,448,810
Disk Usage: 72.8 GB
Primary Shards: 116
Replica Shards: 116

Thank you

Can you please provide the full output of the cluster stats API?

{
  "_nodes" : {
    "total" : 3,
    "successful" : 3,
    "failed" : 0
  },
  "cluster_name" : "8a3af794e5b7464c9389dd64dee07860",
  "cluster_uuid" : "Ii2dPs_ITa-FL1ZDfTZKMA",
  "timestamp" : 1596184190552,
  "status" : "green",
  "indices" : {
    "count" : 69,
    "shards" : {
      "total" : 204,
      "primaries" : 102,
      "replication" : 1.0,
      "index" : {
        "shards" : {
          "min" : 2,
          "max" : 4,
          "avg" : 2.9565217391304346
        },
        "primaries" : {
          "min" : 1,
          "max" : 2,
          "avg" : 1.4782608695652173
        },
        "replication" : {
          "min" : 1.0,
          "max" : 1.0,
          "avg" : 1.0
        }
      }
    },
    "docs" : {
      "count" : 48798886,
      "deleted" : 694768
    },
    "store" : {
      "size_in_bytes" : 69728558111
    },
    "fielddata" : {
      "memory_size_in_bytes" : 249840,
      "evictions" : 0
    },
    "query_cache" : {
      "memory_size_in_bytes" : 89281611,
      "total_count" : 19698157,
      "hit_count" : 13508225,
      "miss_count" : 6189932,
      "cache_size" : 2295,
      "cache_count" : 4105,
      "evictions" : 1810
    },
    "completion" : {
      "size_in_bytes" : 0
    },
    "segments" : {
      "count" : 2261,
      "memory_in_bytes" : 49403640,
      "terms_memory_in_bytes" : 40103368,
      "stored_fields_memory_in_bytes" : 2331448,
      "term_vectors_memory_in_bytes" : 0,
      "norms_memory_in_bytes" : 69504,
      "points_memory_in_bytes" : 0,
      "doc_values_memory_in_bytes" : 6899320,
      "index_writer_memory_in_bytes" : 135718268,
      "version_map_memory_in_bytes" : 1679732,
      "fixed_bit_set_memory_in_bytes" : 11188400,
      "max_unsafe_auto_id_timestamp" : 1596182513602,
      "file_sizes" : { }
    },
    "mappings" : {
      "field_types" : [
        {
          "name" : "alias",
          "count" : 1122,
          "index_count" : 33
        },
        {
          "name" : "binary",
          "count" : 11,
          "index_count" : 4
        },
        {
          "name" : "boolean",
          "count" : 3385,
          "index_count" : 54
        },
        {
          "name" : "date",
          "count" : 3314,
          "index_count" : 66
        },
        {
          "name" : "double",
          "count" : 843,
          "index_count" : 36
        },
        {
          "name" : "flattened",
          "count" : 2,
          "index_count" : 2
        },
        {
          "name" : "float",
          "count" : 896,
          "index_count" : 38
        },
        {
          "name" : "geo_point",
          "count" : 264,
          "index_count" : 33
        },
        {
          "name" : "geo_shape",
          "count" : 3,
          "index_count" : 3
        },
        {
          "name" : "half_float",
          "count" : 24,
          "index_count" : 6
        },
        {
          "name" : "integer",
          "count" : 145,
          "index_count" : 13
        },
        {
          "name" : "ip",
          "count" : 3399,
          "index_count" : 33
        },
        {
          "name" : "keyword",
          "count" : 77709,
          "index_count" : 68
        },
        {
          "name" : "long",
          "count" : 29587,
          "index_count" : 57
        },
        {
          "name" : "nested",
          "count" : 78,
          "index_count" : 46
        },
        {
          "name" : "object",
          "count" : 20845,
          "index_count" : 67
        },
        {
          "name" : "scaled_float",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "short",
          "count" : 3334,
          "index_count" : 34
        },
        {
          "name" : "text",
          "count" : 3493,
          "index_count" : 60
        }
      ]
    },
    "analysis" : {
      "char_filter_types" : [ ],
      "tokenizer_types" : [ ],
      "filter_types" : [
        {
          "name" : "pattern_capture",
          "count" : 1,
          "index_count" : 1
        }
      ],
      "analyzer_types" : [
        {
          "name" : "custom",
          "count" : 1,
          "index_count" : 1
        }
      ],
      "built_in_char_filters" : [ ],
      "built_in_tokenizers" : [
        {
          "name" : "uax_url_email",
          "count" : 1,
          "index_count" : 1
        }
      ],
      "built_in_filters" : [
        {
          "name" : "lowercase",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "unique",
          "count" : 1,
          "index_count" : 1
        }
      ],
      "built_in_analyzers" : [ ]
    }
  },
  "nodes" : {
    "count" : {
      "total" : 3,
      "coordinating_only" : 0,
      "data" : 2,
      "ingest" : 2,
      "master" : 3,
      "ml" : 0,
      "remote_cluster_client" : 3,
      "transform" : 2,
      "voting_only" : 1
    },
    "versions" : [
      "7.8.0"
    ],
    "os" : {
      "available_processors" : 54,
      "allocated_processors" : 6,
      "names" : [
        {
          "name" : "Linux",
          "count" : 3
        }
      ],
      "pretty_names" : [
        {
          "pretty_name" : "CentOS Linux 7 (Core)",
          "count" : 3
        }
      ],
      "mem" : {
        "total_in_bytes" : 9663676416,
        "free_in_bytes" : 3768320,
        "used_in_bytes" : 9659908096,
        "free_percent" : 0,
        "used_percent" : 100
      }
    },
    "process" : {
      "cpu" : {
        "percent" : 6
      },
      "open_file_descriptors" : {
        "min" : 375,
        "max" : 1558,
        "avg" : 1140
      }
    },
    "jvm" : {
      "max_uptime_in_millis" : 2163566242,
      "versions" : [
        {
          "version" : "14.0.1",
          "vm_name" : "OpenJDK 64-Bit Server VM",
          "vm_version" : "14.0.1+7",
          "vm_vendor" : "AdoptOpenJDK",
          "bundled_jdk" : true,
          "using_bundled_jdk" : true,
          "count" : 3
        }
      ],
      "mem" : {
        "heap_used_in_bytes" : 2774695512,
        "heap_max_in_bytes" : 4938792960
      },
      "threads" : 188
    },
    "fs" : {
      "total_in_bytes" : 259845521408,
      "free_in_bytes" : 185039581184,
      "available_in_bytes" : 185039581184
    },
    "plugins" : [
      {
        "name" : "repository-s3",
        "version" : "7.8.0",
        "elasticsearch_version" : "7.8.0",
        "java_version" : "1.8",
        "description" : "The S3 repository plugin adds S3 repositories",
        "classname" : "org.elasticsearch.repositories.s3.S3RepositoryPlugin",
        "extended_plugins" : [ ],
        "has_native_controller" : false
      },
      {
        "name" : "repository-gcs",
        "version" : "7.8.0",
        "elasticsearch_version" : "7.8.0",
        "java_version" : "1.8",
        "description" : "The GCS repository plugin adds Google Cloud Storage support for repositories.",
        "classname" : "org.elasticsearch.repositories.gcs.GoogleCloudStoragePlugin",
        "extended_plugins" : [ ],
        "has_native_controller" : false
      }
    ],
    "network_types" : {
      "transport_types" : {
        "security4" : 3
      },
      "http_types" : {
        "security4" : 3
      }
    },
    "discovery_types" : {
      "zen" : 3
    },
    "packaging_types" : [
      {
        "flavor" : "default",
        "type" : "docker",
        "count" : 3
      }
    ],
    "ingest" : {
      "number_of_pipelines" : 16,
      "processor_stats" : {
        "append" : {
          "count" : 15662626,
          "failed" : 0,
          "current" : 0,
          "time_in_millis" : 2252
        },
        "conditional" : {
          "count" : 31630406,
          "failed" : 0,
          "current" : 0,
          "time_in_millis" : 21143
        },
        "date" : {
          "count" : 7913266,
          "failed" : 0,
          "current" : 0,
          "time_in_millis" : 10522
        },
        "geoip" : {
          "count" : 15826532,
          "failed" : 0,
          "current" : 0,
          "time_in_millis" : 35533
        },
        "grok" : {
          "count" : 23750528,
          "failed" : 51795,
          "current" : 0,
          "time_in_millis" : 47344
        },
        "gsub" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time_in_millis" : 0
        },
        "pipeline" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time_in_millis" : 0
        },
        "remove" : {
          "count" : 23745827,
          "failed" : 0,
          "current" : 0,
          "time_in_millis" : 3818
        },
        "rename" : {
          "count" : 23745827,
          "failed" : 0,
          "current" : 0,
          "time_in_millis" : 4370
        },
        "script" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time_in_millis" : 0
        },
        "set" : {
          "count" : 7831313,
          "failed" : 0,
          "current" : 0,
          "time_in_millis" : 812
        },
        "split" : {
          "count" : 15826532,
          "failed" : 0,
          "current" : 0,
          "time_in_millis" : 6064
        },
        "user_agent" : {
          "count" : 7913266,
          "failed" : 0,
          "current" : 0,
          "time_in_millis" : 2732
        }
      }
    }
  }
}

That seems like a lot of shards given the data size and the size of the cluster. You are also using ingest pipelines, and I am not sure how much memory those consume. I would start by reducing the number of indices and shards: set the number of primary shards to 1 for all indices, and look into using ILM to get a larger average shard size if you are not doing so already.
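As a sketch of the first step, assuming the Filebeat indices are matched by a legacy index template (the template name and pattern here are examples, not your actual configuration), the primary shard count can be set back to 1 for newly created indices like this on 7.8:

```json
PUT _template/filebeat-shards-override
{
  "index_patterns": ["filebeat-*"],
  "order": 10,
  "settings": {
    "index.number_of_shards": 1,
    "index.number_of_replicas": 1
  }
}
```

The higher `order` makes these settings win over a lower-order template matching the same pattern; existing indices keep their current shard count, so the change only applies from the next rollover or daily index onward.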

If I understood correctly, the problem is that I have gone from 1 shard / 1 replica to 2 shards / 1 replica, and this is not good in my case because the shards are very small?

Regarding "using ILM to get a larger average shard size if you are not already": I have an ILM policy that rotates the indices every day, but some indices end up with less than 2 GB.

Correct. You could also switch from daily to e.g. monthly indices.
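One way to get larger shards without fixed monthly boundaries is to let ILM roll over on size as well as age. A sketch of such a policy (the policy name and thresholds are illustrative, not tuned recommendations for this cluster):

```json
PUT _ilm/policy/filebeat-size-rollover
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "30gb",
            "max_age": "30d"
          }
        }
      }
    }
  }
}
```

With this, an index rolls over when it reaches 30 GB total or 30 days old, whichever comes first, so low-volume indices accumulate data instead of producing a new small index every day.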

And these field-parsing errors, are they related to the logs not being sent?