We are using Fluent Bit to ingest Kubernetes cluster logs into Elasticsearch.
I am not sure what I have set wrong, but every day, as the index grows, ingestion degrades from 550k op/s to 40k op/s.
The ingestion rate from the Fluent Bit agents is constant, but at some point ES slows down: it queues more and more requests, uses less CPU, and starts rejecting new writes.
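For anyone who wants to reproduce the observation, here is a rough sketch of how the queueing and rejections can be totalled from the `_cat/thread_pool/write` API. The `ES_URL` value and the function names are placeholders of mine, not part of our setup.

```python
import json
import urllib.request

# Assumption: placeholder address; point it at any node of the cluster.
ES_URL = "http://localhost:9200"


def fetch_write_pool(es_url=ES_URL):
    """Fetch per-node write thread pool stats from the _cat API as JSON rows."""
    url = (es_url + "/_cat/thread_pool/write"
           "?format=json&h=node_name,active,queue,rejected")
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def summarize(rows):
    """Total the queue and rejected counters (the _cat API returns strings)."""
    return {
        "nodes": len(rows),
        "queued": sum(int(r["queue"]) for r in rows),
        "rejected": sum(int(r["rejected"]) for r in rows),
    }
```

Calling `summarize(fetch_write_pool())` every minute or so should show whether `rejected` climbs on all data nodes or only on a few of them.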
Fluent Bit: 1.8.9
Kube cluster: 403 nodes
Elasticsearch: 30 data nodes, 8 CPU and 16 GB each
Our main use is ingestion, not so much querying, so in the index template I set:
"refresh_interval": "60s",
"number_of_shards": "30",
"translog": { "durability": "async" },
"number_of_replicas": "0"
Today we will also try adding:
indices.memory.index_buffer_size: 30%
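To make the index-level settings unambiguous, here is a sketch of how they would sit in a full template body. It assumes the composable template API (`_index_template`, Elasticsearch 7.8 and later); the pattern is derived from our `Logstash_Prefix`, and the `priority` value is a placeholder. Note that `indices.memory.index_buffer_size` is a node-level setting in `elasticsearch.yml`, so it is not part of the template.

```python
import json


def build_template(pattern="kube-containers-*"):
    """Build an index template body with the ingestion-oriented settings."""
    return {
        "index_patterns": [pattern],
        "priority": 100,  # assumption: placeholder value
        "template": {
            "settings": {
                "index": {
                    "refresh_interval": "60s",
                    "number_of_shards": "30",
                    "number_of_replicas": "0",
                    "translog": {"durability": "async"},
                }
            }
        },
    }


# Print the JSON body that would be sent with PUT _index_template/<name>.
print(json.dumps(build_template(), indent=2))
```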
The index rolls over daily (Fluent Bit puts the date in the name of the index it writes to), and we move each index to the warm phase after 2 days.
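As a small illustration of the naming and ageing, assuming Fluent Bit's default `Logstash_DateFormat` of `%Y.%m.%d` (the cluster name `prod` and both function names are made up for the example):

```python
from datetime import date


def daily_index(prefix, day):
    """Index name written with Logstash_Format On: <prefix>-YYYY.MM.DD."""
    return f"{prefix}-{day.strftime('%Y.%m.%d')}"


def is_warm(index_day, today, warm_after_days=2):
    """True once the index is old enough to have moved to the warm phase."""
    return (today - index_day).days >= warm_after_days


# Hypothetical cluster name "prod" standing in for ${CLUSTER_NAME}.
print(daily_index("kube-containers-prod", date(2021, 11, 5)))
```

So on any given day only one index per cluster receives writes, and it is always the newest one.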
Fluent Bit config:
[SERVICE]
    Flush         1
    Daemon        Off
    Log_Level     info
    Parsers_File  parsers.conf
    HTTP_Server   On
[INPUT]
    Name              tail
    Path              /var/log/containers/*.log
    Tag               kube.*
    Parser            docker
    DB                /var/log/flb_kube.db
    Mem_Buf_Limit     512MB
    Skip_Long_Lines   On
    Docker_Mode       On
[FILTER]
    Name                 kubernetes
    Match                kube.*
    Kube_URL             https://kubernetes.default.svc:443
    Kube_CA_File         /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    Kube_Token_File      /var/run/secrets/kubernetes.io/serviceaccount/token
    Kube_Tag_Prefix      kube.var.log.containers.
    Merge_Log            On
    Merge_Log_Key        json_message
    K8S-Logging.Parser   On
    K8S-Logging.Exclude  Off
    Annotations          Off
[OUTPUT]
    Name                es
    Match               kube.*
    Generate_ID         On
    Logstash_Format     On
    Replace_Dots        On
    Logstash_Prefix     kube-containers-${CLUSTER_NAME}
    Retry_Limit         10
    Suppress_Type_Name  On
    Trace_Error         On
    Buffer_Size         False
Any ideas? If I configure more shards, could it bring any benefit? (It should ease queries but just add overhead for ingestion, right?)
Thanks!
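For context, the arithmetic behind the shard question, as a rough sketch that assumes shards are balanced evenly across the data nodes (the function names are mine):

```python
def shard_layout(primaries, replicas, data_nodes):
    """Total shard copies for one daily index and the average per data node."""
    total = primaries * (1 + replicas)
    return {"total_shards": total, "shards_per_node": total / data_nodes}


def per_shard_rate(ops_per_sec, primaries):
    """Average indexing rate each primary shard has to absorb."""
    return ops_per_sec / primaries


# Current setup: 30 primaries, 0 replicas, 30 data nodes.
print(shard_layout(30, 0, 30))
print(round(per_shard_rate(550_000, 30)), round(per_shard_rate(40_000, 30)))
```

With the current settings that is one primary per data node for the index being written, so every node already takes part in ingestion.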