Optimizing index template for logs from Kubernetes workloads

Hi all, I have the following setup for shipping logs from GKE-hosted workloads to Kibana:

Fluentbit > Elasticsearch > Kibana

Elasticsearch and Kibana run on one server, so this is a single-node setup. I want to look into ways of reducing the storage consumed by the indices, as my daily indices have grown to around 5GB each. I already have a lifecycle policy that retains logs only for the necessary period, but even with this in place the indices still take up a considerable amount of space.

I have read online that when running Elasticsearch as a single node, it is recommended to set number_of_replicas to 0 instead of 1 to prevent duplication. Mine was set to the default of 1 and I have changed it to 0. I have been researching what else I could do to the index to optimize logging for microservices: reducing unnecessary storage consumption while keeping a decent query speed. I haven't had any luck figuring this out from the resources already available, so I would appreciate any help optimizing my index.

My current index template is as follows:

{
  "order": 0,
  "version": 60001,
  "index_patterns": [
    "logstash-*"
  ],
  "settings": {
    "index": {
      "number_of_shards": "1",
      "refresh_interval": "5s",
      "number_of_replicas": "0"
    }
  },
  "mappings": {
    "dynamic_templates": [
      {
        "message_field": {
          "path_match": "message",
          "mapping": {
            "norms": false,
            "type": "text"
          },
          "match_mapping_type": "string"
        }
      },
      {
        "string_fields": {
          "mapping": {
            "norms": false,
            "type": "text",
            "fields": {
              "keyword": {
                "ignore_above": 256,
                "type": "keyword"
              }
            }
          },
          "match_mapping_type": "string",
          "match": "*"
        }
      }
    ],
    "properties": {
      "@timestamp": {
        "type": "date"
      },
      "geoip": {
        "dynamic": true,
        "type": "object",
        "properties": {
          "ip": {
            "type": "ip"
          },
          "latitude": {
            "type": "half_float"
          },
          "location": {
            "type": "geo_point"
          },
          "longitude": {
            "type": "half_float"
          }
        }
      },
      "@version": {
        "type": "keyword"
      }
    }
  },
  "aliases": {}
}

ChatGPT gave me the following optimized index template. Is it a good replacement for my existing one?

{
  "index_patterns": [
    "logstash-*"
  ],
  "template": {
    "settings": {
      "index": {
        "number_of_replicas": 0,
        "lifecycle": {
          "name": "metrics"
        },
        "codec": "best_compression",
        "query": {
          "default_field": [
            "message"
          ]
        }
      }
    },
    "mappings": {
      "dynamic_templates": [
        {
          "match_ip": {
            "match": "ip",
            "match_mapping_type": "string",
            "mapping": {
              "type": "ip"
            }
          }
        },
        {
          "match_message": {
            "match": "message",
            "match_mapping_type": "string",
            "mapping": {
              "type": "match_only_text"
            }
          }
        },
        {
          "strings_as_keyword": {
            "match_mapping_type": "string",
            "mapping": {
              "ignore_above": 1024,
              "type": "keyword"
            }
          }
        }
      ],
      "date_detection": false,
      "properties": {
        "@timestamp": {
          "type": "date"
        },
        "data_stream": {
          "properties": {
            "dataset": {
              "type": "constant_keyword"
            },
            "namespace": {
              "type": "constant_keyword"
            },
            "type": {
              "type": "constant_keyword",
              "value": "metrics"
            }
          }
        },
        "ecs": {
          "properties": {
            "version": {
              "type": "keyword",
              "ignore_above": 1024
            }
          }
        },
        "host": {
          "type": "object"
        }
      }
    },
    "aliases": {}
  }
}

Adding best compression is a good way to reduce storage although it tends to add some load at indexing time. This is what I would have initially suggested, as it carries little risk.

The other changes to field mappings may reduce storage but could also affect dashboards and how you query the data, so they are riskier. This is something I would initially avoid and only do after thorough testing in a separate environment.

Elasticsearch will never allocate a replica to the same node that holds the primary, so changing this will not reduce storage given that you have a single node. It will, however, change your indices to green status, as the unallocated replicas make them yellow. It is still a change worth making.
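
To make the replica point concrete, here is a rough back-of-the-envelope sketch. The 5 GB/day figure is from the question; the 30-day retention and the `stored_gb` helper are made up for illustration, and the estimate ignores merges, compression differences and everything else that affects real disk usage.

```python
def stored_gb(daily_gb, retention_days, replicas, data_nodes):
    """Rough disk usage estimate: only copies that can be allocated take space."""
    # A replica is never placed on the node that holds its primary, so the
    # number of allocated copies is capped by the number of data nodes.
    allocated_copies = min(1 + replicas, data_nodes)
    return daily_gb * retention_days * allocated_copies


# 5 GB/day is from the question; 30 days of retention is an assumed value.
single_node_default = stored_gb(5, 30, replicas=1, data_nodes=1)
single_node_zero = stored_gb(5, 30, replicas=0, data_nodes=1)
two_nodes_default = stored_gb(5, 30, replicas=1, data_nodes=2)
print(single_node_default, single_node_zero, two_nodes_default)
```

On one node the first two numbers are identical, which is why the change fixes the yellow status but not the disk usage.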

1. What you already understood correctly

:white_check_mark: number_of_replicas: 0 – on a single node, replicas are indeed useless.

:white_check_mark: number_of_shards: 1 – excellent for your use case (5 GB/day). Avoids unnecessary fragmentation.

:white_check_mark: refresh_interval: 5s – reasonable for near-real-time usage.

:warning: However, the ChatGPT mapping is dangerous for your use case.
It converts all string fields to keyword (no full-text search) and adds match_only_text on message, which will break many Kibana visualizations.
Don't use it as-is without thorough testing.
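
To see what changes for an ordinary string field, here is a small sketch that imitates the "first matching dynamic template wins" rule. It is a simplification, not Elasticsearch's actual matcher: it only handles `match`, `path_match` and string fields, and it uses Python's `fnmatch` for the wildcard. The field name `kubernetes.pod_name` is only an illustrative example of a field your logs might contain.

```python
from fnmatch import fnmatchcase


def first_matching_mapping(dynamic_templates, field_path):
    """Return the mapping of the first dynamic template matching a new string field."""
    leaf_name = field_path.split(".")[-1]
    for entry in dynamic_templates:
        # Each entry is a single-key dict: {template_name: template_body}.
        (body,) = entry.values()
        if body.get("match_mapping_type", "string") != "string":
            continue
        # "match" is tested against the field name, "path_match" against the full path.
        if "match" in body and not fnmatchcase(leaf_name, body["match"]):
            continue
        if "path_match" in body and not fnmatchcase(field_path, body["path_match"]):
            continue
        return body["mapping"]
    return None


# Dynamic templates copied (abbreviated) from the two templates in this thread.
current = [
    {"message_field": {"path_match": "message", "match_mapping_type": "string",
                       "mapping": {"type": "text", "norms": False}}},
    {"string_fields": {"match": "*", "match_mapping_type": "string",
                       "mapping": {"type": "text", "norms": False,
                                   "fields": {"keyword": {"type": "keyword",
                                                          "ignore_above": 256}}}}},
]
proposed = [
    {"match_ip": {"match": "ip", "match_mapping_type": "string",
                  "mapping": {"type": "ip"}}},
    {"match_message": {"match": "message", "match_mapping_type": "string",
                       "mapping": {"type": "match_only_text"}}},
    {"strings_as_keyword": {"match_mapping_type": "string",
                            "mapping": {"type": "keyword", "ignore_above": 1024}}},
]

before = first_matching_mapping(current, "kubernetes.pod_name")
after = first_matching_mapping(proposed, "kubernetes.pod_name")
print("current :", before)
print("proposed:", after)
```

With the current template the field is a `text` field with a `.keyword` subfield; with the proposed one it becomes a plain `keyword`. Any saved search or visualization that references `kubernetes.pod_name.keyword` would not find that subfield in newly created indices.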


2. Four immediate optimizations

:white_check_mark: A. Enable best_compression (low risk, high gain)

Christian Dahlqvist gave you excellent advice.
Add this to your template for new indexes:

json

"settings": {
  "index.codec": "best_compression"
}

Typical gain: 20–30% less storage on text-heavy logs.
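
If you would rather patch your existing template than paste in a new one, a minimal sketch could look like this. The `with_best_compression` helper is hypothetical, and the template dict is abbreviated from the one posted above. Since `index.codec` is a static setting, it only takes effect for indices created after the template is updated.

```python
import copy
import json


def with_best_compression(template):
    """Return a copy of an index template with index.codec set to best_compression."""
    updated = copy.deepcopy(template)
    index_settings = updated.setdefault("settings", {}).setdefault("index", {})
    index_settings["codec"] = "best_compression"
    return updated


# Abbreviated copy of the template from the question; mappings are left untouched.
current_template = {
    "index_patterns": ["logstash-*"],
    "settings": {
        "index": {
            "number_of_shards": "1",
            "refresh_interval": "5s",
            "number_of_replicas": "0",
        }
    },
    "mappings": {"dynamic_templates": []},
}

new_template = with_best_compression(current_template)
print(json.dumps(new_template["settings"], indent=2))
```

Only the settings block changes, so dashboards and queries keep working exactly as before.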

:white_check_mark: B. Reduce _source (if you don't need reindexing or scripts)

This is the most underestimated optimization on a single node.

By default, Elasticsearch stores the original JSON in _source.
For large logs, you can disable _source entirely or store only specific fields.

Example (template):

json

"mappings": {
  "_source": {
    "includes": ["@timestamp", "message", "host", "kubernetes.*"]
  }
}

Or more aggressive (if you don't use scripts or reindexing):

json

"_source": { "enabled": false }

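:warning: Warning: no _source means no update, update_by_query or reindex, and search hits no longer return the original document. Field values can then only be fetched from doc_values or stored fields, so test how Kibana Discover behaves before relying on this. Fields dropped via includes are lost from _source in the same way.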

Gain: 30–50% space (huge for GKE logs).

:white_check_mark: C. Use synthetic _source (ES 8.4+)

If you're on a recent Elasticsearch version:

json

"mappings": {
  "_source": { "mode": "synthetic" }
}

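This reconstructs _source on the fly from doc_values/stored fields, so the returned document can differ slightly from what was sent (for example in field order).
Less storage, still compatible with Kibana. Availability depends on your Elasticsearch version and subscription level, so check the documentation for your release before relying on it.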

:white_check_mark: D. Remove norms and doc_values on unused fields

Your current template already has "norms": false – very good.
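Also add "doc_values": false for fields you never aggregate, sort or script on. Note that doc_values applies to types such as keyword, numeric, date and ip; text fields have no doc_values to begin with.

Example for a kubernetes.labels field not used in term aggregations (assuming it is mapped as a single keyword value):

json

"kubernetes.labels": {
  "type": "keyword",
  "doc_values": false
}

Gain: modest (5–10%), with no functional impact as long as you never aggregate or sort on that field.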


3. Optimizations at the FluentBit level (before Elasticsearch)

These are often ignored in forums.

:white_check_mark: E. Filter unnecessary fields upstream

In your fluent-bit.conf:

ini

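# The Match value below is an assumption; reuse the tag of your existing kubernetes filter.
[FILTER]
    Name         kubernetes
    Match        kube.*
    Annotations  Off
    Labels       Off

[FILTER]
    Name         modify
    Match        *
    Remove       stream

The modify filter only acts on top-level keys, so the nested Kubernetes metadata is switched off in the kubernetes filter itself. Keep Labels on if you filter by label in Kibana.

Kubernetes annotations and labels are attached to every record and can noticeably inflate document size.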

Gain: 15–40% depending on your workload.
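
Before dropping fields, it may be worth measuring how much of a typical record they account for. The sketch below does that for one record; the sample document is entirely made up, and the serialized JSON size is only a rough proxy for what ends up on disk in Elasticsearch.

```python
import copy
import json


def size_without_metadata(record, drop=("annotations", "labels")):
    """Return (full_bytes, trimmed_bytes) for one record serialized as compact JSON."""
    trimmed = copy.deepcopy(record)
    for key in drop:
        trimmed.get("kubernetes", {}).pop(key, None)

    def encoded_size(doc):
        return len(json.dumps(doc, separators=(",", ":")).encode("utf-8"))

    return encoded_size(record), encoded_size(trimmed)


# Hypothetical record; paste one of your own documents from Kibana instead.
sample = {
    "@timestamp": "2024-01-01T00:00:00Z",
    "message": "GET /healthz 200",
    "stream": "stdout",
    "kubernetes": {
        "pod_name": "api-7d9c5b6f4-abcde",
        "namespace_name": "default",
        "labels": {"app": "api", "pod-template-hash": "7d9c5b6f4"},
        "annotations": {"example.com/config-checksum": "0" * 64},
    },
}

full_bytes, trimmed_bytes = size_without_metadata(sample)
print(f"full={full_bytes} B, trimmed={trimmed_bytes} B, "
      f"saved={100 * (full_bytes - trimmed_bytes) / full_bytes:.0f}%")
```

Running this against a handful of real documents gives a better idea of the achievable gain for your workload than any generic percentage.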

:white_check_mark: F. Enable HTTP compression in FluentBit

Between FluentBit and Elasticsearch:

ini

[OUTPUT]
    Name         es
    Match        *
    Host         your-es
    Port         9200
    Compress     gzip

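This saves network bandwidth between FluentBit and Elasticsearch. It does not reduce index size on disk, and it adds a small CPU cost for compressing and decompressing each request.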
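
A quick way to get a feel for the on-the-wire effect is to gzip a bulk-style payload locally. This is only a sketch: the documents and the index name `logstash-test` are invented, and real logs will compress differently.

```python
import gzip
import json


def bulk_payload(docs, index="logstash-test"):
    """Build a newline-delimited _bulk body: one action line plus one document per entry."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return ("\n".join(lines) + "\n").encode("utf-8")


# Invented, repetitive log lines standing in for a real batch.
docs = [
    {
        "@timestamp": f"2024-01-01T00:00:{i % 60:02d}Z",
        "message": f"GET /healthz 200 in {i} ms",
        "stream": "stdout",
    }
    for i in range(500)
]

raw = bulk_payload(docs)
compressed = gzip.compress(raw)
print(f"raw={len(raw)} B, gzip={len(compressed)} B")
```

The request body shrinks considerably, but Elasticsearch decompresses it before indexing, so what is stored on disk is unchanged.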