Mapping Explosion for Kubernetes/OpenShift Index

Hi folks,
I am shipping application logs from OpenShift to Elasticsearch (7.6.2) using a Filebeat DaemonSet (7.9.0).

Right now I am running into the following warning:

"Limit of total fields [10000] in index [filebeat-openshift-2021.04.27-000011] has been exceeded"

I understand that I am running into a mapping explosion because a lot of new fields are being added to the index. Since simply increasing the field limit does not make sense, where should I start to resolve this problem?

I expect to have a lot of fields because of the dynamic nature of an OpenShift cluster. At the same time, I cannot provide a strict mapping, because I have no idea what the logs from the deployments look like.

Thanks for any hints.

Are these all the same application, or multiple different ones?

If it's the latter, then putting each application into its own index would make more sense, as you are keeping similar data structures together.
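As a rough illustration of "one index per application", the sketch below derives an index name from the Kubernetes metadata that Filebeat attaches to each event. It is written in Python only to show the logic. The `filebeat-openshift` prefix, the choice of the `app` label with a namespace fallback, and the function name are assumptions for illustration, not something taken from this setup.

```python
import re

# Characters Elasticsearch rejects in index names, plus whitespace.
_INVALID = re.compile(r'[\\/*?"<>|,#\s:]')


def index_for_event(event, prefix="filebeat-openshift"):
    """Return a per-application index (or alias) name for one log event."""
    k8s = event.get("kubernetes", {})
    labels = k8s.get("labels", {})
    # Assumption: prefer the "app" label, fall back to the namespace,
    # then to a catch-all bucket for events without Kubernetes metadata.
    app = labels.get("app") or k8s.get("namespace") or "unknown"
    # Index names must be lowercase and must not start with -, _ or +.
    app = _INVALID.sub("-", str(app).lower()).strip("-_+.")
    return f"{prefix}-{app or 'unknown'}"
```

The catch-all bucket matters: events without usable metadata still need somewhere to go, and keeping them in one place makes them easy to spot.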

Yes, these are a lot of completely different, independent applications.

It looks like the only real solution is to separate the applications into different indices, which brings quite a lot of overhead. I tried to raise this in another post to collect experiences from other users, but it does not seem to be a common issue: "How you deal with ILM for kubernetes cluster applications?"

I see the following drawbacks:

  1. I need to create index templates and indices in advance, before someone deploys an application, because I want to use ILM.
  2. My Logstash pipeline will become pretty huge because of all the "if/else" statements needed to separate application logs into different indices. This could lead to slower processing and higher memory consumption, right? Also, when something is wrong with the logs from one application, the whole pipeline will fail and the shipping of all application logs will be impaired.
  3. Filebeat hints-based autodiscover is more or less obsolete in this case, isn't it? It is supposed to give developers the flexibility to control the configuration of their logs without having to ask me to change the Logstash configs. But with the index separation I become a blocker, since I am needed to apply the changes.
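To make drawback 1 concrete, here is a minimal sketch of what has to exist per application before its first log line arrives when ILM rollover is used: a template that ties the index pattern to a policy and a rollover alias, and a first index that holds the write alias. It only builds the request bodies and sends nothing. The policy name, the alias naming scheme, and the `ilm_bootstrap` helper are hypothetical. The template call shown is the legacy `_template` API available in 7.6.

```python
def ilm_bootstrap(app, policy="filebeat-openshift", field_limit=1000):
    """Build the two requests needed to bootstrap ILM rollover for one app."""
    alias = f"filebeat-openshift-{app}"  # hypothetical naming scheme
    template = {
        "index_patterns": [f"{alias}-*"],
        "settings": {
            # Attach the (hypothetical) lifecycle policy to matching indices.
            "index.lifecycle.name": policy,
            # Rollover needs to know which alias to move to the new index.
            "index.lifecycle.rollover_alias": alias,
            # A per-app limit, so one noisy app cannot hide behind a shared one.
            "index.mapping.total_fields.limit": field_limit,
        },
    }
    # The first index must exist and be marked as the alias's write index.
    bootstrap = {"aliases": {alias: {"is_write_index": True}}}
    return {
        "template": (f"PUT _template/{alias}", template),
        "index": (f"PUT {alias}-000001", bootstrap),
    }
```

Generating these two requests from a list of application names, for example as part of the deployment process, would remove the manual step even though the objects still have to be created in advance.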

Anyway, can you help me get these points clear?

  1. Ideally, yes. If you split them out, though, you might find the dynamic mapping won't blow out per app/index.
  2. Split things out and use different Logstash pipelines, or use an ingest pipeline in Elasticsearch? You can define ingest pipelines directly in Filebeat as well.
  3. Why not? The idea is that you can define abstractions that are then "looked up" for said autodiscovery.

Ultimately you need to pay the cost of this somewhere, whether that is dealing with mapping explosions directly in Elasticsearch, pipelines in Logstash, or Filebeat config. It's not a point-and-shoot thing, given the complexity of multiple, unique apps.

Thanks.

  1. I am using Logstash. Since Logstash can have only one pipeline "listening" on a given port, I cannot split it up into different pipelines, because my Filebeat DaemonSet has only one Logstash output. I guess I could create multiple outputs in Filebeat to send different logs to different Logstash pipelines, right?

BTW: I found what caused the mapping explosion: I had annotated my Filebeat pods with JSON annotations. This led to an insane amount of new fields coming from the Filebeat logs, such as container IDs, harvester properties, etc. After I disabled it and rolled over the index, it looks like I do not run into the limit anymore. I will keep an eye on it.
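For anyone hunting for the source of a similar explosion, a rough way to see which part of the mapping is growing is to count mapped fields per top-level key. The sketch below assumes you have already fetched the mapping (for example with `GET <index>/_mapping`) and pass in the `mappings` object for one index. The function names are made up for illustration, and the count only approximates what Elasticsearch checks against the field limit.

```python
from collections import Counter


def count_fields(properties, prefix=""):
    """Yield the dotted path of every mapped field, object, and multi-field."""
    for name, spec in properties.items():
        path = f"{prefix}{name}"
        yield path
        # Multi-fields such as "message.keyword" are mapped fields too.
        for sub in spec.get("fields", {}):
            yield f"{path}.{sub}"
        # Recurse into object (and nested) fields.
        yield from count_fields(spec.get("properties", {}), prefix=f"{path}.")


def fields_by_top_level(mapping, depth=1):
    """Count mapped fields grouped by their first `depth` path segments."""
    paths = count_fields(mapping.get("properties", {}))
    return Counter(".".join(p.split(".")[:depth]) for p in paths)
```

Calling `fields_by_top_level(mapping).most_common(10)` lists the largest groups first, and raising `depth` narrows the search within a group.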

You could use pipeline-to-pipeline communication; see "Pipeline-to-Pipeline Communication" in the Logstash Reference [7.12].

Nice work!
