Mapping Explosion for Kubernetes/Openshift Index

Kosodrom · April 27, 2021, 4:59pm

Hi folks,
I am shipping application logs from OpenShift to Elasticsearch (7.6.2) using a Filebeat DaemonSet (7.9.0).

Right now I am running into the following warning:

"Limit of total fields [10000] in index [filebeat-openshift-2021.04.27-000011] has been exceeded"

I understand that this is a mapping explosion caused by a lot of new fields being added to the index. Since simply raising the field limit does not make sense, where should I start to resolve this problem?
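To see how close an index is to `index.mapping.total_fields.limit`, it can help to count the mapped fields yourself. The following is a minimal sketch, assuming you pass in the `properties` object from a 7.x `GET <index>/_mapping` response; it approximates the limit by counting every entry under `properties` (leaf fields, object fields and aliases) plus every multi-field under `fields`. The function name is hypothetical.

```python
from typing import Any, Dict


def count_mapped_fields(properties: Dict[str, Any]) -> int:
    """Approximate the number of fields counted against
    index.mapping.total_fields.limit for one index.

    `properties` is the "properties" object of a 7.x GET <index>/_mapping
    response. Every entry counts once (leaf fields, object fields and
    aliases), as does every multi-field listed under "fields".
    """
    total = 0
    for spec in properties.values():
        total += 1  # the field or object mapping itself
        # Multi-fields, e.g. the "keyword" sub-field that dynamic
        # mapping adds to string fields mapped as text.
        total += len(spec.get("fields", {}))
        # Recurse into object and nested mappings.
        if "properties" in spec:
            total += count_mapped_fields(spec["properties"])
    return total


if __name__ == "__main__":
    mapping = {
        "message": {"type": "text", "fields": {"keyword": {"type": "keyword"}}},
        "kubernetes": {
            "properties": {"pod": {"properties": {"name": {"type": "keyword"}}}}
        },
    }
    print(count_mapped_fields(mapping))  # 5
```

Running this per index (or per group of documents) shows where the 10000 fields are coming from before deciding on a fix.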

I expect a lot of fields because of the dynamic nature of an OpenShift cluster. At the same time, I cannot provide a strict mapping, because I have no idea what the logs from the deployments look like.

Thanks for any hints.

warkolm · April 28, 2021, 1:28am

Are these all the same application, or multiple different ones?

If it's the latter, then putting each application into its own index would make more sense, as you are keeping similar data structures together.
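Splitting by application means every event needs a target index (or rollover alias) derived from its metadata. A minimal sketch of that mapping, assuming pods carry an `app` label exposed by Filebeat's Kubernetes metadata as `kubernetes.labels.app`; the `filebeat-openshift-<app>` naming scheme and the function name are hypothetical. In practice the same logic would have to live in Logstash, an ingest pipeline or the Filebeat config.

```python
import re
from typing import Any, Dict

# Characters Elasticsearch 7.x rejects in index names (plus space and ':').
_INVALID_CHARS = re.compile(r'[\\/*?"<>|,# :]')


def index_for_event(event: Dict[str, Any],
                    prefix: str = "filebeat-openshift",
                    default_app: str = "unknown") -> str:
    """Derive a per-application index (or rollover alias) name.

    Assumes pods carry an `app` label, available as kubernetes.labels.app.
    The naming scheme is hypothetical.
    """
    labels = event.get("kubernetes", {}).get("labels", {})
    app = str(labels.get("app") or default_app)
    # Index names must be lowercase and free of the characters above.
    return f"{prefix}-{_INVALID_CHARS.sub('-', app.lower())}"


if __name__ == "__main__":
    event = {"kubernetes": {"labels": {"app": "Billing API"}}}
    print(index_for_event(event))  # filebeat-openshift-billing-api
```

The fallback keeps unlabeled pods together in one shared index instead of failing, which is where any remaining mapping growth would show up.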

Kosodrom · April 28, 2021, 8:19am

Yes, these are a lot of completely different, independent applications.

It looks like the only real solution is to separate the applications into different indices, which brings quite a lot of overhead. I tried to address this in my post to collect experiences from other users, but it seems this is not a common issue: How you deal with ILM for kubernetes cluster applications?

I see the following drawbacks:

1. I need to create index templates and indices in advance, before someone deploys an application, because I want to use ILM.
2. My Logstash pipeline will become pretty huge due to a lot of "if else" statements to separate application logs into different indices. This could cause problems like slower processing and higher memory consumption, right? Also, when something is wrong with the logs from one application, the whole pipeline will fail and the shipping of all application logs will be impaired.
3. Filebeat hints-based autodiscover is sort of obsolete in this case, isn't it? It is supposed to give developers the flexibility to control the configuration of their logs without asking me to change the Logstash configs. But because of the index separation, I will become a blocker, since I am needed to apply the changes.
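The "create in advance" step from the first drawback boils down to two requests per application on Elasticsearch 7.6: a legacy index template that attaches the ILM policy and rollover alias, and a bootstrap index that the alias points at as its write index. A minimal sketch that generates them, assuming an ILM policy (here the hypothetical `filebeat-apps`) already exists and reusing the hypothetical `filebeat-openshift-<app>` naming; it only builds the request bodies and does not call Elasticsearch.

```python
import json
from typing import Any, Dict, List, Tuple

Request = Tuple[str, str, Dict[str, Any]]


def ilm_bootstrap_requests(app: str,
                           policy: str = "filebeat-apps",
                           prefix: str = "filebeat-openshift") -> List[Request]:
    """Build the requests a new app needs before it can log with ILM
    on Elasticsearch 7.6: a legacy index template and a bootstrap index.
    The policy name and naming scheme are hypothetical."""
    alias = f"{prefix}-{app}"
    template = {
        "index_patterns": [f"{alias}-*"],
        "order": 1,  # merged over lower-order templates matching the same indices
        "settings": {
            "index.lifecycle.name": policy,
            "index.lifecycle.rollover_alias": alias,
        },
    }
    # First generation of the index; the alias marks it as the write index.
    bootstrap = {"aliases": {alias: {"is_write_index": True}}}
    return [
        ("PUT", f"/_template/{alias}", template),
        ("PUT", f"/{alias}-000001", bootstrap),
    ]


if __name__ == "__main__":
    for method, path, body in ilm_bootstrap_requests("billing-api"):
        print(method, path)
        print(json.dumps(body, indent=2))
```

Generating these from a list of app names (for example in a CI job) is one way to keep this step out of the critical path. One caveat: a pattern like `filebeat-openshift-billing-api-*` also matches indices of an app named `billing-api-v2`, so app names that are prefixes of each other need care.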

Anyway, can you help me clear these things up?

warkolm · April 28, 2021, 11:49pm

1. Ideally, yes. If you split them out, though, you might find that the dynamic mapping won't blow out per app/index.
2. Split things out and use different Logstash pipelines, or use an ingest pipeline in Elasticsearch? You can define ingest pipelines directly in Filebeat as well.
3. Why not? The idea is that you define abstractions that are then "looked up" by autodiscover.
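The ingest pipeline route can replace the "if else" chain in Logstash with a single rule that rewrites the target index. A minimal sketch that builds a pipeline body for `PUT _ingest/pipeline/<name>`, assuming Elasticsearch 7.x `set` (with `override`) and `lowercase` processors and a Mustache template on `_index`; the field, prefix and function names are hypothetical. Such a pipeline could then be referenced by name from the Logstash `elasticsearch` output's `pipeline` option or Filebeat's `pipeline` setting.

```python
import json
from typing import Any, Dict


def routing_pipeline(prefix: str = "filebeat-openshift",
                     default_app: str = "unknown") -> Dict[str, Any]:
    """Body for PUT _ingest/pipeline/<name> that sends each event to a
    per-application index (or write alias). Field and index names are
    hypothetical."""
    return {
        "description": "Route events by kubernetes.labels.app",
        "processors": [
            # Fill in a fallback only when the pod has no app label.
            {"set": {"field": "kubernetes.labels.app",
                     "value": default_app, "override": False}},
            # Index names must be lowercase.
            {"lowercase": {"field": "kubernetes.labels.app"}},
            # Rewrite the target index using a Mustache template.
            {"set": {"field": "_index",
                     "value": prefix + "-{{{kubernetes.labels.app}}}"}},
        ],
    }


if __name__ == "__main__":
    print(json.dumps(routing_pipeline(), indent=2))
```

Because the rule is data-driven, a new application does not require a new branch in the pipeline, only its template and bootstrap index.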

Ultimately, you need to pay the cost of this somewhere, whether that's dealing with mapping explosions directly in Elasticsearch, pipelines in Logstash, or Filebeat config. It's not a point-and-shoot thing, due to the complexity of multiple, unique apps.

Kosodrom · April 29, 2021, 8:45am

Thanks.

I am using Logstash. Since Logstash can have only one pipeline "listening" on a given port, I cannot split it up into different pipelines, because my Filebeat DaemonSet has only one Logstash output. I guess I could create multiple outputs in Filebeat to send different logs to different Logstash pipelines, right?

BTW, I found what caused the mapping explosion: I had annotated my Filebeat pods with JSON annotations. This led to an insane number of new fields coming from the Filebeat logs, such as container IDs, harvester properties, etc. After I disabled it and rolled over the index, it looks like I no longer run into the limit. I will keep an eye on it.
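A culprit like this can also be found without trial and error by sampling events and counting the distinct field paths each source produces. A minimal sketch, assuming the sampled events are already decoded into Python dicts and that `kubernetes.container.name` identifies the source; all function names are hypothetical. Object fields are counted too, since they also count towards the field limit.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, Set


def field_paths(doc: Dict[str, Any], parent: str = "") -> Set[str]:
    """All dotted field paths in one decoded event, objects included."""
    paths: Set[str] = set()
    for key, value in doc.items():
        path = f"{parent}.{key}" if parent else key
        paths.add(path)
        if isinstance(value, dict):
            paths |= field_paths(value, path)
        elif isinstance(value, list):
            # Arrays of objects are mapped like single objects.
            for item in value:
                if isinstance(item, dict):
                    paths |= field_paths(item, path)
    return paths


def fields_per_source(events: Iterable[Dict[str, Any]],
                      source_of: Callable[[Dict[str, Any]], str]) -> Dict[str, int]:
    """Count distinct field paths seen in events from each source."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for event in events:
        seen[source_of(event)] |= field_paths(event)
    return {source: len(paths) for source, paths in seen.items()}


def container_name(event: Dict[str, Any]) -> str:
    """Hypothetical source key: the Kubernetes container name."""
    return event.get("kubernetes", {}).get("container", {}).get("name", "unknown")
```

Sorting the result by count points straight at the container whose logs (here, JSON-decoded Filebeat logs) keep adding new fields.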

warkolm · April 29, 2021, 8:54am

You could use pipeline-to-pipeline communication (see "Pipeline-to-pipeline communication" in the Logstash Reference).

Nice work!

system · May 27, 2021, 8:54am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
