With this template in place, any field whose name doesn't match the regex is mapped as an object with dynamic and enabled both set to false.
This way I disabled indexing for many useless fields, such as single-character fields.
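For readers following along, here is a rough sketch of what such a mapping might look like, written as a Python dict that can be dumped to JSON. It is not the actual template from this thread: the template name `disable_nonconforming_fields`, the example convention regex, and the use of a catch-all `match` together with `unmatch` are all assumptions made for illustration.

```python
import json
import re

# Hypothetical naming convention: lowercase snake_case, at least two characters.
FIELD_CONVENTION = r"[a-z][a-z0-9_]+"


def build_mappings(convention: str = FIELD_CONVENTION) -> dict:
    """Build a mappings body with one dynamic template (illustrative only).

    Assumption: fields whose names do NOT match the convention are caught by
    the catch-all "match" and mapped as a disabled, non-dynamic object, while
    names excluded via "unmatch" fall through to normal dynamic mapping.
    """
    return {
        "dynamic_templates": [
            {
                "disable_nonconforming_fields": {  # hypothetical template name
                    "match_pattern": "regex",
                    "match": ".*",
                    "unmatch": convention,
                    "mapping": {
                        "type": "object",
                        "dynamic": False,
                        "enabled": False,
                    },
                }
            }
        ]
    }


def follows_convention(name: str, convention: str = FIELD_CONVENTION) -> bool:
    """Local check of a field name against the same convention regex."""
    return re.fullmatch(convention, name) is not None


if __name__ == "__main__":
    print(json.dumps(build_mappings(), indent=2))
```

Note that even with a mapping like this, each non-conforming field still gets its own entry in the index mapping, which is exactly the issue described next.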
The problem is that those unindexed fields still appear in the index mapping, which means they count toward index.mapping.total_fields.limit, and I'd like to avoid that.
As far as I understand, the way to prevent new fields from being added dynamically to the index mapping is to set "dynamic: false" at the top level, but then how can I enable the fields that do follow the convention?
So basically the goal is to just "hide" any field that doesn't follow my convention, without removing it, of course.
Is there no way to control whether or not a field gets added to the index mapping?
Setting "dynamic: false" at the top level is not an option for me, as I still need to be able to add new fields dynamically.
I don't think there is an option in Logstash to control whether or not a specific field will be indexed or added to the index mapping.
I think this feature would be useful: there are cases where, as an admin, you don't have full control over the data flowing into Elasticsearch, especially in logging use cases.
The intention here is to protect the system against abuse and mapping explosion, which I ran into recently after one specific app (it was hard to track down) sent thousands of unique fields in each document. In that case Elasticsearch just crashed.
If I could whitelist fields based on a regular expression, it would solve that problem.
There's a big catch-22 here. Without a mapping, Elasticsearch has no idea how to handle the fields during processing, storage, and return to a request. And what happens when a user makes a request and gets back all this data they aren't expecting? They might think there's a problem with the code or with Elasticsearch, and if a cluster admin saw it they might suspect corruption, or worse.
If you are saying just store everything as an unmapped keyword, then why not define it and set it to not be indexed? If not a keyword, then what sort of field?
I understand your point here, but the best way to prevent mapping explosion is to not accept the fields to begin with. Unfortunately, there are always limitations when using software.
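As a minimal sketch of "not accepting the fields to begin with", something like the following could run in whatever sits in front of Elasticsearch, before a document is sent. The convention regex and the function name `split_by_convention` are hypothetical, and only top-level field names are checked.

```python
import re
from typing import Any, Dict, Tuple

# Hypothetical naming convention: lowercase snake_case, at least two characters.
ALLOWED_FIELD = re.compile(r"[a-z][a-z0-9_]+")


def split_by_convention(
    doc: Dict[str, Any], pattern: "re.Pattern[str]" = ALLOWED_FIELD
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split a document into (accepted, rejected) top-level fields.

    A field is accepted only if its whole name matches the convention.
    Nested objects are not inspected in this sketch.
    """
    accepted: Dict[str, Any] = {}
    rejected: Dict[str, Any] = {}
    for name, value in doc.items():
        target = accepted if pattern.fullmatch(name) else rejected
        target[name] = value
    return accepted, rejected


if __name__ == "__main__":
    event = {"message": "ok", "x": 1, "Weird-Key": "v"}
    accepted, rejected = split_by_convention(event)
    print(accepted)  # fields that would be sent for indexing
    print(rejected)  # fields that would be dropped or logged elsewhere
```

What to do with the rejected fields (drop them, log them, or route them somewhere else) is a separate decision that depends on the pipeline.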