I am trying to find a way to only index fields that pass some naming validation using a regex, so I did this:
With this template in place, any field whose name doesn't match the regex will be mapped as an object with "dynamic" and "enabled" set to false.
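The template itself didn't come through in the post, but based on the description above, a minimal sketch of such a dynamic template might look like this (the index name, template name, and naming regex are all illustrative):

```json
PUT my-index
{
  "mappings": {
    "dynamic_templates": [
      {
        "disable_nonconforming_fields": {
          "match_mapping_type": "*",
          "match_pattern": "regex",
          "unmatch": "^[a-zA-Z][a-zA-Z0-9_]{1,63}$",
          "mapping": {
            "type": "object",
            "enabled": false
          }
        }
      }
    ]
  }
}
```

With "match_pattern": "regex", the "unmatch" condition is treated as a regular expression, so any new field whose name fails the pattern is mapped as a disabled object: its value stays in _source but is neither parsed nor indexed.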
This way I disabled indexing for a lot of useless fields, such as single-character fields.
The problem is that those unindexed fields still show up in the index mapping, which means they count toward index.mapping.total_fields.limit, which I'd like to avoid.
As far as I understand, the way to prevent new fields from being added dynamically to the index mapping is to set "dynamic: false" at the top level, but then how can I enable the fields that do pass the convention?
So basically the goal is to just "hide" any field that doesn't pass my convention, without removing it of course.
Any ideas? I'm using Logstash to ingest data btw.
This is a bit of a problem with mappings at the moment, and I'm not sure there is an easy solution on the Elasticsearch side.
If you don't want the single-character fields in Elasticsearch, you may want to move the validation into Logstash somehow.
Is there no way to control whether or not a field gets added to the index mapping?
Setting 'dynamic: false' at the top level is not an option for me, as I still need to be able to add new fields dynamically.
I also don't think there is an option in Logstash to control whether or not a specific field will be indexed or added to the index mapping.
No, there is not.
You can do a conditional and then a mutate+remove for the field. It'll stop it from going to Elasticsearch.
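A sketch of that conditional + mutate approach in the Logstash pipeline config, assuming a known offending field name (the field name and regex here are illustrative; matching arbitrary field names against a regex needs a ruby filter, since conditionals can't iterate over fields):

```
filter {
  # Drop one known field before it reaches Elasticsearch
  if [c] {
    mutate { remove_field => ["c"] }
  }

  # Sketch: drop every top-level field whose name fails the naming convention
  ruby {
    code => "
      event.to_hash.each_key do |k|
        next if k.start_with?('@')   # keep @timestamp, @version, etc.
        event.remove(k) unless k =~ /\A[a-zA-Z][a-zA-Z0-9_]{1,63}\z/
      end
    "
  }
}
```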
I don't want to remove the field though; it might contain relevant data. I just don't need it to be indexed or to show up in the index mapping.
I think this feature would be useful. There are cases where, as an admin, you don't have full control over the data flowing into Elasticsearch, especially in logging use cases.
The intention here is to protect the system against abuse and against the mapping explosion I faced recently, after one specific app (which was hard to track down) sent thousands of unique fields in each document. In that case Elasticsearch simply crashed.
If I could whitelist fields based on a regex, it would solve that problem.
There's a big catch-22 here. Without a mapping, Elasticsearch has no idea how to handle the fields during processing, storage, and when returning them for a request. And what about when a user makes a request and then gets back all this data they aren't expecting? They might think there's a problem with the code, or with Elasticsearch, and if a cluster admin saw it they might think there's corruption, or worse.
If you are saying just store everything as an unmapped keyword, then why not define it and set it to not be indexed? If not a keyword, then what sort of field?
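For concreteness, "define it and set it to not be indexed" could be sketched as a dynamic template along these lines (names and regex illustrative):

```json
PUT my-index
{
  "mappings": {
    "dynamic_templates": [
      {
        "nonconforming_strings_not_indexed": {
          "match_mapping_type": "string",
          "match_pattern": "regex",
          "unmatch": "^[a-zA-Z][a-zA-Z0-9_]{1,63}$",
          "mapping": {
            "type": "keyword",
            "index": false,
            "doc_values": false
          }
        }
      }
    ]
  }
}
```

Note that such a field is still listed in the mapping, so it still counts toward index.mapping.total_fields.limit, which is the original concern in this thread.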
I understand your point here, but the best way to prevent mapping explosion is to not accept the fields to begin with. There are always limitations with the use of software, unfortunately.