We are looking into an issue where we keep hitting the field limit. We have been raising the limit, but we know this is not a permanent fix and we need a long-term solution.
We store logs in daily indexes and use an index template that has dynamic mapping enabled. I'm going through the mappings of our index and I'm seeing fields that have not been used in the past 2 days. I was under the impression that the daily index starts out with no fields and adds them dynamically as the logs come in, but this doesn't seem to be the case. Are the mappings carried over from the previous day when a new daily index is created?
A mapping is just the schema that the developer designs for the index.
When dynamic mapping is enabled, ES adds any field that appears in a payload but is not yet defined in your mapping.
If you are getting more than 1000 fields in your new index (the error in the log), that means the unique fields in your payloads plus the ones already in your mapping add up to more than 1000.
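To see how close a mapping is to the limit, something like the following rough sketch could help. It assumes you have already fetched the "properties" part of the index mapping as a Python dict; the sample mapping and the name count_mapped_fields are made up for illustration. It counts object fields and multi-fields as well as leaf fields, which approximates how the limit is applied but may not match the server's count exactly.

```python
def count_mapped_fields(properties):
    """Count every mapped entry, recursing into sub-objects and multi-fields."""
    total = 0
    for definition in properties.values():
        total += 1  # the field (or object) itself
        total += count_mapped_fields(definition.get("properties", {}))  # sub-fields
        total += count_mapped_fields(definition.get("fields", {}))      # multi-fields
    return total


# Hypothetical mapping, shaped like the "properties" section of a real one.
sample_mapping = {
    "message": {"type": "text", "fields": {"keyword": {"type": "keyword"}}},
    "request": {
        "properties": {
            "headers": {
                "properties": {
                    "user-agent": {"type": "keyword"},
                    "x-custom-1": {"type": "keyword"},
                }
            }
        }
    },
}

DEFAULT_LIMIT = 1000  # default value of index.mapping.total_fields.limit
used = count_mapped_fields(sample_mapping)
print(f"{used} of {DEFAULT_LIMIT} fields used")
```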
This doesn't totally answer my question. At least not as far as I can tell. Do the mappings start out empty every day when it rotates the index and it adds them as needed? The reason I'm asking is because I'm seeing fields in my mappings that are not being used, as far as I can tell. This makes me think that maybe at some point they did get used, they got added to the mapping and the mapping is getting carried over each day when it creates the new daily index?
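If all fields are dynamically mapped, then yes, each new index starts out empty and fields are added as documents arrive; mappings are not carried over from the previous day's index. Note that more than one index template may be applied, so there might be some that you have overlooked.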
I would run a query against the specific index and search for documents where the field you believe should not exist actually exists.
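As a sketch of what that query could look like, the snippet below builds the body of an `exists` query. The field name is a placeholder for whichever field you are suspicious of, and the helper name exists_query is only for illustration. The resulting JSON would be sent as the body of a `_search` request against the specific daily index.

```python
import json


def exists_query(field_name, size=1):
    """Build a search body that matches documents where field_name has a value."""
    return {"size": size, "query": {"exists": {"field": field_name}}}


# Placeholder field name; replace with the field you want to check.
body = exists_query("request.headers.x-custom-1")
print(json.dumps(body, indent=2))
```

If the search returns no hits for that index, no document in it has an indexed value for the field, so the field most likely came from a template rather than from that day's documents.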
I was actually just able to find these fields. I had been trying to use Grafana to search, but Kibana turned out to be more helpful in this situation. It looks like most of these fields come from nginx request header information. I'm not exactly sure what the best way to handle this is. We need this information and want to be able to search it, but these fields can easily get out of control because we don't control what can be put in the request headers.
The best practice IMO is to index only the fields that need to be searched.
Store the entire header as a single text field and disable indexing on that (body) field, so you can still retrieve the entire header without adding one mapped field per header.
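A minimal sketch of a mapping along those lines, written as a Python dict that could go into an index template, is shown below. All field names here (request_headers, request_headers_raw, user-agent, host) are assumptions for illustration, not taken from your setup. Setting `"dynamic": False` on the headers object is one way to stop unknown headers from creating new mapped fields, and `"index": False` keeps the raw header text retrievable but not searchable.

```python
import json

# Hypothetical field names; adjust to match your own log documents.
header_mapping = {
    "mappings": {
        "properties": {
            "request_headers": {
                "type": "object",
                # Unknown sub-fields stay in _source but are not added to the mapping.
                "dynamic": False,
                "properties": {
                    # Only the headers you actually need to search on.
                    "user-agent": {"type": "keyword"},
                    "host": {"type": "keyword"},
                },
            },
            # Whole header block kept as one field, not indexed.
            "request_headers_raw": {"type": "text", "index": False},
        }
    }
}

print(json.dumps(header_mapping, indent=2))
```

With a mapping like this, the field count stays fixed no matter what clients send, at the cost of not being able to search on headers you did not list explicitly.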