Clean up field mappings


(Morten Slaatten Berg) #1

Hi.

We have an ELK-stack for our centralized logging, and we keep logs for about 3 weeks. We use dynamic mapping and over time our field list has grown quite big. Currently the index has over 2000 fields defined.

At one point we noticed that log messages were being dropped, and figured out that it was because we had hit the field limit in Elasticsearch. So we increased that limit in order to accept the new log messages. But now our visualizations that use terms aggregations don't work any more (we get "no results found"), and we wonder if this could have anything to do with the field-limit adjustment?

I suspect that many of the fields are no longer in use, since we have shut down some of the application logging. Is there a way to clean up / reduce the number of fields defined? And is there an easy way to figure out which fields are no longer in use?

Regards,
Morten


(Shane Connelly) #2

You can't remove a field from a mapping, but I'd like to address the source of the problem. One issue is that it sounds like you're putting all your data into a single index. You should consider using date-based indices instead, for a few reasons:

  1. It avoids this "too many fields" situation
  2. It's much more efficient to do time-based retention

That is, instead of indexing into a single index called logs (or whatever it's called), index into something like logs-2017.04.21 for today's data. Then your index will only have whatever fields were created today. Every day, delete all indices with dates older than a few weeks or whatever your retention period is. Deleting a whole index is much more efficient than trying to run a delete-by-query, which you may be doing instead.
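The daily-index-plus-retention scheme above can be sketched in a few lines. This is a minimal sketch of the naming and cutoff logic only; the logs- prefix and the 3-week retention window are assumptions, not your actual settings:

```python
from datetime import date, datetime, timedelta

RETENTION_DAYS = 21  # assumption: a 3-week retention window
PREFIX = "logs-"     # hypothetical index prefix

def index_name(day: date) -> str:
    """Build the daily index name, e.g. logs-2017.04.21."""
    return f"{PREFIX}{day.strftime('%Y.%m.%d')}"

def expired(indices: list[str], today: date) -> list[str]:
    """Return the indices whose date suffix is older than the retention window.

    These are the indices you would delete outright (DELETE /logs-2017.03.01),
    which is far cheaper than a delete-by-query inside one big index.
    """
    cutoff = today - timedelta(days=RETENTION_DAYS)
    old = []
    for name in indices:
        day = datetime.strptime(name[len(PREFIX):], "%Y.%m.%d").date()
        if day < cutoff:
            old.append(name)
    return old
```

For example, `expired(["logs-2017.03.01", "logs-2017.04.21"], date(2017, 4, 21))` would flag only the March index for deletion.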

If you're using Logstash, it will do this date-based-index-name thing by default, but you can also set your own pattern (see https://www.elastic.co/guide/en/logstash/current/plugins-outputs-elasticsearch.html#plugins-outputs-elasticsearch-index). Beats does this too by default. You can read more about time-based indexing at https://www.elastic.co/guide/en/elasticsearch/guide/current/time-based.html.
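For reference, the relevant part of a Logstash pipeline would look something like this (a sketch; the hosts value is an assumption, and the index pattern shown is already the Logstash 5.x default, spelled out only to make the date-based naming explicit):

```
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    # Default pattern; one index per day
    index => "logstash-%{+YYYY.MM.dd}"
  }
}
```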

If you want, you can do the cleanup (delete of old indices) automatically with curator: https://www.elastic.co/guide/en/elasticsearch/client/curator/5.0/ex_delete_indices.html
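A curator action file for that cleanup could look roughly like this (a sketch for Curator 4.x/5.x; the logstash- prefix and the 21-day cutoff are assumptions, check the linked docs for the exact schema of your version):

```
actions:
  1:
    action: delete_indices
    description: "Delete daily logstash indices older than 21 days"
    options:
      ignore_empty_list: True
    filters:
      - filtertype: pattern
        kind: prefix
        value: logstash-
      - filtertype: age
        source: name
        direction: older
        timestring: '%Y.%m.%d'
        unit: days
        unit_count: 21
```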


(Morten Slaatten Berg) #3

Thanks for the quick reply!

We are using time-based indexing, one for each day (for example logstash-2017.04.21). So this means that the >2000 fields I see under Kibana->Management->Index Patterns are fields currently in use in at least one of our 21 indices (3 weeks)? That is, when a new index is created for a new day, it only maps the fields of the documents being stored? It doesn't copy the field mappings from the previous days?

If that is the case, could it be that we are simply logging too many fields in the first place? (We have about 100-150 applications logging to our ELK cluster.)

As for what we are experiencing - our terms-aggregation visualizations don't return any results - is it possible that this is caused by the field limit, or am I going down the wrong path? Any ideas where I can look to figure out what's happening?

Regards,
Morten


(Shane Connelly) #4

We are using time-based indexing, one for each day (for example logstash-2017.04.21)

That's great news!

However, if that's true and the following statement is true:

At one point we experienced that log messages were dropped, and figured out that it was because we hit the field limit in elastic search.

then that means you somehow managed to create 1,000 fields in a single day, as the index.mapping.total_fields.limit setting is on a per-index basis (https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping.html#mapping-limit-settings), not a per-index-pattern basis.
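For what it's worth, that limit can be checked or raised per index (or, better, in your index template so that new daily indices pick it up). In Kibana's Dev Tools console that looks like the following - the index name and the value 2000 are just examples:

```
PUT /logstash-2017.04.21/_settings
{
  "index.mapping.total_fields.limit": 2000
}
```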

That is, when a new index is created for a new day, it only maps the fields of the documents being stored? It doesn't copy the field mappings from the previous days?

When a new index is created for a new day, it starts out empty except for anything in your index templates (https://www.elastic.co/guide/en/elasticsearch/reference/5.3/indices-templates.html). It is possible that something has added several templates that all match, and the final mapping is the merge of them. You may want to have a look at the templates you have (GET /_template) and at your most recent index's mapping/settings (GET /logstash-2017.04.21 or whatever), and post the results to a gist/here if you're still stuck. You may also want to refresh the field list in Kibana to see how many fields you're actually dealing with.
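If you want to see how many fields a single day's index actually has, you can flatten the mapping JSON once you've fetched it and count the leaves. A minimal sketch - the sample mapping excerpt is made up, and this is only an approximation of how Elasticsearch itself counts fields against the limit:

```python
def count_fields(properties: dict) -> int:
    """Count leaf fields in a mapping's 'properties' tree,
    including multi-fields declared under 'fields' (e.g. .keyword/.raw)."""
    total = 0
    for spec in properties.values():
        total += 1
        # Nested objects carry their own 'properties' tree
        total += count_fields(spec.get("properties", {}))
        # Multi-fields (sub-fields like message.keyword) count too
        total += len(spec.get("fields", {}))
    return total

# Hypothetical excerpt of a mapping's "properties" section
sample = {
    "message": {"type": "text", "fields": {"keyword": {"type": "keyword"}}},
    "host": {"properties": {"name": {"type": "text"}}},
}
```

Running `count_fields(sample)` on that excerpt gives 4: message, message.keyword, host, and host.name.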


(Morten Slaatten Berg) #5

I finally figured out what was happening. A while ago we upgraded from Logstash 2.x to Logstash 5.x. One of the changes between these versions is that LS 5.x maps string fields to a .keyword sub-field, as opposed to .raw in LS 2.x. But this is backwards-compatible, meaning that if the logstash template already exists, it will continue to map to .raw. (https://www.elastic.co/guide/en/logstash/current/plugins-outputs-elasticsearch.html)

When we had to adjust the template to increase the field limit, we must have deleted the original template somehow, and thus Logstash created a new template which maps to .keyword.

So changing our visualizations in Kibana to use .keyword instead of .raw fixed our problem.
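For anyone hitting the same thing: this is exactly why a terms aggregation can silently return nothing - it runs against a sub-field that no longer exists in the new indices. The fix is just pointing the aggregation at the new sub-field (field names here are examples, not our actual fields):

```
GET /logstash-*/_search
{
  "size": 0,
  "aggs": {
    "apps": {
      "terms": { "field": "application.keyword" }
    }
  }
}
```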

Thanks for the pointers, helped me figure out what was really going on!


(system) #6

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.