Issues with mapping conflicts at scale

We ingest data with Elastic's integrations. What we have found over the years is that Elastic releases an integration and then changes its field mappings over time. This is understandable, but we eventually run into mapping conflicts on large data sets like CrowdStrike FDR, which has over 3,000 fields.

For example, an older version of the integration's Fleet-managed mappings indexed crowdstrike.Success as keyword even though it contained boolean values. The integration development team identified this and corrected it, which is good, but the fix left us with a mapping conflict: new indices map the field as boolean while half of our indices still map it as keyword. We can't use this field in data views or any other internal tools; it is effectively broken. Over time, this happens to many fields across our datasets.
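A quick way to see the damage is the field capabilities API, which reports every type a field resolves to across an index pattern:

```
# The response lists crowdstrike.Success twice, once as "keyword" and
# once as "boolean", with an "indices" array under each type naming
# the backing indices that use it.
GET logs-crowdstrike.fdr-*/_field_caps?fields=crowdstrike.Success
```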

As far as I know, Elastic's only solution is to reindex the data. We keep over a year's worth of data, and reindexing is not quick when dealing with terabytes. A lot of time would be spent fixing these problems and reindexing, and the extra load is not easy for our cluster to handle.
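For context, the manual fix per affected backing index looks something like the sketch below (index names are illustrative): create a destination index with the corrected mapping and reindex into it. Multiplied across a year of backing indices, that is the scale problem.

```
# Destination index with the corrected type; only the conflicting
# field is shown here, the real template carries the full mapping.
PUT logs-crowdstrike.fdr-fixed-000001
{
  "mappings": {
    "properties": {
      "crowdstrike": {
        "properties": {
          "Success": { "type": "boolean" }
        }
      }
    }
  }
}

# Copy the old data; the keyword values "true"/"false" are coerced
# into the boolean field. Runs as a background task.
POST _reindex?wait_for_completion=false
{
  "source": { "index": ".ds-logs-crowdstrike.fdr-default-000001" },
  "dest": { "index": "logs-crowdstrike.fdr-fixed-000001" }
}
```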

Is there a better solution to addressing mapping conflicts?

I understand we could take ownership and create the mappings ourselves, but that lowers the value of the integrations because of the time it would take to develop the index templates. It also does not address the underlying issue: fixing mapping conflicts is difficult.

I feel like there should be a more integrated solution within Kibana and Elasticsearch that lets you handle mapping conflicts easily. It would make sense to me if there were a button on the mapping conflicts page that we could click to “remap” the field across the affected indices so that the mapping aligns with the current type. The action would only target the affected field - lightweight and fast.

This would work well where the mapping changes from keyword to boolean on a true or false value.

There could also be situations where the value shape has changed entirely. For example, crowdstrike.Success could change from a 1 to a true, and the remap process would need a Painless script or ingest pipeline to transform the field before remapping.
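A rough sketch of the transform I have in mind, assuming old documents hold 1 or "1" and newer ones true, done with a Painless script during the reindex (an ingest pipeline with a script processor would work the same way; index names are illustrative):

```
POST _reindex?wait_for_completion=false
{
  "source": { "index": ".ds-logs-crowdstrike.fdr-default-000001" },
  "dest": { "index": "logs-crowdstrike.fdr-fixed-000001" },
  "script": {
    "lang": "painless",
    "source": "def v = ctx._source.crowdstrike?.Success; if (v != null) { ctx._source.crowdstrike.Success = (v == true || v == 1 || v == '1' || v == 'true'); }"
  }
}
```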

If Elastic improved the mapping-conflict resolution capabilities within Kibana, it could reduce the effort and cluster load required to handle these conflicts. I think it would be a big win for the product and for the customers who struggle with these issues.

There are basically two solutions for this, and both can be problematic at scale.

One is reindexing the data. This can be very problematic because you may have terabytes of data, some of it in frozen or regular snapshots; restoring it just to reindex can take a long time and be very expensive. This does not work at scale.

The other is using runtime fields to change the mapping at query time, but this impacts search performance; sometimes the impact is big enough that you have to revert the runtime field.

Another problem is with security rules: if a field has a mapping conflict in Kibana, any security rule that uses that field may break and stop working.

The only way to change the data type of a field in an index is to reindex; the alternative is the runtime field mentioned above, which changes the data type at query time.

This proposed remap could create a runtime field with the desired mapping, so users would not have to create it manually if they chose that route.
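Manually, that looks roughly like this today: a runtime_mappings entry in the search request can shadow the conflicting field with a single type and normalize the old values in a Painless script (field and index names reuse the example above):

```
GET logs-crowdstrike.fdr-*/_search
{
  "runtime_mappings": {
    "crowdstrike.Success": {
      "type": "boolean",
      "script": {
        "source": "def v = params._source.crowdstrike?.Success; if (v != null) { emit(v == true || v.toString() == 'true' || v.toString() == '1'); }"
      }
    }
  },
  "query": { "term": { "crowdstrike.Success": true } }
}
```

Because the script reads each document's _source at query time, this is also exactly where the performance cost mentioned above comes from.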

I had multiple problems with mapping conflicts in integrations last year, and it led to this discussion on GitHub about making it clearer when a change in an integration could lead to a mapping conflict: [Discussion] - Breaking change resulting from data type changes of mapped attributes · Issue #13861 · elastic/integrations · GitHub

What I'm doing on my side is checking the changelog of every integration before updating to see what is being changed and whether a possible mapping conflict would affect me; depending on the field, I can use the integration's custom template to keep the mapping the way I want.
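For reference, the mechanics of that: the Fleet-managed index template for each integration data stream references a @custom component template that is reserved for user changes and survives integration upgrades. A sketch that pins the example field (template changes only apply to new backing indices, hence the rollover):

```
# Pin crowdstrike.Success to keyword regardless of what the managed
# mappings ship with in a future integration version.
PUT _component_template/logs-crowdstrike.fdr@custom
{
  "template": {
    "mappings": {
      "properties": {
        "crowdstrike": {
          "properties": {
            "Success": { "type": "keyword" }
          }
        }
      }
    }
  }
}

# Start a new backing index so the override takes effect now.
POST logs-crowdstrike.fdr-default/_rollover
```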


Thank you for your feedback. It is clear this is an issue affecting a large number of customers, and Elastic does not have a scalable long-term solution.

I hope Elastic will put more effort into developing a more integrated solution within the Kibana UI.

I am thinking about using an integration's data stream namespace to version-control our data streams. I wonder what you think about that.

The big problem now is that all of our CrowdStrike logs are in the default namespace (e.g., logs-crowdstrike.fdr-default), so when you go through 2-4 integration upgrades over a year, mapping conflicts result from aggregating these different versions into one single data stream.

If we pin each namespace to a single integration major version - assuming the integration development team limits breaking mapping changes to major version bumps - then upon upgrading an integration's major version we can change the integration's namespace to match the new major version. We would then have data streams and backing indices matching the following patterns:

logs-crowdstrike.fdr-v2

logs-crowdstrike.fdr-v3

and so on.

Each data stream namespace would contain its own set of unique mappings and avoid conflicts with other namespace versions.

We could then use an index alias or index pattern to designate the “production” version (i.e., the most recent major version) as the primary data view for our main queries and detection rules.
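Data streams support aliases, so the “production” pointer could be a data stream alias that is swapped atomically at each major-version cutover. A sketch, with a hypothetical alias name and the namespace scheme from above:

```
POST _aliases
{
  "actions": [
    { "remove": { "index": "logs-crowdstrike.fdr-v2", "alias": "logs-crowdstrike.fdr-production" } },
    { "add": { "index": "logs-crowdstrike.fdr-v3", "alias": "logs-crowdstrike.fdr-production" } }
  ]
}
```

Queries and detection rules would then target logs-crowdstrike.fdr-production and follow each cutover automatically.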

I am also thinking that we could have aliases and data views that match all namespace versions to enable full searches across the indices when required. The mapping conflicts would come back, but assuming they are on fields that are not queried much, the impact should be small.
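The search-everything view could be built the same way, again with a hypothetical alias name; one caveat is that an alias does not automatically pick up data streams created later, so each new namespace version would need to be added at upgrade time:

```
POST _aliases
{
  "actions": [
    { "add": { "index": "logs-crowdstrike.fdr-v2", "alias": "logs-crowdstrike.fdr-all" } },
    { "add": { "index": "logs-crowdstrike.fdr-v3", "alias": "logs-crowdstrike.fdr-all" } }
  ]
}
```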

This would not solve the root cause, but the isolation may make the errors less noticeable and less impactful while searching current indices.