Let's say you have a cluster with application logs. Since multiple applications send data to the cluster, their documents have different properties. For example, one application sends its log message in the "message" property while another sends it in the "msg" property.
How should one configure the mapping to preserve those fields but also have the possibility to use a single "message" property for all applications?
I know there is a "copy_to" feature, but is that the only option? Are there others?
That's an interesting question. Here's what my checklist would be:
1. Try to align the field names to ECS before ingesting (e.g. via Beats processors, Logstash transforms, or an ingest pipeline). I know it's not always possible, but I wanted to make sure it's mentioned again.
2. If that's not possible, make sure to index into different indices with specific mappings for each source. A common prefix like logs-app1.stdout and logs-app2.debug still allows for querying across all of them using logs-*, and it has the advantage of keeping the individual mappings small and focused.
3. In the individual mappings, alias fields can be used to align the field names without duplicating the data.
4. If a simple field alias doesn't work, runtime fields allow for the creation of fields that are evaluated at query time.
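To make the alias and runtime field options a bit more concrete, here is a rough sketch in Python that only builds the JSON bodies and does not talk to a cluster. The field name "unified_message" and both helper function names are made up for illustration, and the Painless script is just one assumed way to pick whichever field is present.

```python
import json


def alias_mapping(source_field, unified_field="message"):
    """Mapping for an index whose app logs into `source_field`.

    The alias stores no extra data; searches on `unified_field`
    are resolved against `source_field`.
    """
    return {
        "mappings": {
            "properties": {
                source_field: {"type": "text"},
                unified_field: {"type": "alias", "path": source_field},
            }
        }
    }


def runtime_fallback(candidates, unified_field="unified_message"):
    """Runtime field that emits the first candidate found in _source.

    `unified_field` is a hypothetical name; the script runs per
    document at query time, which is where the extra cost comes from.
    """
    script = " else ".join(
        f"if (params._source['{name}'] != null) "
        f"{{ emit(params._source['{name}'].toString()); }}"
        for name in candidates
    )
    return {unified_field: {"type": "keyword", "script": {"source": script}}}


# Index for the app that logs into "msg": expose it as "message" too.
print(json.dumps(alias_mapping("msg"), indent=2))

# Single shared index: compute one field from "message" or "msg".
print(json.dumps(runtime_fallback(["message", "msg"]), indent=2))
```

The first body would be used when creating the per-application index. The second could go under "runtime" in an index mapping or under "runtime_mappings" in a search request.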
I'd like to avoid the 1st solution since I don't want to mutate fields, just add additional ones.
I like the 2nd and 3rd solutions since, combined, they give me exactly what I need. However, we currently have all app logs in the same index, so it's not trivial to do that at the moment. Do you think something similar might be possible with the copy_to feature? We could take the 2 fields and "merge" them into a single field. Since each of our logs has only 1 of them filled, the "copy_to" field would hold whichever one is not empty.
I'd like to avoid the 4th solution if possible, for performance reasons.
Yes, that should work as well, as far as querying is concerned. Not sure whether the display in the Logs UI might be negatively affected, TBH. One thing to keep in mind is that, compared to aliases or field renaming, it uses up more storage space due to the duplication.
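For reference, a copy_to setup along those lines might look roughly like the sketch below (Python, only building the mapping body). The target name "unified_message" and the helper name are hypothetical, and the "text" type is an assumption about how the fields are mapped today.

```python
import json


def copy_to_mapping(source_fields, target="unified_message"):
    """Mapping properties where every source field is copied into `target`.

    `target` is a hypothetical field name. It is indexed in addition
    to the source fields, hence the extra storage.
    """
    props = {name: {"type": "text", "copy_to": target} for name in source_fields}
    props[target] = {"type": "text"}
    return {"properties": props}


body = copy_to_mapping(["message", "msg"])
print(json.dumps(body, indent=2))
```

Note that values copied via copy_to can be searched but are not added to _source, so the combined field would not show up when looking at the stored document.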
In terms of architecture cleanliness and maintainability, I'd recommend performing that copying in an earlier step during ingestion (such as in Logstash or an ingest pipeline). Or are the apps writing to Elasticsearch directly?
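As a rough idea of what the ingest pipeline variant could look like, here is a sketch that builds a pipeline body with one "set" processor per source field. The original fields are left untouched and only one extra field is added. The field name "unified_message" and the helper name are hypothetical, and the conditions assume simple top-level field names.

```python
import json


def unify_pipeline(source_fields, target="unified_message"):
    """Ingest pipeline body that copies the first present source field.

    One `set` processor per source field. `override: False` keeps the
    value written by an earlier processor, so the first match wins.
    """
    processors = [
        {
            "set": {
                "field": target,
                "copy_from": name,
                "override": False,
                # Only run when the document actually has this field.
                "if": f"ctx['{name}'] != null",
            }
        }
        for name in source_fields
    ]
    return {
        "description": "Copy whichever message field is present into one field",
        "processors": processors,
    }


print(json.dumps(unify_pipeline(["message", "msg"]), indent=2))
```

The resulting body would be registered with the put pipeline API under a name of your choosing. Unlike copy_to, the added field ends up in _source.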