I'm using Elastic Stack for Application Performance Management (APM).
Many applications will send metrics and transactions to a centralised APM solution.
I have some questions about custom "labels", which are dynamic fields in the mapping.
Individually these apps don't have much traffic, so I've created a single transaction index for all of them; together they will index about 1TB per day.
However, these apps could index a lot of different "labels" and I fear a mapping explosion. 1) How should I manage a possible mapping explosion?
Also, some apps could index labels differently from others, that is: same name, different type.
In this case, ES will have a lot of parse errors. I've tested how ES performs with those errors, and I saw that ES performs about 40% worse than with correct events.
2) How should I manage this? I would like to index the rest of the document without those invalid labels. Is that possible?
I've tested the ignore_malformed parameter, but it does not work with booleans, for example.
3) How can I ensure that ES performance does not degrade so much when parsing errors occur?
First, on the mapping explosion: since you have domain knowledge of the services/apps setting labels, the best way forward is to avoid it in the first place: simply don't create too many labels, and don't use high-cardinality variables as labels.
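One way to keep the number of labels bounded on the application side is an allow-list applied before labels are attached to a transaction. The sketch below is only an illustration: `filter_labels` and the example keys are hypothetical names, not part of any APM agent API.

```python
from typing import Any, Dict, Iterable


def filter_labels(labels: Dict[str, Any], allowed_keys: Iterable[str]) -> Dict[str, Any]:
    """Return only the labels whose key is on the agreed allow-list.

    Hypothetical helper: each app would call it before handing labels
    to its APM agent, so unexpected keys never reach the mapping.
    """
    allowed = set(allowed_keys)
    return {key: value for key, value in labels.items() if key in allowed}


# Example: the dynamically named key is dropped, the agreed key is kept.
kept = filter_labels({"region": "eu", "request_id_123": "x"}, ["region", "tier"])
```

With a fixed allow-list, the number of label fields in the index cannot grow beyond the size of the list.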
If possible, make the apps use the same types for the same labels, or have them create labels with different names so they cannot clash (for instance, by prefixing the label with the app name).
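The prefixing idea could be sketched as follows. `prefix_labels` is a hypothetical helper, and replacing dots in the service name is an assumption made here because Elasticsearch interprets dots in field names as object paths.

```python
from typing import Any, Dict


def prefix_labels(service_name: str, labels: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of labels with every key prefixed by the service name.

    Hypothetical helper: two services that both send a label called
    "status" with different types end up with two distinct fields.
    """
    # Assumption: keep the prefix flat by replacing dots with underscores.
    prefix = service_name.replace(".", "_")
    return {f"{prefix}_{key}": value for key, value in labels.items()}


# Example: the same label name from two services no longer clashes.
checkout = prefix_labels("checkout", {"status": True})
billing = prefix_labels("billing.api", {"status": "paid"})
```

Note the trade-off: prefixing avoids type clashes, but it multiplies the number of fields, so it works against the goal of keeping the mapping small.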
ignore_malformed should work. But keep in mind that it is a template setting and it won't be applied to existing indices. Could that be the reason why it looks like it is not working for you? Can you check on newly created indices (after the template update)?
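As a rough sketch, the settings fragment that would go into the index template can be built like this. `index.mapping.ignore_malformed` is the index-level form of the setting; the function name is hypothetical, and where exactly the fragment is placed depends on how your templates are managed, which is not shown here.

```python
import json


def build_ignore_malformed_settings(enabled: bool = True) -> dict:
    """Build the settings fragment that enables ignore_malformed index-wide.

    Only indices created after the template is updated pick this up;
    existing indices keep the settings they were created with.
    """
    return {"settings": {"index.mapping.ignore_malformed": enabled}}


# Print the fragment as JSON, ready to be merged into a template body.
fragment = build_ignore_malformed_settings()
print(json.dumps(fragment, indent=2))
```

The setting is not supported by every field type, so a type conflict on an unsupported type can still reject the whole document.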
Another alternative would be to have different indices for different services, and then configure ILM policies so that you don't end up with very small daily indices.
Hope this helps, let me know if it doesn't!
Actually, it's very difficult to ensure that all teams use the same types for the same labels.
I need some strategy on the APM cluster side to ensure that ES performance does not degrade so much when parsing errors occur.
Is there something I can configure to keep ES from degrading when parsing errors occur?
Having different indices for different services could work.
Could I merge multiple indices (one per service) to have only one per day?
What happens if two indices have fields with equal name and different type?
You are right that ignore_malformed only works with some types. Mapping errors degrade performance because throwing exceptions is an expensive operation in Java; that has nothing to do with APM or APM Server. So the best thing to do is to avoid them in the first place.
it's very difficult to ensure that all teams use the same types for the same labels
Another option then is to force them to use different labels, e.g. by prefixing label names with the service name.
Could I merge multiple indices (one per service) to have only one per day?
What happens if two indices have fields with equal name and different type?
There is no API or tool out of the box to merge indices; you would need to create a script to merge the indices yourself.
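A minimal sketch of such a script, assuming the Elasticsearch `_reindex` API is used to copy documents from several per-service indices into one destination index. The function names, the URL and the index names are hypothetical placeholders, and authentication is left out.

```python
import json
import urllib.request
from typing import List


def build_reindex_body(source_indices: List[str], dest_index: str) -> dict:
    """Build a _reindex request body that copies several indices into one."""
    return {"source": {"index": source_indices}, "dest": {"index": dest_index}}


def reindex(es_url: str, source_indices: List[str], dest_index: str) -> dict:
    """POST the body to <es_url>/_reindex and return the parsed JSON response."""
    body = json.dumps(build_reindex_body(source_indices, dest_index)).encode("utf-8")
    request = urllib.request.Request(
        f"{es_url}/_reindex",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


# Hypothetical usage (not executed here):
# reindex("http://localhost:9200", ["apm-checkout", "apm-billing"], "apm-merged")
```

On the second question: the destination index has a single mapping, so documents whose field type conflicts with it are expected to be rejected rather than merged. It is worth checking the `failures` array in the response after each run.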