How can I prevent Cloud APM server from flattening my OpenTelemetry tag hierarchy?

We have recently made a deployment to Elastic Cloud with APM (currently running v8.9.1). We are sending OpenTelemetry traces directly from dotnet applications to the APM endpoint. This is working fine, but we are having trouble with the naming of tags.

We are trying to align our custom, business-related tags with the ECS Guidelines by using a nested field set structure (e.g. companyname.app.custom_property) to avoid conflicts. However, once the tags reach Elastic, they are all placed under labels and the dots are replaced with underscores (e.g. companyname_app_custom_property). I'm 99% sure this happens in APM, not in our applications. This is unfortunate, as it removes a lot of context from the tags (structured vs. list from hell).

I have not been able to find exact documentation as to the internal mapping of APM when receiving native OpenTelemetry. Looking at the index templates and ingestion pipelines (all default with the cloud deployment) I cannot find anything related to this transformation.

How can I prevent this flattening of tags? Is there some documentation where customization of the APM OpenTelemetry ingestion is detailed? Or is it even possible to customize this part on an Elastic Cloud deployment?

Hi @Annxii,

It is correct that we replace the dots with underscores, for the reason discussed in this ticket. We're working on a long-term solution where fields will be flattened/non-hierarchical and can have dots in them.

What I can suggest right now to work around this is to use a custom ingest pipeline. The tags of Otel are actually attributes and we store the attributes under labels. You can use a custom ingest pipeline to rename the labels.* fields to whatever you want, even as a structured format. Here's the guideline for setting up the pipeline for APM: Parse data using ingest pipelines | APM User Guide [8.10] | Elastic
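As a rough illustration of the workaround described above, the following Python sketch builds an ingest pipeline body with one rename processor per label. The label name, the target field name, and the pipeline name traces-apm@custom are assumptions made for illustration; check the linked guide for the pipeline name that applies to your version.

```python
import json

# Hypothetical mapping from flattened label names (as stored by APM)
# to the nested field names we would like to have instead.
LABEL_TO_TARGET = {
    "labels.companyname_app_custom_property": "companyname.app.custom_property",
}


def build_pipeline(label_to_target):
    """Build an ingest pipeline body with one rename processor per label."""
    processors = [
        {
            "rename": {
                "field": source,
                "target_field": target,
                # Skip documents that do not carry this label.
                "ignore_missing": True,
            }
        }
        for source, target in label_to_target.items()
    ]
    return {
        "description": "Move selected APM labels into a nested structure",
        "processors": processors,
    }


pipeline = build_pipeline(LABEL_TO_TARGET)
# The JSON body could be sent with PUT _ingest/pipeline/traces-apm@custom
# (pipeline name assumed here; verify it against the guide for your version).
print(json.dumps(pipeline, indent=2))
```

Each label has to be listed explicitly in this sketch, because the underscore form alone does not say which underscores were originally dots.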

Also, we don't have definitive documentation about how we map the data, but the core logic lives in apm-data if you'd like to take a look at how we convert Otel trace data into our model.

Regards,
Kyungeun

Hi @kyungeunni

Thank you very much for the reply.

It is very helpful looking at the code for apm-data!

It is, however, unfortunate if the long-term solution is still flattened attributes (also, thank you for the terminology correction). It is not the lack of dots in itself that is the issue, but the missing hierarchical structure. I understand the mapping of Otel attributes to the internal structure of Elastic, but it also means that my custom structure becomes less ideal to work with when it is flattened.

I do think I can make it work reasonably well with the custom ingest pipelines, so thank you for that as well. The choice of underscore as the replacement character does, however, impose some limitations, as it conflicts with the ECS guidelines I mentioned. I will not be able to distinguish between parts of the flattened hierarchy and multi-word attributes.
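To make the ambiguity concrete, here is a small Python sketch. The function only mimics the dot-to-underscore replacement described in this thread (it is not the actual apm-data code), and both attribute names are hypothetical.

```python
def flatten_name(attribute_name: str) -> str:
    """Mimic the behaviour described in this thread: dots become underscores."""
    return attribute_name.replace(".", "_")


# Two distinct (hypothetical) attribute names ...
a = "companyname.app.custom_property"
b = "companyname.app_custom.property"

# ... collapse into the same label, so the original hierarchy cannot be recovered.
print(flatten_name(a))                     # companyname_app_custom_property
print(flatten_name(b))                     # companyname_app_custom_property
print(flatten_name(a) == flatten_name(b))  # True
```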

Best Regards, Torsten

@Annxii would you mind elaborating a little on why you think using flattened attributes is a problem? Is this a visualisation problem, when looking at the documents in Discover? If that is the case, maybe there's some room for improving the UI without changing the way the data is stored.

In case you didn't get through all the details of the linked issue, the reason we're intending to flatten everything is to avoid mapping conflicts. e.g. you might have an attribute x which is an integer, and another x.y which is a string; if we treat them as hierarchical, then these would conflict (because x is not an object).

So instead of trying to store it like {x: {y: ...}}, we would store it as {x: ..., x.y: ...}. In the UI we could potentially collapse structures into hierarchies where possible.
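The conflict can be sketched in a few lines of Python. This is only an illustration of the idea, not how Elasticsearch or apm-data implement it; the nest helper and the attribute values are hypothetical.

```python
def nest(attributes: dict) -> dict:
    """Expand dotted keys into nested dicts; raise if a value and an object collide."""
    root: dict = {}
    for key, value in attributes.items():
        *parents, leaf = key.split(".")
        node = root
        for part in parents:
            child = node.setdefault(part, {})
            if not isinstance(child, dict):
                raise ValueError(f"conflict: {part!r} is a value, not an object")
            node = child
        if isinstance(node.get(leaf), dict):
            raise ValueError(f"conflict: {leaf!r} is an object, not a value")
        node[leaf] = value
    return root


attributes = {"x": 1, "x.y": "text"}  # hypothetical attributes

try:
    nest(attributes)
except ValueError as err:
    print(err)  # conflict: 'x' is a value, not an object

# Stored flat, both keys coexist because neither is treated as a parent of the other.
flat = dict(attributes)
print(flat)  # {'x': 1, 'x.y': 'text'}
```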

@axw certainly! First a disclaimer: This is just my personal opinion! I perfectly understand that there are probably many aspects under the hood that I'm not aware of :slight_smile:

My immediate problem is that the names of the attributes defined in my system change when they enter Elastic. This requires a mental mapping when using Kibana, and a technical one if we query the API directly. I can see that allowing attribute names with dots while keeping the structure flattened would solve some of this, at least for Kibana usage. In that case, UI improvements would be greatly appreciated, as I really like being able to dot through the nested structure in Kibana.

I understand the potential mapping conflict you are trying to solve. However, it seems like a lot of work to hide a design problem that the user has in their own defined structure. If you define both x and x.y as a string, you have not defined x clearly enough: it has multiple uses. To me, this is not much different from defining x as a string in one place and as an integer in another. That would also result in a mapping conflict, as far as I understand (and rightfully so).

I ran into a bit of the same problem when trying to get my transaction to be marked as messaging instead of unknown. I had defined the Otel attribute messaging.destination.name in accordance with the latest Otel semantic conventions (1.21); however, digging through apm-data showed it was still using messaging.destination (1.5).

The solution you have described also directly contradicts your own ECS Guidelines, which I referred to previously:

Nest fields inside a field set with dots
The document structure should be nested JSON objects. If you use Beats or Logstash, the nesting of JSON objects is done for you automatically. If you’re ingesting to Elasticsearch using the API, your fields must be nested objects, not strings containing dots.

With the contribution of ECS to OpenTelemetry (which I applaud!), I think this also applies to an OpenTelemetry integration, at least in the long-term.

I assume that non-custom fields are handled hierarchically, so I think it would be better to embrace the hierarchical structure for all data. I think we all face the design issue from time to time of having defined something as a primitive but later finding out that a complex type would be better suited. Instead of trying to hide this, I think it would be awesome if you could help with changing a primitive type to a complex type.

One way I could see this implemented is to define a convention for a default value of a complex type. For example, {x: "qwerty"} could be re-indexed to {x: { _value: "qwerty" }}. A query using a primitive type on a complex attribute could transparently map to this nested convention attribute. I find this similar to how, for example, HTML can nest both DOM elements and text at the same time.
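A minimal Python sketch of this idea, assuming the _value key from the example above. The nest_with_default helper is hypothetical, and unlike the re-indexing described above it only promotes a primitive when a nested key actually requires it.

```python
def nest_with_default(attributes: dict, default_key: str = "_value") -> dict:
    """Expand dotted keys; a primitive that is also a parent moves under default_key."""
    root: dict = {}
    for key, value in attributes.items():
        *parents, leaf = key.split(".")
        node = root
        for part in parents:
            child = node.get(part)
            if not isinstance(child, dict):
                # Create a new object, or promote an existing primitive to one.
                child = {} if part not in node else {default_key: child}
                node[part] = child
            node = child
        if isinstance(node.get(leaf), dict):
            # The leaf already exists as an object: store the primitive inside it.
            node[leaf][default_key] = value
        else:
            node[leaf] = value
    return root


print(nest_with_default({"x": "qwerty", "x.y": "text"}))
# {'x': {'_value': 'qwerty', 'y': 'text'}}
```

The result has the same content regardless of the order in which x and x.y arrive, which is the property that would let a query on x fall back to x._value.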

I understand that people might have widely different use cases, but personally I would much rather have a "you're on your own" option for APM and not have APM assume that I make mapping conflicts.

I hope you find this helpful, and not too much wall of text :slight_smile:


@Annxii thank you! I appreciate you taking the time.

I ran into a bit of the same problem when trying to get my transaction to be marked as messaging instead of unknown. I had defined the Otel attribute messaging.destination.name in accordance with the latest Otel semantic conventions (1.21); however, digging through apm-data showed it was still using messaging.destination (1.5).

Oof, sorry about that - we have an issue about handling semconv schema upgrades, but it hasn't been prioritised yet.

With the contribution of ECS to OpenTelemetry (which I applaud!), I think this also applies to an OpenTelemetry integration, at least in the long-term.

In theory, maybe, but reality is a lot messier :wink:

https://github.com/open-telemetry/opentelemetry-specification/blob/v1.22.0/specification/common/attribute-naming.md says

  • Names SHOULD NOT coincide with namespaces. For example if service.instance.id is an attribute name then it is no longer valid to have an attribute named service.instance because service.instance is already a namespace. Because of this rule be careful when choosing names: every existing name prohibits existence of an equally named namespace in the future, and vice versa: any existing namespace prohibits existence of an equally named attribute key in the future.

Which sounds good, except:

  • it's SHOULD, not MUST, and any leeway given is eventually taken
  • it's common to translate other protocols/systems to OTel (i.e. through receivers), and they don't all adhere to these rules

I think it's fair to say that users should adhere to those rules, but it's not always an end user doing the wrong thing. It might be a third-party instrumentation library doing the wrong thing; or maybe it's fine in isolation, but breaks when combined with another one. e.g. maybe the x and x.y are coming from two different libraries used by a service. Then the default user experience is that half the data doesn't show up, which I think is worse than having dotted field names - that's just my opinion though.
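The "fine in isolation, breaks when combined" case can be sketched in Python as a check for attribute names that coincide with namespaces. The helper and the two attribute lists are hypothetical and only illustrate the naming rule quoted above.

```python
def namespace_collisions(names):
    """Return attribute names that are also used as a namespace by another name."""
    names = set(names)
    namespaces = set()
    for name in names:
        parts = name.split(".")
        # Every proper prefix of a dotted name is a namespace.
        for i in range(1, len(parts)):
            namespaces.add(".".join(parts[:i]))
    return sorted(names & namespaces)


# Hypothetical attribute keys emitted by two different libraries in one service.
library_a = ["x", "service.instance.id"]
library_b = ["x.y", "service.instance"]

print(namespace_collisions(library_a))              # []
print(namespace_collisions(library_b))              # []
print(namespace_collisions(library_a + library_b))  # ['service.instance', 'x']
```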

I understand that people might have widely different use cases, but personally I would much rather have a "you're on your own" option for APM and not have APM assume that I make mapping conflicts.

Understood. Future directions aren't set in stone, and are actively being discussed -- we'll take your input into consideration. Thanks again!
