Kibana APM not showing any data for the last two weeks

Kibana version: v8.8.1

Elasticsearch version: v8.8.1

APM Server version: v8.8.0

Browser version: Firefox v116.0.3; Chrome v115.0.5790.170

Original install method (e.g. download page, yum, deb, from source, etc.) and version: N/A

Fresh install or upgraded from other version?: N/A

Is there anything special in your setup? Logstash is in use (separate; not yet integrated with the OpenTelemetry/APM pipeline), and I believe a load balancer sits in front of the APM servers.

Description of the problem including expected versus actual behavior. Please include screenshots (if relevant):

We've only recently started our OpenTelemetry journey, and the setup isn't complete yet; but for a month or two, OpenTelemetry-reported traces (sent via apm-server) were showing up in Kibana as I worked on things.

Unfortunately, starting a couple weeks ago, OpenTelemetry data / traces suddenly stopped appearing in the "APM" section of Kibana:

Screenshot of missing data, filtered to 'past two weeks':

… more content in later posts, because this forum has ridiculous spam-prevention limits for an official support-forum (one image? no links?? seriously??) …

If I search farther back, older data is visible:

Screenshot of data present, 21 days ago:

The data exists, as I can track down individual events in the "Discover" view:

Screenshot of recent data visible in the 'Discover' view:

One potential culprit I've found, digging around in Kibana settings, is this error:

Mapping conflict: 3 fields are defined as several types (string, integer, etc) across the indices that match this pattern. You may still be able to use these conflict fields in parts of Kibana, but they will be unavailable for functions that require Kibana to know their type. Correcting this issue will require reindexing your data.

Screenshot of error message and conflicted fields:

The conflicted fields are these:

  • event.success_count: byte, aggregate_metric_double, long, object
  • transaction.duration.histogram: histogram, object
  • transaction.duration.summary: aggregate_metric_double, object
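(For anyone else debugging this: the field capabilities API gives a quick, scriptable view of which concrete indices disagree about a field's type — the index pattern here is just an example:

```
GET .ds-*apm*/_field_caps?fields=event.success_count
// when indices disagree, the response lists each type separately,
// each with an "indices" array naming the backing indices using it
```

That's how I confirmed which backing indices carried which mapping.)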

I did find this help article and this blog post (dev dot sobeslavsky dot net slash kibana-how-to-solve-mapping-conflict); but I can't even start to follow those instructions (which seem to amount to 'set a fixed type for the fields in an "index template"'), because I don't know which of those types each field should be.
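(For reference, what those instructions amount to would look roughly like the sketch below: overriding the mapping via the `@custom` component template that the managed APM integration reserves for user changes. Both the template name and the aggregate_metric_double definition here are guesses on my part — which type is actually right is exactly the open question:

```
PUT _component_template/metrics-apm.transaction@custom
{
  "template": {
    "mappings": {
      "properties": {
        "event": {
          "properties": {
            "success_count": {
              "type": "aggregate_metric_double",
              "metrics": ["sum", "value_count"],
              "default_metric": "sum"
            }
          }
        }
      }
    }
  }
}
// check GET _index_template/*apm* for the actual @custom name in use;
// this only affects backing indices created after the next rollover —
// existing indices keep their old (conflicting) mapping
```

)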

Relatedly: how on earth did multiple types get set for these fields? Aren't they part of APM? They don't seem to be anything under our control. How can I prevent this from occurring in the future, when OpenTelemetry becomes critical to our investigation infrastructure instead of just an experimental toy?

Okay, so I think something must have gone wrong during an Elasticsearch upgrade — the fields that are conflicted are precisely those that have breaking changes in APM. I found no mention of any errors reported by our infra people, though, so I'm not sure how or when this happened …

The indices created before July 22nd, and the new ones created after Aug 1st, correctly(?) have event.success_count (for example) declared as an "aggregate_metric_double" … but in that time window, indices were being created with event.success_count as a "byte"?

GET .ds-*apm*-2023.07.22-*/_mapping/field/event.success_count
// produces:
{
  ".ds-metrics-apm.transaction.10m-default-2023.07.22-000004": {
    "mappings": {
      "event.success_count": {
        "full_name": "event.success_count",
        "mapping": {
          "success_count": {
            "type": "aggregate_metric_double",
            // ...
          }}}}}}

GET .ds-*apm*-2023.07.23-*/_mapping/field/event.success_count
// produces:
{
  ".ds-traces-apm-default-2023.07.23-000454": {
    "mappings": {
      "event.success_count": {
        "full_name": "event.success_count",
        "mapping": {
          "success_count": {
            "type": "byte",
            // ...
          }}}}}}

GET .ds-*apm*-2023.08.01-*/_mapping/field/event.success_count
// produces:
{
  ".ds-metrics-apm.transaction.1m-default-2023.08.01-000019": {
    "mappings": {
      "event.success_count": {
        "full_name": "event.success_count",
        "mapping": {
          "success_count": {
            "type": "aggregate_metric_double",
            "type": "byte",
            // ...
          }}}}}}

(How could this have happened?)

At least one option, I think, is to wipe out all the trace data entirely; but that feels nuclear. It looks like I should, at least, be able to "use a reindex to change mappings" on the data stream; but I'm not sure whether I can tell APM to use the new, reindexed data stream afterwards …
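(If it helps anyone later: as far as I can tell, the documented reindex route for a data stream is to set up a second data stream, backed by an index template with the corrected mapping, and copy into it — backing indices can't be reindexed in place. Roughly, with a made-up destination name:

```
POST _reindex
{
  "source": { "index": "traces-apm-default" },
  "dest":   { "index": "traces-apm-fixed", "op_type": "create" }
}
// "op_type": "create" is required when the destination is a data stream;
// a data-stream index template matching "traces-apm-fixed" must exist first
```

Whether APM can then be pointed at that new stream is the part I'm unsure about.)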

I took a quick look at event.success_count and I have some pointers, but no proper answer yet: I see a breaking change in 8.7. Did you ever upgrade components, or were they always on these versions? Not a great answer, but potentially a clue as to where this went wrong.

And I tried to find the right datatype but I'm getting mixed signals on 8.8:

(Also worth noting: before the 22nd, only the metrics-apm.transaction indices had the event.success_count mapping at all; it was completely absent from the other indices. Between the 22nd and the 1st, every single APM-related index(!) had the type: "byte" mapping for event.success_count. And finally, post-1st, although metrics-apm.transaction is back to the correct type … all of our other APM-related indices now report an empty {} for that key?)

GET .ds-*apm*-2023.08.01-*/_mapping/field/event.success_count
// produces:
{
  // ...
  ".ds-traces-apm-default-2023.08.01-000810": {
    "mappings": {}
  },
  ".ds-traces-apm-default-2023.08.01-000799": {
    "mappings": {}
  },
  ".ds-traces-apm-default-2023.08.01-000798": {
    "mappings": {}
  },
  ".ds-metrics-apm.transaction.1m-default-2023.08.01-000019": {
    "mappings": {
      "event.success_count": {
        "full_name": "event.success_count",
        "mapping": {
          "success_count": {
            "type": "aggregate_metric_double",
            "metrics": [
              "sum",
              "value_count"
            ],
            "default_metric": "sum"
          }
        }
      }
    }
  },
  ".ds-traces-apm-default-2023.08.01-000809": {
    "mappings": {}
  },
  ".ds-traces-apm-default-2023.08.01-000808": {
    "mappings": {}
  },
  ".ds-traces-apm-default-2023.08.01-000807": {
    "mappings": {}
  },
  // ...
}

Like I said on Slack: that looks like an upgrade issue or a very weird combination of things. Depending on how important the data is, you could either delete the indices, reindex to fix the mapping, or maybe fix it at query time through runtime fields.
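In case it's useful, here's an untested sketch of the runtime-field option. The Painless script would need to handle both shapes the conflicting indices actually store (a bare number in the "byte" indices; a {sum, value_count} object in the aggregate_metric_double ones), and the field name is made up:

```
GET .ds-*apm*/_search
{
  "runtime_mappings": {
    "event_success_count_fixed": {
      "type": "long",
      "script": {
        "source": "def s = params._source.event?.success_count; if (s instanceof Map) { emit(((Number) s.sum).longValue()); } else if (s != null) { emit(((Number) s).longValue()); }"
      }
    }
  },
  "fields": ["event_success_count_fixed"],
  "size": 3
}
// queries and aggregations can use the runtime field even though the
// underlying mappings conflict; reading from _source is slow but avoids
// the type clash entirely
```

A runtime field like this can also be added to the Kibana data view, though the APM UI itself won't pick it up.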

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.