Troubleshooting missing links for some OpenTelemetry services in the APM Service Map

Observed behavior:

The APM service map doesn't display links between some OpenTelemetry services, but those links do appear in the trace samples.

Searching Discover for the trace.id of a trace from one of the orphaned services on the service map (e.g. the payment service) returns spans from other services as well (e.g. the api-gateway service), which is expected and desired.

When loading the Overview page for one of the orphaned services on the service map (e.g. the payment service), Kibana displays a popup with the error message:

Error while fetching resource

Error
search_phase_execution_exception: [script_exception] Reason: link error (500)

Similarly, some but not all distributed traces are captured despite nearly identical instrumentation code among the services.

Question:

What is the best way to troubleshoot missing links in the service map? What does the service map require to display links between services instrumented with OpenTelemetry?

Environment:

  • Elastic Cloud on GCP us-east1 with Elastic 8.1.2
  • All services are instrumented with OpenTelemetry (see source code)
  • All services use Python, except for api-gateway which uses nginx (via the opentelemetry module)
  • The Python services use the same code and environment variables for instrumentation
  • The screenshots (below) were scoped to the same time ranges and environments
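One quick sanity check is to confirm what each service process actually sees in OTEL_RESOURCE_ATTRIBUTES, since that is how deployment.environment is supplied to these services. The following is a minimal sketch using only the standard library; the helper names are hypothetical, and it assumes the variable holds a comma-separated list of key=value pairs with percent-encoded values (my understanding of the OpenTelemetry convention, not something verified against each SDK).

```python
import os
from urllib.parse import unquote


def parse_resource_attributes(raw: str) -> dict:
    """Parse a comma-separated key=value string into a dict.

    Hypothetical helper: entries without '=' are skipped, and values
    are percent-decoded (assumed encoding).
    """
    attributes = {}
    for pair in raw.split(","):
        key, sep, value = pair.partition("=")
        if not sep or not key.strip():
            continue  # skip empty or malformed entries
        attributes[key.strip()] = unquote(value.strip())
    return attributes


def report_environment(raw: str) -> str:
    """Return the deployment.environment value, or '<missing>' if not set."""
    return parse_resource_attributes(raw).get("deployment.environment", "<missing>")


if __name__ == "__main__":
    raw = os.environ.get("OTEL_RESOURCE_ATTRIBUTES", "")
    print("deployment.environment =", report_environment(raw))
```

Running this inside each service's container (with the same environment the service starts with) would show whether the payment and product processes really receive the same value as the others.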

Screenshot 1 of 2 - The service map is missing links to some services (the product and payment services).

Screenshot 2 of 2 - This trace sample does display links between the services that were unlinked in the service map. This screenshot shows that the payment service is linked to the api-gateway service, but that link doesn't appear in the service map.

I looked at the JSON response from the XHR that the service map sends to /internal/apm/service-map and I see that service.environment is null for the two orphaned services (payment and product). I'm unsure why those two services have null values, because they are given the same environment variables (OTEL_RESOURCE_ATTRIBUTES=deployment.environment=development) and instrumentation logic as the other services. This does seem like something that could cause the services to appear orphaned in the service map.

EDIT: I've confirmed that the spans for the payment and product services all have service.environment set to development. I verified this by searching Discover with the query (service.name:payment OR service.name:product) AND NOT service.environment:development, which finds no spans lacking that value. I'm not sure why these services have null as the service.environment in the service map.

{
  "elements": [{
    "data": {
      "id": "web-gateway",
      "service.environment": "development",
      "service.name": "web-gateway",
      "agent.name": "opentelemetry/python"
    }
  }, {
    "data": {
      "id": "api-gateway",
      "service.name": "api-gateway",
      "agent.name": "opentelemetry/cpp"
    }
  }, {
    "data": {
      "id": "content",
      "service.environment": "development",
      "service.name": "content",
      "agent.name": "opentelemetry/python"
    }
  }, {
    "data": {
      "span.subtype": "http",
      "span.destination.service.resource": "storage.googleapis.com:443",
      "span.type": "external",
      "id": ">storage.googleapis.com:443",
      "label": "storage.googleapis.com:443"
    }
  }, {
    "data": {
      "id": "checkout",
      "service.environment": "development",
      "service.name": "checkout",
      "agent.name": "opentelemetry/python"
    }
  }, {
    "data": {
      "id": "cart",
      "service.environment": "development",
      "service.name": "cart",
      "agent.name": "opentelemetry/python"
    }
  }, {
    "data": {
      "span.subtype": "redis",
      "span.destination.service.resource": "redis",
      "span.type": "db",
      "id": ">redis",
      "label": "redis"
    }
  }, {
    "data": {
      "service.name": "product",
      "agent.name": "opentelemetry/python",
      "service.environment": null,
      "id": "product"
    }
  }, {
    "data": {
      "service.name": "payment",
      "agent.name": "opentelemetry/python",
      "service.environment": null,
      "id": "payment"
    }
  }, {
    "data": {
      "source": "api-gateway",
      "target": "cart",
      "id": "api-gateway~cart",
      "sourceData": {
        "id": "api-gateway",
        "service.name": "api-gateway",
        "agent.name": "opentelemetry/cpp"
      },
      "targetData": {
        "id": "cart",
        "service.environment": "development",
        "service.name": "cart",
        "agent.name": "opentelemetry/python"
      }
    }
  }, {
    "data": {
      "source": "api-gateway",
      "target": "checkout",
      "id": "api-gateway~checkout",
      "sourceData": {
        "id": "api-gateway",
        "service.name": "api-gateway",
        "agent.name": "opentelemetry/cpp"
      },
      "targetData": {
        "id": "checkout",
        "service.environment": "development",
        "service.name": "checkout",
        "agent.name": "opentelemetry/python"
      },
      "bidirectional": true
    }
  }, {
    "data": {
      "source": "api-gateway",
      "target": "content",
      "id": "api-gateway~content",
      "sourceData": {
        "id": "api-gateway",
        "service.name": "api-gateway",
        "agent.name": "opentelemetry/cpp"
      },
      "targetData": {
        "id": "content",
        "service.environment": "development",
        "service.name": "content",
        "agent.name": "opentelemetry/python"
      }
    }
  }, {
    "data": {
      "source": "cart",
      "target": ">redis",
      "id": "cart~>redis",
      "sourceData": {
        "id": "cart",
        "service.environment": "development",
        "service.name": "cart",
        "agent.name": "opentelemetry/python"
      },
      "targetData": {
        "span.subtype": "redis",
        "span.destination.service.resource": "redis",
        "span.type": "db",
        "id": ">redis",
        "label": "redis"
      }
    }
  }, {
    "data": {
      "source": "checkout",
      "target": "api-gateway",
      "id": "checkout~api-gateway",
      "sourceData": {
        "id": "checkout",
        "service.environment": "development",
        "service.name": "checkout",
        "agent.name": "opentelemetry/python"
      },
      "targetData": {
        "id": "api-gateway",
        "service.name": "api-gateway",
        "agent.name": "opentelemetry/cpp"
      },
      "isInverseEdge": true
    }
  }, {
    "data": {
      "source": "content",
      "target": ">storage.googleapis.com:443",
      "id": "content~>storage.googleapis.com:443",
      "sourceData": {
        "id": "content",
        "service.environment": "development",
        "service.name": "content",
        "agent.name": "opentelemetry/python"
      },
      "targetData": {
        "span.subtype": "http",
        "span.destination.service.resource": "storage.googleapis.com:443",
        "span.type": "external",
        "id": ">storage.googleapis.com:443",
        "label": "storage.googleapis.com:443"
      }
    }
  }, {
    "data": {
      "source": "web-gateway",
      "target": "api-gateway",
      "id": "web-gateway~api-gateway",
      "sourceData": {
        "id": "web-gateway",
        "service.environment": "development",
        "service.name": "web-gateway",
        "agent.name": "opentelemetry/python"
      },
      "targetData": {
        "id": "api-gateway",
        "service.name": "api-gateway",
        "agent.name": "opentelemetry/cpp"
      }
    }
  }]
}
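To make the response above easier to read, here is a minimal sketch that lists which service nodes have no edges and which have a null or absent service.environment. It assumes only the shape visible in this response (an "elements" list whose "data" objects are either nodes or edges with "source" and "target"); the function name and the trimmed SAMPLE are hypothetical, not part of any Kibana API.

```python
import json


def summarize_service_map(response: dict) -> dict:
    """Group service nodes by connectivity and by service.environment state."""
    nodes = {}
    connected = set()
    for element in response.get("elements", []):
        data = element.get("data", {})
        if "source" in data and "target" in data:
            # Edge element: remember both endpoints as connected.
            connected.add(data["source"])
            connected.add(data["target"])
        elif "id" in data:
            nodes[data["id"]] = data

    # Only consider instrumented services, not external dependencies.
    services = {i: d for i, d in nodes.items() if "service.name" in d}
    key = "service.environment"
    return {
        "orphaned": sorted(i for i in services if i not in connected),
        "environment_null": sorted(
            i for i, d in services.items() if key in d and d[key] is None
        ),
        "environment_absent": sorted(
            i for i, d in services.items() if key not in d
        ),
    }


# Trimmed, hand-written sample in the same shape as the response above.
SAMPLE = {
    "elements": [
        {"data": {"id": "api-gateway", "service.name": "api-gateway"}},
        {"data": {"id": "cart", "service.environment": "development",
                  "service.name": "cart"}},
        {"data": {"id": "payment", "service.name": "payment",
                  "service.environment": None}},
        {"data": {"source": "api-gateway", "target": "cart",
                  "id": "api-gateway~cart"}},
    ]
}

if __name__ == "__main__":
    print(json.dumps(summarize_service_map(SAMPLE), indent=2))
```

Applied to the full response, this should report product and payment as orphaned with a null environment. It would also show that api-gateway has no service.environment key at all yet still has edges, so a missing environment may not fully explain the missing links on its own.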

I've moved this conversation to GitHub (elastic/kibana#130319) as a possible bug.
