Elastic APM AWS Lambda Python

Kibana version:
v8.3.1

Elasticsearch version:
v8.3.1

APM Server version:
APM Server version 8.3.1

APM Agent language and version:
elastic-apm==6.10.1 using Python 3.8

Browser version:
Google Chrome Versão 103.0.5060.114

Original install method (e.g. download page, yum, deb, from source, etc.) and version:
Docker Layer

Is there anything special in your setup?
Using the default values.

Description of the problem including expected versus actual behavior. Please include screenshots (if relevant):
I'm using AWS Lambda Container Python image based and using SAM to deploy my Function that has S3 trigger.
I'm using the decorator "@capture_serverless()" on my main handler() function, but I'm not able to see ANY transaction, error or anything on my service.
The weird thing is that I'm able to see my service "lambda-vda" on Kibana, but always empty.

Steps to reproduce:
Dockerfile with the following lines at top:

FROM docker.elastic.co/observability/apm-lambda-extension-x86_64:latest AS lambda-extension

FROM public.ecr.aws/lambda/python:3.8

COPY --from=lambda-extension /opt/elastic-apm-extension /opt/extensions/elastic-apm-extension

all my related function stuff being instaled here, like TESSERACT and another python packages.

Provide logs and/or server output (if relevant):
My enviroment variables in the template;yaml SAM file:

      Environment:
        Variables:
          ELASTIC_APM_LAMBDA_APM_SERVER: https://*.apm.eastus2.azure.elastic-cloud.com:443
          ELASTIC_APM_SECRET_TOKEN: ****
          ELASTIC_APM_LOG_LEVEL: trace
          ELASTIC_APM_DATA_RECEIVER_TIMEOUT_SECONDS: 60
          ELASTIC_APM_DATA_FORWARDER_TIMEOUT_SECONDS: 15
          ELASTIC_APM_SERVICE_NAME: lambda-vda

So here I will share some logs that I'm able to see on my Cloudwatch logs related to some Lambda event:

{
    "log.level": "debug",
    "@timestamp": "2022-07-11T14:56:25.759Z",
    "log.origin": {
        "file.name": "apm-lambda-extension/main.go",
        "file.line": 70
    },
    "message": "Register response: {\n\t\"functionName\": \"vda\",\n\t\"functionVersion\": \"$LATEST\",\n\t\"handler\": \"\"\n}",
    "ecs.version": "1.6.0"
}

I see A LOT of warnings like this

[DEBUG]	2022-07-11T16:51:51.584Z	5e8b707b-7833-4b14-829c-5fa12ddde65f	Skipping instrumentation of redis. Module redis.client not found

{
    "log.level": "info",
    "@timestamp": "2022-07-11T14:56:25.784Z",
    "log.origin": {
        "file.name": "logsapi/subscribe.go",
        "file.line": 122
    },
    "message": "Extension listening for Lambda Logs API events on 169.254.79.130:43049",
    "ecs.version": "1.6.0"
}
{
    "log.level": "info",
    "@timestamp": "2022-07-11T14:56:25.789Z",
    "log.origin": {
        "file.name": "apm-lambda-extension/main.go",
        "file.line": 112
    },
    "message": "Waiting for next event...",
    "ecs.version": "1.6.0"
}

After a log showing "received event.":

{
    "log.level": "debug",
    "@timestamp": "2022-07-11T14:56:32.562Z",
    "log.origin": {
        "file.name": "apm-lambda-extension/main.go",
        "file.line": 126
    },
    "message": "{\n\t\"eventType\": \"INVOKE\",\n\t\"deadlineMs\": 1657551692561,\n\t\"requestId\": \"2b89ce98-84bd-4eea-91bb-9fae8faa9edf\",\n\t\"invokedFunctionArn\": \"arn:aws:lambda:us-east-1:
**:function:vda\",\n\t\"tracing\": {\n\t\t\"type\": \"X-Amzn-Trace-Id\",\n\t\t\"value\": \"Root=1-62cc3a19-0ecf8a350f40e8b406617329;Parent=5797d681116f11e8;Sampled=0\"\n\t}\n}",
    "ecs.version": "1.6.0"
}
{
    "log.level": "debug",
    "@timestamp": "2022-07-11T14:56:42.762Z",
    "log.origin": {
        "file.name": "extension/apm_server_transport.go",
        "file.line": 182
    },
    "message": "Transport status set to healthy",
    "ecs.version": "1.6.0"
}
{
    "log.level": "debug",
    "@timestamp": "2022-07-11T14:56:52.511Z",
    "log.origin": {
        "file.name": "extension/apm_server_transport.go",
        "file.line": 200
    },
    "message": "APM server Transport status set to Healthy",
    "ecs.version": "1.6.0"
}
{
    "log.level": "debug",
    "@timestamp": "2022-07-11T14:56:52.511Z",
    "log.origin": {
        "file.name": "extension/apm_server_transport.go",
        "file.line": 183
    },
    "message": "APM server response body: ",
    "ecs.version": "1.6.0"
}

{
    "log.level": "debug",
    "@timestamp": "2022-07-11T14:57:33.166Z",
    "log.origin": {
        "file.name": "apm-lambda-extension/main.go",
        "file.line": 176
    },
    "message": "Received agent done signal",
    "ecs.version": "1.6.0"
}
{
    "log.level": "debug",
    "@timestamp": "2022-07-11T14:57:33.166Z",
    "log.origin": {
        "file.name": "logsapi/subscribe.go",
        "file.line": 178
    },
    "message": "Current invocation over. Interrupting logs processing goroutine",
    "ecs.version": "1.6.0"
}
{
    "log.level": "debug",
    "@timestamp": "2022-07-11T14:57:33.177Z",
    "log.origin": {
        "file.name": "extension/apm_server_transport.go",
        "file.line": 101
    },
    "message": "Flush started - Checking for agent data",
    "ecs.version": "1.6.0"
}
{
    "log.level": "debug",
    "@timestamp": "2022-07-11T14:57:33.177Z",
    "log.origin": {
        "file.name": "extension/apm_server_transport.go",
        "file.line": 110
    },
    "message": "Flush ended - No agent data on buffer",
    "ecs.version": "1.6.0"
}

{
    "log.level": "debug",
    "@timestamp": "2022-07-11T15:03:20.757Z",
    "log.origin": {
        "file.name": "apm-lambda-extension/main.go",
        "file.line": 126
    },
    "message": "{\n\t\"eventType\": \"SHUTDOWN\",\n\t\"deadlineMs\": 1657551802714,\n\t\"requestId\": \"\",\n\t\"invokedFunctionArn\": \"\",\n\t\"tracing\": {\n\t\t\"type\": \"\",\n\t\t\"value\": \"\"\n\t}\n}",
    "ecs.version": "1.6.0"
}


I'm not able to find in the logs any error or a message the seems an error.
I use the same credentials in another elastic-apm integration with Django without a problem.

Are you also installing the Python Agent as part of your python packages? It is (currently) not available as part of the apm-lambda-extension image.

Yes.
I'm using the elastic-apm==6.10.1 version.

Are you by chance seeing any documents in the APM indices? Curious if the data is actually not being sent or if it's not showing up correctly. The fact that you see the service name (though empty) is a bit odd to me.

Which index should I look to make sure?

Tried with:

POST apm-7.16.3-span-*/_search
{
    "sort": [
    {
      "@timestamp": {
        "order": "desc"
      }
    }
  ],
  "query": {
    "match": {
      "service.name": "lambda-vda"
    }
  }
}

And this returns an empty result. ( very weird )
If I use the Django service name, I'm able to see my transactions, so I think that I'm looking at the right index and have a valid query, but feel free to guide me through this.

Trying with a more robust search:

POST _all/_search
{
  "query": {
    "match": {
      "service.name": "lambda-vda"
    }
  }
}

I'm able to see some results :open_mouth:

{
        "_index": ".ds-traces-apm-default-2022.07.01-000016",
        "_id": "COFb74EBWGaFdgKJHMwA",
        "_score": 4.54838,
        "_source": {
          "parent": {
            "id": "169dbd01670a7062"
          },
          "agent": {
            "name": "python",
            "version": "6.10.1"
          },
          "data_stream.namespace": "default",
          "data_stream.type": "traces",
          "processor": {
            "name": "transaction",
            "event": "span"
          },
          "observer": {
            "hostname": "8cb034d6a71e",
            "id": "3bd6e8c7-1611-4f8c-af1c-e40494710913",
            "ephemeral_id": "9911db98-89a9-4665-8016-7828163cf769",
            "type": "apm-server",
            "version": "8.3.1"
          },
          "trace": {
            "id": "1577996be9fe3354c69b3ef5201da239"
          },
          "@timestamp": "2022-07-11T22:22:09.597Z",
          "ecs": {
            "version": "1.12.0"
          },
          "service": {
            "name": "lambda-vda"
          },
          "data_stream.dataset": "apm",
          "event": {
            "agent_id_status": "missing",
            "ingested": "2022-07-11T22:22:14Z",
            "outcome": "success"
          },
          "transaction": {
            "id": "169dbd01670a7062"
          },
          "timestamp": {
            "us": 1657578129597972
          },
          "span": {
            "duration": {
              "us": 187054
            },
            "stacktrace": [
              {
                "exclude_from_grouping": false,
                "abs_path": "/var/task/app.py",
                "filename": "task/app.py",
                "line": {
                  "number": 398
                },
                "function": "handler",
                "module": "app"
              },
              {
                "exclude_from_grouping": false,
                "library_frame": true,
                "abs_path": "/var/lang/lib/python3.8/site-packages/elasticapm/contrib/serverless/aws.py",
                "filename": "elasticapm/contrib/serverless/aws.py",
                "line": {
                  "number": 116
                },
                "function": "decorated",
                "module": "elasticapm.contrib.serverless.aws"
              },
              {
                "exclude_from_grouping": false,
                "abs_path": "/var/runtime/bootstrap.py",
                "filename": "runtime/bootstrap.py",
                "line": {
                  "number": 127
                },
                "function": "handle_event_request",
                "module": "__main__"
              },
              {
                "exclude_from_grouping": false,
                "abs_path": "/var/runtime/bootstrap.py",
                "filename": "runtime/bootstrap.py",
                "line": {
                  "number": 468
                },
                "function": "main",
                "module": "__main__"
              },
              {
                "exclude_from_grouping": false,
                "abs_path": "/var/runtime/bootstrap.py",
                "filename": "runtime/bootstrap.py",
                "line": {
                  "number": 480
                },
                "function": "<module>",
                "module": "__main__"
              }
            ],
            "subtype": "custom",
            "name": "functions.image.resize",
            "id": "a64075d8661c5599",
            "type": "code"
          }
        }
      }

And it seems to be a valid call that my Function does!
Just run my function and I'm able to see my transactions being inserted ( I've sorted by @timestamp )

Like:

POST .ds-traces-apm*/_search?size=500
{
    "sort": [
    {
      "@timestamp": {
        "order": "desc"
      }
    }
  ],
  "query": {
    "term": {
      "service.name": "lambda-vda"
    }
  }
}

And I've just deployed my function with another name and make a single action to trigger it. ( upload a file to S3 ).
And I just saw a BUNCH of log being inserted on indices like ".ds-traces-apm-default-2022.07.01-000016"
And I just saw that the "dependencies" tab always is being filled with some API's that I'm consuming.

So why those transactions are not appearing on Kibana seems to be the next question. haha

Hi @marcoaleixo ,

can you please check whether there are any documents with the processor.event value being transaction? (In your example above it's the value span.)

If there are none, that would explain why you don't see anything in the Kibana UI. In that case we can then focus on understanding why the transactions are not reported.

Hey

Just ran the query:

POST _all/_search?size=9999
{
  "sort": [
    {
      "@timestamp": {
        "order": "desc"
      }
    }
  ],
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "service.name": "lambda-vda"
          }
        },
        {
          "term": {
            "processor.event": "transaction"
          }
        }
      ]
    }
  }
}

And I'm getting zero hits as response.

So it seems like the agent is reporting spans, but not the parent transactions.

Can you give us more details on the structure of your function? I'm wondering if there's some oddity that is interrupting the end of the transaction so it's not being properly reported.

So I need to compile Tesseract manually.

This is the initial lines of my Dockerfile:

FROM docker.elastic.co/observability/apm-lambda-extension-x86_64:latest AS lambda-extension

FROM public.ecr.aws/lambda/python:3.8

COPY --from=lambda-extension /opt/elastic-apm-extension /opt/extensions/elastic-apm-extension

ARG HANDLER=vda
ENV HANDLER ${HANDLER}

RUN yum update -y
ARG LEPTONICA_VERSION=1.78.0
ARG TESSERACT_VERSION=4.1.0-rc4
ARG AUTOCONF_ARCHIVE_VERSION=2017.09.28
ARG TMP_BUILD=/tmp
ARG TESSERACT=/opt/tesseract
ARG LEPTONICA=/opt/leptonica
ARG DIST=/opt/build-dist

# change OCR_LANG to enable the layer for different languages
ARG OCR_LANG=por

# change TESSERACT_DATA_SUFFIX to use different datafiles (options: "_best", "_fast" and "")
ARG TESSERACT_DATA_SUFFIX=
ARG TESSERACT_DATA_VERSION=4.0.0

RUN yum makecache fast; yum clean all && yum -y update && yum -y upgrade; yum clean all && \
    yum install -y yum-plugin-ovl; yum clean all && yum -y groupinstall "Development Tools"; yum clean all

RUN yum -y install gcc gcc-c++ make autoconf aclocal automake libtool \
    libjpeg-devel libpng-devel libtiff-devel zlib-devel \
    libzip-devel freetype-devel lcms2-devel libwebp-devel \
    libicu-devel tcl-devel tk-devel pango-devel cairo-devel; yum clean all

WORKDIR ${TMP_BUILD}/leptonica-build
RUN curl -L https://github.com/DanBloomberg/leptonica/releases/download/${LEPTONICA_VERSION}/leptonica-${LEPTONICA_VERSION}.tar.gz | tar xz && cd ${TMP_BUILD}/leptonica-build/leptonica-${LEPTONICA_VERSION} && \
    ./configure --prefix=${LEPTONICA} && make && make install && cp -r ./src/.libs /opt/liblept

WORKDIR ${TMP_BUILD}/autoconf-build
RUN curl https://ftp.gnu.org/gnu/autoconf-archive/autoconf-archive-${AUTOCONF_ARCHIVE_VERSION}.tar.xz | tar xJ && \
    cd autoconf-archive-${AUTOCONF_ARCHIVE_VERSION} && ./configure && make && make install && cp ./m4/* /usr/share/aclocal/

WORKDIR ${TMP_BUILD}/tesseract-build
RUN curl -L https://github.com/tesseract-ocr/tesseract/archive/${TESSERACT_VERSION}.tar.gz | tar xz && \
    cd tesseract-${TESSERACT_VERSION} && ./autogen.sh  && PKG_CONFIG_PATH=/opt/leptonica/lib/pkgconfig LIBLEPT_HEADERSDIR=/opt/leptonica/include \
    ./configure --prefix=${TESSERACT} --with-extra-includes=/opt/leptonica/include --with-extra-libraries=/opt/leptonica/lib && make && make install

WORKDIR /opt
RUN mkdir -p ${DIST}/lib && mkdir -p ${DIST}/bin && \
    cp ${TESSERACT}/bin/tesseract ${DIST}/bin/ && \
    cp ${TESSERACT}/lib/libtesseract.so.4  ${DIST}/lib/ && \
    cp ${LEPTONICA}/lib/liblept.so.5 ${DIST}/lib/liblept.so.5 && \
    cp /usr/lib64/libwebp.so.4 ${DIST}/lib/ && \
    echo -e "LEPTONICA_VERSION=${LEPTONICA_VERSION}\nTESSERACT_VERSION=${TESSERACT_VERSION}\nTESSERACT_DATA_FILES=tessdata${TESSERACT_DATA_SUFFIX}/${TESSERACT_DATA_VERSION}" > ${DIST}/TESSERACT-README.md && \
    find ${DIST}/lib -name '*.so*' | xargs strip -s

WORKDIR ${DIST}/tesseract/share/tessdata
RUN curl -L https://github.com/tesseract-ocr/tessdata${TESSERACT_DATA_SUFFIX}/raw/${TESSERACT_DATA_VERSION}/osd.traineddata > osd.traineddata && \
    curl -L https://github.com/tesseract-ocr/tessdata${TESSERACT_DATA_SUFFIX}/raw/${TESSERACT_DATA_VERSION}/eng.traineddata > eng.traineddata && \
    curl -L https://github.com/tesseract-ocr/tessdata${TESSERACT_DATA_SUFFIX}/raw/${TESSERACT_DATA_VERSION}/${OCR_LANG}.traineddata > ${OCR_LANG}.traineddata

ENV TESSDATA_PREFIX=/opt/tesseract/share/tessdata/
RUN ln -s /opt/tesseract/bin/tesseract /usr/local/bin/tesseract
RUN mv *.traineddata /opt/tesseract/share/tessdata/

RUN yum update -y
RUN yum install -y openssl pyOpenSSL opencv-python ghostscript python-pip

# Python Dependencies
RUN pip install torch==1.8.1+cpu torchvision==0.9.1+cpu -f https://download.pytorch.org/whl/torch_stable.html
RUN pip install 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'
RUN python -m pip install detectron2==0.4 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cpu/torch1.8/index.html
RUN pip install -r ${LAMBDA_TASK_ROOT}/dependencies/requirements.txt

After that I only install my python packages and I put my CMD like:

CMD [ "app.handler" ]

And on my handler basically I call a bunch of python functions and make some HTTP requests.

Another strange behaviour that I'm facing.
If I put my ENV variables in my Dockerfile, the agent is able to insert the spans like we saw.

But if I use only the ENV from SAM, removing it from the Dockerfile, the agent start to throw errors like:

[WARNING]	2022-07-12T17:36:43.485Z	1e2b6c2e-a4db-40cf-a341-404e91b515d8	Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f28223a1610>: Failed to establish a new connection: [Errno 111] Connection refused')': /intake/v2/events
[ERROR]	2022-07-12T17:36:54.282Z	1e2b6c2e-a4db-40cf-a341-404e91b515d8	Failed to submit message: 'Connection to APM Server timed out (url: http://localhost:8200/intake/v2/events, timeout: 5 seconds)'

But in the AWS Lambda panel I'm able to see my ENV from the template.yaml file:

Another strange behaviour:


This service makes no sense, textract is an AWS API that my function consume.

Ah, and all spans that is being captured I have the decorator @elasticapm.capture_span() in each function.

Definitely some weird behavior you're seeing.

I think without a small reproduction case, the next place to go is some debug logging. I would put a log here and here to try to figure out where we're going wrong with our transaction begin/end.

Another thing to look for -- are you seeing these debug logs? You seem to have debug logging correctly enabled because you're seeing the Skipping instrumentation of redis. Module redis.client not found type of logs. I would expect to see those flushing logs on every invocation of your function if everything is working correctly.

Yes, I'm able to see the flush logging in every request.
I was able to make it work building my own client, begin_transaction and end_transaction.
And I use by my own the capture_span().

In future maybe I will be back and try to debug this.

Thank you.

Glad you got it figured out for now. Keep us posted if you need help in the future or want to revisit our built-in solution!

This topic was automatically closed 20 days after the last reply. New replies are no longer allowed.