CloudFront logs into ES: index mapping question after moving to 7.1

I previously set up an AWS Lambda function to automatically collect AWS CloudFront logs from an S3 bucket and push them into an AWS Elasticsearch cluster. This worked fine with ES version 6. However, after a few issues with the AWS Elasticsearch service itself, and wanting to hold more data, I decided to build a self-managed ES cluster for the job and went for the latest ES version available, 7.1.

This immediately broke things. After reading up on the mapping changes and deprecations, I adjusted my code, but my changes don't seem to have worked, and after going around in circles several times I suspect I'm missing something. The confusing part is that the Python code I use for the Lambda function indexes the data fine when I run it manually, whether locally on the ES node or remotely.

However, when it runs as a Lambda function, it creates the index for the date but then fails to push any of the logs into it. The indices just sit there at 230 bytes, and I get no useful error details despite all my attempts to log something from the function.
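One thing that can hide failures like this is that a `_bulk` request may succeed as a whole while individual documents are rejected; the details then only appear per item in the response body. A minimal sketch of pulling those out for logging, assuming the function has access to the parsed bulk response as a dict (`collect_bulk_errors` and `log_bulk_errors` are hypothetical helpers, not part of my Lambda code or of any client library):

```python
import json
import logging

logger = logging.getLogger(__name__)


def collect_bulk_errors(bulk_response):
    """Return one readable line per failed item in a parsed _bulk response."""
    failures = []
    # The top-level "errors" flag is true when at least one item failed.
    if not bulk_response.get("errors"):
        return failures
    for item in bulk_response.get("items", []):
        # Each item has a single key naming the action (index, create, ...).
        for action, result in item.items():
            error = result.get("error")
            if error:
                failures.append(
                    "{} {} status={} {}".format(
                        action,
                        result.get("_index"),
                        result.get("status"),
                        json.dumps(error, sort_keys=True),
                    )
                )
    return failures


def log_bulk_errors(bulk_response):
    """Write every per-item failure to the log at ERROR level."""
    for line in collect_bulk_errors(bulk_response):
        logger.error("bulk item failed: %s", line)
```

Calling `log_bulk_errors(response)` right after the bulk request should at least show the rejection type and reason for each document in the function's logs.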

This is the mapping code I previously used for the Lambda function pushing into ES version 6:

fieldnames = (
    'date',  # date
    'time',  # time
    'edge-location',  # x-edge-location
    'src-bytes',  # sc-bytes
    'ip',  # c-ip
    'method',  # cs-method
    'host',  # cs(Host)
    'uri-stem',  # cs-uri-stem
    'status',  # sc-status
    'referer',  # cs(Referer)
    'user-agent',  # cs(User-Agent)
    'uri-query',  # cs-uri-query
    'cookie',  # cs(Cookie)
    'edge-result-type',  # x-edge-result-type
    'edge-request-id',  # x-edge-request-id
    'host-header',  # x-host-header
    'protocol',  # cs-protocol
    'resp-bytes',  # cs-bytes
    'time-taken',  # time-taken
    'forwarded-for',  # x-forwarded-for
    'ssl-protocol',  # ssl-protocol
    'ssl-cipher',  # ssl-cipher
    'edge-response-result-type',  # x-edge-response-result-type
    'fle-status',  # fle-status
    'fle-encrypted-fields'  # fle-encrypted-fields
)
record_mapping = {
    "mappings": {
        "logs": {
            "properties": {
                "timestamp": {
                    "type": "date",
                    "format": "yyyy-MM-dd'T'HH:mm:ssZ",
                },
                "resp-bytes": {
                    "type": "integer",
                },
                "src-bytes": {
                    "type": "integer",
                },
                "status": {
                    "type": "integer",
                },
                "ip": {
                    "type": "ip",
                },
            }
        }
    }
}

This is the code I've updated trying to account for the ES version 7.1 changes:

fieldnames = (
    'date',  # date
    'time',  # time
    'edge-location',  # x-edge-location
    'src-bytes',  # sc-bytes
    'ip',  # c-ip
    'method',  # cs-method
    'host',  # cs(Host)
    'uri-stem',  # cs-uri-stem
    'status',  # sc-status
    'referer',  # cs(Referer)
    'user-agent',  # cs(User-Agent)
    'uri-query',  # cs-uri-query
    'cookie',  # cs(Cookie)
    'edge-result-type',  # x-edge-result-type
    'edge-request-id',  # x-edge-request-id
    'host-header',  # x-host-header
    'protocol',  # cs-protocol
    'resp-bytes',  # cs-bytes
    'time-taken',  # time-taken
    'forwarded-for',  # x-forwarded-for
    'ssl-protocol',  # ssl-protocol
    'ssl-cipher',  # ssl-cipher
    'edge-response-result-type',  # x-edge-response-result-type
    'protocol-version',  # cs-protocol-version
    'fle-status',  # fle-status
    'fle-encrypted-fields'  # fle-encrypted-fields
)
record_mapping = {
    "mappings": {
        "properties": {
            "timestamp": {
                "type": "date",
                "format": "yyyy-MM-dd'T'HH:mm:ssZ",
            },
            "resp-bytes": {
                "type": "integer",
            },
            "src-bytes": {
                "type": "integer",
            },
            "stack-id": {
                "type": "integer",
            },
            "status": {
                "type": "integer",
            },
            "ip": {
                "type": "ip",
            },
        }
    }
}

I've also tried this mapping, using the date_time_no_millis format:

record_mapping = {
    "mappings": {
        "properties": {
            "timestamp": {
                "type": "date",
                "format": "date_time_no_millis",
            },
            "resp-bytes": {
                "type": "integer",
            },
            "src-bytes": {
                "type": "integer",
            },
            "stack-id": {
                "type": "integer",
            },
            "status": {
                "type": "integer",
            },
            "ip": {
                "type": "ip",
            },
        }
    }
}
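Since `timestamp` is not one of the CloudFront columns, it has to be built from the `date` and `time` fields, and the resulting string has to match whatever format the mapping declares. A minimal sketch of one way to build it, assuming the two fields arrive as `YYYY-MM-DD` and `HH:MM:SS` in UTC, as CloudFront writes them (`build_timestamp` is a hypothetical helper for illustration):

```python
from datetime import datetime, timezone


def build_timestamp(date_field, time_field):
    """Join CloudFront's date and time columns into e.g. 2019-06-01T12:34:56Z."""
    # Parsing first means a malformed value raises ValueError here,
    # instead of being rejected later at index time.
    parsed = datetime.strptime(
        "{} {}".format(date_field, time_field), "%Y-%m-%d %H:%M:%S"
    ).replace(tzinfo=timezone.utc)
    # CloudFront log times are UTC, so a literal "Z" offset is emitted.
    return parsed.strftime("%Y-%m-%dT%H:%M:%SZ")
```

An ISO 8601 value with an explicit offset like this is the shape I would expect `date_time_no_millis` to accept; whether the custom pattern in the earlier mapping treats a trailing `Z` the same way in 7.x is something I have not confirmed.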

CloudFront logs are pretty common and nothing special. Again, I'm a little lost, because the data indexes fine when I run the Python code manually, just not when it runs as a Lambda function, and it only fails against version 7.x.

Maybe it has something to do with how I'm testing, i.e. clearing out all indices and data each time. Should I have a dedicated mapping template in the cluster at all times? Or am I not following the correct mapping format for the changes in ES 7.x?
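For the template option, this is a rough sketch of what I imagine the body would look like if it were stored once through the `_template` API, so that every new daily index picks up the mapping without the function having to create it. The `cloudfront-*` index pattern and the `build_index_template` helper are assumptions for illustration; the properties are the same ones as in the mapping above:

```python
import json


def build_index_template(index_pattern, properties):
    """Build a typeless (7.x style) index template body."""
    return {
        "index_patterns": [index_pattern],
        # No mapping type name such as "logs" between "mappings" and "properties".
        "mappings": {"properties": properties},
    }


cloudfront_properties = {
    "timestamp": {"type": "date", "format": "date_time_no_millis"},
    "resp-bytes": {"type": "integer"},
    "src-bytes": {"type": "integer"},
    "stack-id": {"type": "integer"},
    "status": {"type": "integer"},
    "ip": {"type": "ip"},
}

# "cloudfront-*" is an assumed naming scheme for the daily indices.
template_body = build_index_template("cloudfront-*", cloudfront_properties)
print(json.dumps(template_body, indent=2))
```

The printed JSON would be the request body for a `PUT _template/<name>` call against the cluster.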
