Elastic Serverless Forwarder for AWS adds reserved _id field when sending to Logstash

Hi all!
I am new to ES, so apologies in advance if I misstate some things.

We are looking to use the Elastic Serverless Forwarder for AWS, described in the Elastic Serverless Forwarder Guide, to send data to Logstash before passing it on to our self-hosted ES cluster.

Our initial, simple setup uses an emitter Lambda that writes a JSON log event to its CloudWatch (CW) logs every minute. The forwarder is subscribed to that Lambda's CW log group, so CloudWatch triggers it each time the event is written.

Sample event:

{
    "timestamp": "2023-08-03T21:39:07.179193",
    "random_field": "constant string value here",
    "aws_request_id": "98f1d8fb-3361-474f-b525-10c6513edb36"
}

There are no filters; the forwarder is triggered and sends the event to Logstash, which returns a 200.

The Logstash logs, however, show a 400 from Elasticsearch: the document cannot be indexed because the log event contains the reserved field _id.

Error log snippet:
{"type":"mapper_parsing_exception","reason":"failed to parse field [_id] of type [_id] in document with id 'jJdVvIkBlVXM_z6z4Rmr'. Preview of field's value: '1691081882600-77a39f7173d6b4b6455fd7ae9f2fb147afd919365019f91f9bce160b1db21b100f45cc91f0c04489020b2725e53bacad-000000000000'","caused_by":{"type":"mapper_parsing_exception","reason":"Field [_id] is a metadata field and cannot be added inside a document. Use the index API request parameters."}}}}}.

We are trying this with Elasticsearch and Logstash 7.16.2, which should be supported according to the docs.

I understand _id is reserved, but I am surprised the forwarder sends it as part of the document to be indexed, especially since _id is not part of the metadata of the original event.

I have seen suggestions to filter the field out in the AWS CW subscription, or to remove or rename it in Logstash. I am surprised this would be needed at all, though: the field is not present in the source, and I would not expect the ES add-on to include it in a minimal implementation with no mapping.
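For reference, the Logstash-side suggestions mentioned above usually look something like the sketch below, using the standard mutate filter. It assumes the event reaches Logstash with a top-level _id field, as the error log suggests, and "forwarder_id" is a hypothetical field name chosen only for illustration. Dropping the field also throws away the ID the forwarder generated.

```
filter {
  # Option 1: drop the reserved field before it reaches the elasticsearch output.
  mutate {
    remove_field => ["_id"]
  }

  # Option 2 (use instead of option 1): keep the value under a
  # non-reserved name. "forwarder_id" is a hypothetical name.
  # mutate {
  #   rename => { "_id" => "forwarder_id" }
  # }
}
```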

Any thoughts appreciated, and thank you all for reading!

@jsoriano any thoughts?

@stabbotco1

Can you share your logstash configuration please?

I do not use the ESF, but after looking into it, the _id field is indeed created by the forwarder, as this GitHub issue makes clear.

From what I understand, it is used to avoid duplicate documents in Elasticsearch.

There is no documentation on what the Logstash pipeline for the ESF should look like, but I think it should work if you add the following line to the elasticsearch output in your Logstash configuration:

document_id => "%{_id}"

This tells the elasticsearch output to send the value of the _id field as the document ID in the index request, and it should avoid the mapping error you are getting.
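A slightly fuller sketch of the document_id approach follows, under the assumption that Elasticsearch might still reject the event if _id also stays in the document body. A common Logstash pattern for that case is to move the value into @metadata, which the elasticsearch output does not include in the indexed document, and reference it from document_id. The host below is a placeholder, not a value from this thread.

```
filter {
  # Move the forwarder-generated ID out of the document body.
  # Fields under [@metadata] are not sent as part of the document.
  mutate {
    rename => { "_id" => "[@metadata][_id]" }
  }
}

output {
  elasticsearch {
    hosts       => ["https://your-es-host:9200"]  # placeholder
    # Use the forwarder's ID as the Elasticsearch document ID, so an event
    # that is delivered twice overwrites the same document instead of
    # creating a duplicate.
    document_id => "%{[@metadata][_id]}"
  }
}
```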


Thank you! All the information was spot on and very helpful!
