Elastic Agent: override existing documents in an index/data stream

Hi all,

We have a use case where we use Elastic Agent to ingest data and inject our own document _id via an ingest pipeline, and everything works fine at first glance.

The thing is that when new data arrives, even though the same id is extracted from the message, the document is not overwritten. Nothing happens to the existing document even if new values arrive for the same id.

Here are examples of our ingest pipelines.
Data stream:

[
  {
    "json": {
      "field": "message",
      "add_to_root": true,
      "if": "ctx?.message != null"
    }
  },
  {
    "set": {
      "field": "_id",
      "value": "{{id}}"
    }
  }
]

Index:

[
  {
    "set": {
      "field": "_index",
      "value": "custom-index"
    }
  },
  {
    "json": {
      "field": "message",
      "add_to_root": true,
      "if": "ctx?.message != null"
    }
  },
  {
    "set": {
      "field": "_id",
      "value": "{{id}}"
    }
  }
]

Can anyone explain why the new data is not overriding the existing documents? I was thinking this should be the default behaviour.
The version we are running is 8.14.3.

Hello Mihovil,

This is by design. Excerpt from the docs:

A data stream lets you store append-only time series data across multiple indices while giving you a single named resource for requests. Data streams are well-suited for logs, events, metrics, and other continuously generated data.

Data streams only support the create ingest operation (see op_type):

If the request targets a data stream, an op_type of create is required. See Add documents to a data stream.
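
In practice this means a second write with the same _id is rejected rather than applied. A minimal sketch (the data stream and _id names are placeholders):

# first write succeeds
PUT my-data-stream/_create/my-id-1
{
  "@timestamp": "2024-08-01T12:00:00Z",
  "message": "first version"
}

# second write with the same _id is rejected with 409 version_conflict_engine_exception
PUT my-data-stream/_create/my-id-1
{
  "@timestamp": "2024-08-01T12:05:00Z",
  "message": "updated version"
}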

If you really need to update your documents, a normal index with a rollover alias may be the best way. If you still want to use the Elastic Agent, you can use an ingest pipeline to change the target index:

"set": {
        "description": "Index document to 'failed-<index>'",
        "field": "_index",
        "value": "my-custom-index"
      }
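
For the rollover part, a minimal sketch (index and alias names are placeholders) is to create the first index with a write alias and point the pipeline's _index value at that alias:

# first backing index, with the alias the pipeline writes to
PUT my-custom-index-000001
{
  "aliases": {
    "my-custom-index": {
      "is_write_index": true
    }
  }
}

Writes and updates by _id then go through the alias, and you can roll over manually with POST my-custom-index/_rollover (or let ILM handle it).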

Best regards
Wolfram

If you need to update your documents, you need to use normal indices, not data streams.

Using a custom _id with the Elastic Agent helps to solve data duplication issues, but it will not update documents this way; if the _id already exists, the new document with the same _id will be rejected.

To update a document in a data stream you need to make a request directly to the backing index, but the Elastic Agent cannot do that.
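
For example, assuming a hypothetical backing index name (you can find the real one with GET _data_stream/my-data-stream):

# update a document in the backing index directly
POST .ds-my-data-stream-2024.08.01-000001/_update/my-id-1
{
  "doc": {
    "message": "updated version"
  }
}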

Hi, thank you for your response. But as you can see in the question (the second pipeline example), I am already trying exactly that, and the result is the same.

Hi, thank you for the explanation. Since adding a processor in the pipeline to change the target index produces the same result, we will have to look for other solutions.

Hi,

would it be possible to "override" the default behaviour by having Logstash listen to the Elastic Agent's incoming data and perform the upsert into Elasticsearch?

No, Logstash will also not update the backing indices of a data stream.

If you have the requirement to update documents automatically you need to use normal indices, not data streams.
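
With a normal index, an upsert by _id does what you are after. A minimal sketch (index, _id, and field names are placeholders):

# creates the document if missing, merges the changes if it exists
POST my-custom-index/_update/my-id-1
{
  "doc": {
    "message": "updated version"
  },
  "doc_as_upsert": true
}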

Elastic Agent uses data streams, so you would need to run it in standalone mode as just a log collector, without parsing the data, or use other tools to ingest your data.