Elastic Agent: override existing documents in an index/data stream

Hi all,

We have a use case where we use Elastic Agent to ingest data and inject our own document _id via an ingest pipeline, and everything works fine at first glance.

The thing is that when new data arrives, even though the same id is extracted from the message, the document is not overwritten. Nothing happens to the existing document even if new values arrive for the same id.

Here are examples of our ingest pipelines.
Data stream:

[
  {
    "json": {
      "field": "message",
      "add_to_root": true,
      "if": "ctx?.message != null"
    }
  },
  {
    "set": {
      "field": "_id",
      "value": "{{id}}"
    }
  }
]

Index:

[
  {
    "set": {
      "field": "_index",
      "value": "custom-index"
    }
  },
  {
    "json": {
      "field": "message",
      "add_to_root": true,
      "if": "ctx?.message != null"
    }
  },
  {
    "set": {
      "field": "_id",
      "value": "{{id}}"
    }
  }
]

Can anyone explain why the new data is not overriding the existing documents? I was thinking this should be the default behaviour.
The version we are running is 8.14.3.

Hello Mihovil,

This is by design. Excerpt from the docs:

A data stream lets you store append-only time series data across multiple indices while giving you a single named resource for requests. Data streams are well-suited for logs, events, metrics, and other continuously generated data.

Data streams only support the create ingest operation (see op_type):

If the request targets a data stream, an op_type of create is required. See Add documents to a data stream.
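
In practice this means a second write with the same _id is rejected rather than applied. A minimal sketch (the data stream and _id names are placeholders):

# first write succeeds
PUT my-data-stream/_create/my-id-1
{
  "@timestamp": "2024-08-01T12:00:00Z",
  "message": "first version"
}

# second write with the same _id is rejected with 409 version_conflict_engine_exception
PUT my-data-stream/_create/my-id-1
{
  "@timestamp": "2024-08-01T12:05:00Z",
  "message": "updated version"
}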

If you really need to update your documents, a normal index with a rollover alias may be the best way. If you still want to use the Elastic Agent, you can use an ingest pipeline to change the target index:

"set": {
        "description": "Index document to 'failed-<index>'",
        "field": "_index",
        "value": "my-custom-index"
      }
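
For the rollover part, a minimal sketch (index and alias names are placeholders) is to create the first index with a write alias and point the pipeline's _index value at that alias:

# first backing index, with the alias the pipeline writes to
PUT my-custom-index-000001
{
  "aliases": {
    "my-custom-index": {
      "is_write_index": true
    }
  }
}

Writes and updates by _id then go through the alias, and you can roll over manually with POST my-custom-index/_rollover (or let ILM handle it).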

Best regards
Wolfram

If you need to update your documents, you need to use normal indices, not data streams.

Using a custom _id with the Elastic Agent helps to solve data duplication issues, but it will not update documents this way; if the _id already exists, the new document with the same _id will be rejected.

To update a document in a data stream you need to make a request directly to the backing index, but the Elastic Agent cannot do that.
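
For example, assuming a hypothetical backing index name (you can find the real one with GET _data_stream/my-data-stream):

# update a document in the backing index directly
POST .ds-my-data-stream-2024.08.01-000001/_update/my-id-1
{
  "doc": {
    "message": "updated version"
  }
}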

Hi, thank you for your response. But as you can see in the question (the second pipeline example), I am already trying exactly that, and the result is the same.

Hi, thank you for the explanation. Since adding a processor in the pipeline to change the target index produces the same result, we will have to look for other solutions.

Hi,

would it be possible to "override" the default behaviour by having Logstash listen to the Elastic Agent's incoming data and perform the upsert into Elasticsearch?

No, Logstash will also not update the backing indices of a data stream.

If you have the requirement to update documents automatically you need to use normal indices, not data streams.
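
With a normal index, an upsert by _id does what you are after. A minimal sketch (index, _id, and field names are placeholders):

# creates the document if missing, merges the changes if it exists
POST my-custom-index/_update/my-id-1
{
  "doc": {
    "message": "updated version"
  },
  "doc_as_upsert": true
}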

Elastic Agent uses data streams, so you would need to run it in standalone mode as just a log collector, without parsing the data, or use other tools to ingest your data.