Using Transform for document counts when documents are updated

We have an index whose documents are updated over time. We are looking for a way to continuously (every few minutes) provide a count of documents grouped by some condition.

Example:
2024/08/15 00:00:00, order1, new
2024/08/15 01:00:00, order2, payed
2024/08/15 00:00:00, order3, payed
One hour later, the index may look like:
2024/08/15 01:15:00, order1, payed
2024/08/15 01:00:00, order2, payed
2024/08/15 01:20:00, order3, shipped
2024/08/15 01:30:00, order4, payed

We are thinking of implementing this with an Elasticsearch transform. However, in the above scenario, the count of "new" orders may remain at one even though no "new" orders are left. We are also considering alternatives, such as adding a timestamp that indicates when a transform-generated document was last updated, but we are not sure whether that is possible.

The questions are:
Is the above understanding correct: the transform will not delete {status : "new",count : "1"} from the transform dest index in the above scenario?
If this is true, could you suggest a workaround, such as adding a field that indicates when the transform result was generated/updated?

Thank you in advance

This question may be similar to "Transform behavior with deleted documents".

Is the above understanding correct: the transform will not delete {status : "new",count : "1"} from the transform dest index in the above scenario?

This is correct.

such as adding a field that indicates when the transform result was generated/updated

This other answer may help - Creating an ingest pipeline for transforms.

  • create an ingest pipeline that adds a timestamp field
  • use the Put Transform API to set the pipeline in the dest.pipeline field

Any document added or updated in the destination index will get a new timestamp.

It would look something like:

PUT _ingest/pipeline/pipeline_add_ingest_timestamp
{ 
  "description": "Adds event.ingested field which represents time of ingestion.",
  "processors": [
    {
      "set": {
        "field": "event.ingested",
        "value": "{{_ingest.timestamp}}"
      }
    }
  ]
}

PUT _transform/new_orders_count
{
  ...
  "dest": {
    "index": "dest_index_name",
    "pipeline": "pipeline_add_ingest_timestamp"
  },
  ...
}

Thank you for the reply. We have set up the transform and the ingest pipeline, and we do see an ingest timestamp added to the docs.

We actually expected the ingest timestamp to tell us which transformed records are current (as opposed to most recently updated), so that a reader could filter out stale items based on it and get the docs expected in the original post. However, it seems that only the most recently updated docs get a new ingest timestamp.

So it seems the transform does not rewrite records whose transform result is unchanged, and their ingest timestamp is therefore not updated.

Example:
2024/08/15 00:00:00, order1, new
2024/08/15 00:50:00, order2, payed
One hour later, the index may look like:
2024/08/15 00:00:00, order1, new
2024/08/15 01:15:00, order2, closed

Before:

{status : "new", count : "1", ingest_time: "1:00:00"}
{status : "payed", count : "1",ingest_time: "1:00:00"}

After (actual):

{status : "new", count : "1", ingest_time: "1:00:00"}  -- 2:00:00 expected
{status : "closed", count : "1",ingest_time: "2:00:00"}
{status : "payed", count : "1", ingest_time: "1:00:00"}
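
The behavior in this example can be reproduced with a toy model. This is only a sketch under an assumption that matches what we observed: at each checkpoint, only the buckets that the changed source docs currently fall into are recomputed and rewritten. It is not Elasticsearch's actual implementation, and `run_checkpoint`, `source` and `dest` are hypothetical names made up for illustration.

```python
def run_checkpoint(source, dest, changed_ids, now):
    """Toy model: recompute only buckets that changed docs currently belong to."""
    # Buckets are identified from the *current* status of each changed doc,
    # so the bucket a doc has just left is never revisited.
    touched = {source[doc_id]["status"] for doc_id in changed_ids}
    for status in touched:
        count = sum(1 for doc in source.values() if doc["status"] == status)
        # Writing a dest doc is what would trigger the ingest pipeline.
        dest[status] = {"count": count, "ingest_time": now}
    return dest


# First checkpoint: both orders are new to the transform.
source = {"order1": {"status": "new"}, "order2": {"status": "payed"}}
dest = {}
run_checkpoint(source, dest, {"order1", "order2"}, "1:00:00")

# One hour later only order2 has changed (payed -> closed).
source["order2"]["status"] = "closed"
run_checkpoint(source, dest, {"order2"}, "2:00:00")

for status in sorted(dest):
    print(status, dest[status])
```

In this model the "new" and "payed" docs keep their 1:00:00 timestamp, and "payed" also keeps a stale count of 1, so the ingest timestamp alone cannot tell a reader which rows are still valid.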

If this is correct, I think we will have to fall back to a periodic Elasticsearch query to get the statistics.
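
A minimal sketch of such a periodic query, assuming the source index is called `orders`, the status is stored in a keyword field named `status`, and the cluster is reachable at `http://localhost:9200` without authentication (all three are assumptions, not taken from our real setup). It runs a terms aggregation against the source index, so the counts always reflect the current documents.

```python
import json
import urllib.request


def build_count_query(field="status"):
    """Search body for a terms aggregation; size 0 skips returning hits."""
    return {"size": 0, "aggs": {"by_status": {"terms": {"field": field}}}}


def parse_counts(response):
    """Turn the aggregation buckets of a search response into {status: count}."""
    buckets = response["aggregations"]["by_status"]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


def fetch_counts(base_url="http://localhost:9200", index="orders"):
    """POST the query to the _search endpoint (URL and index are assumptions)."""
    request = urllib.request.Request(
        f"{base_url}/{index}/_search",
        data=json.dumps(build_count_query()).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as resp:
        return parse_counts(json.load(resp))


# Hand-written response fragment in the shape of a terms aggregation result,
# used here only to show the parsing step without a running cluster.
sample_response = {
    "aggregations": {
        "by_status": {
            "buckets": [
                {"key": "closed", "doc_count": 1},
                {"key": "new", "doc_count": 1},
            ]
        }
    }
}
print(parse_counts(sample_response))
```

Calling `fetch_counts()` from a scheduler every few minutes would give the statistics. Note that a terms aggregation returns only the top 10 buckets by default, so the aggregation's `size` would need to be raised if there are more distinct statuses.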

After checking several other potential solutions, I am now trying to export the aggregation data to some other data store.

The Logstash elasticsearch input plugin seems to be an alternative. Sadly, the plugin version we are using does not support it.

Watcher seems to be another alternative. Below is a thread about it