'Latest' Transform job not refreshing all docs in the destination index

ykara84 · May 26, 2022, 2:12pm

Hi, I'm new to Transforms.

I have a PowerShell script that gathers VMware virtual machine capacity metrics (CPU, memory and such) from vCenter every hour and stores them in an index: virtualisation-vm-yyyy-MM

I've created a simple continuous 'latest' Transform job that sends the latest records from the index pattern virtualisation-vm-* to a destination index called virtualisation-latest-vm-vsphere. I thought it was working as expected, but after a few days I noticed that some VMs in the destination index have not updated since the Transform job was created, even though new data for those VMs has been added to the original index every hour since then.

I've recreated the Transform job several times with different settings (frequency, delay, etc.) but it made no difference. How do I debug this? There are no errors or warnings in the Messages section of the Transform.

Again, this does not affect all docs. The majority update themselves, but about 10% don't, and it doesn't always seem to be the same ones that fail to update when I recreate the Transform job.

Below are my settings:

(3 node cluster)

{
  "id": "virtualisation-latest-vm-vsphere",
  "version": "7.16.2",
  "create_time": 1653387161554,
  "source": {
    "index": [
      "virtualisation-vm-*"
    ],
    "query": {
      "match_all": {}
    }
  },
  "dest": {
    "index": "virtualisation-latest-vm-vsphere"
  },
  "frequency": "60m",
  "sync": {
    "time": {
      "field": "@timestamp",
      "delay": "60s"
    }
  },
  "latest": {
    "unique_key": [
      "vm.keyword"
    ],
    "sort": "@timestamp"
  },
  "description": "Only the latest information from the virtualisation capacity data.",
  "settings": {
    "max_page_search_size": 500
  },
  "retention_policy": {
    "time": {
      "field": "@timestamp",
      "max_age": "30d"
    }
  }
}

sophie_chang · May 26, 2022, 3:11pm

From the initial information, the most likely explanation is a divergence between @timestamp and the time of ingest (though I'm not sure how long a sync.time.delay you experimented with).

As a best practice, the sync time field should be the time of ingest. This is the best way for a transform to identify what has changed since the last time it checked. The field can be set using an ingest processor, something along the lines of:

PUT _ingest/pipeline/set_ingest_time
{
  "description": "Adds ingest timestamps",
  "processors": [
    {
      "set": {
        "field": "_source.@timestamp_ingest",
        "value": "{{_ingest.timestamp}}"
      }
    }
  ]
}
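
To see what the pipeline does before using it, it can be run against a sample document with the simulate API. This is only a sketch: the vm name and timestamp in the sample document are made up for illustration.

```console
# Run the set_ingest_time pipeline against one sample document without indexing it.
# The "vm" value and "@timestamp" below are hypothetical test data.
POST _ingest/pipeline/set_ingest_time/_simulate
{
  "docs": [
    {
      "_source": {
        "vm": "example-vm-01",
        "@timestamp": "2022-05-26T13:00:00Z"
      }
    }
  ]
}
```

The response should show the document with an extra @timestamp_ingest field holding the time the pipeline ran.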

The best way to see whether @timestamp and ingest time are diverging is to plot document counts over time for both fields.
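
As a rough sketch of that comparison outside of a Kibana visualization, the two counts can be fetched with one search. It assumes the pipeline above is already in use, so that recent documents have an @timestamp_ingest field mapped as a date; the 7-day window and 1-hour buckets are arbitrary choices.

```console
# Hourly document counts by event time and by ingest time, side by side.
# Assumes @timestamp_ingest exists and is mapped as a date.
GET virtualisation-vm-*/_search
{
  "size": 0,
  "query": {
    "range": { "@timestamp": { "gte": "now-7d" } }
  },
  "aggs": {
    "by_event_time": {
      "date_histogram": { "field": "@timestamp", "fixed_interval": "1h" }
    },
    "by_ingest_time": {
      "date_histogram": { "field": "@timestamp_ingest", "fixed_interval": "1h" }
    }
  }
}
```

If the buckets for the same hour differ noticeably between the two aggregations, documents are arriving later than their @timestamp suggests, and a 60s delay is probably too short.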

There may also be errors in the Elasticsearch logs relating to writes to the virtualisation-latest-vm-vsphere index.
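
Besides the logs, the transform stats API is a quick place to look. A minimal example using the transform id from the config above:

```console
# Show checkpoint progress and failure counters for the transform.
GET _transform/virtualisation-latest-vm-vsphere/_stats
```

In the response, index_failures and search_failures under stats should normally be 0, and the checkpointing section shows when the last checkpoint completed and the upper time bound it covered.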

Hope this helps
Sophie

ykara84 · May 27, 2022, 9:02am

Thanks Sophie

Could you give me more info on how to set the ingest timestamp for my index pattern, or point me to a help/tutorial page? I've never used pipeline processors before.

ykara84 · May 27, 2022, 9:50am

I've managed to create the pipeline using Sophie's example, but I can't see anywhere to tell the pipeline to apply the ingest timestamp field only to my original index pattern "virtualisation-vm-*".

How does it know which indices to apply the pipeline to? Does it just apply to all indices?

sophie_chang · May 30, 2022, 8:57am

There are a few examples here which explain how to use a pipeline when indexing documents into the virtualisation-vm-* indices, which are the transform source: Ingest pipelines | Elasticsearch Guide [8.2] | Elastic
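
A pipeline is not applied to any index on its own. It only runs when an indexing request names it, or when an index has it set as its default pipeline. A sketch of both options, using the set_ingest_time pipeline from earlier in this thread; the index name and document values in the second request are made up for illustration.

```console
# Option 1: make set_ingest_time the default pipeline on the existing monthly indices.
PUT virtualisation-vm-*/_settings
{
  "index.default_pipeline": "set_ingest_time"
}

# Option 2: name the pipeline on an individual indexing request.
# The index name and field values here are hypothetical.
POST virtualisation-vm-2022-05/_doc?pipeline=set_ingest_time
{
  "vm": "example-vm-01",
  "@timestamp": "2022-05-30T09:00:00Z"
}
```

Option 1 only covers indices that already exist. For future monthly indices the same index.default_pipeline setting would need to go into whatever index template creates them. The pipeline query parameter in option 2 is also accepted by the bulk API, if that is what the script uses.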

ykara84 · May 30, 2022, 9:38am

Thank you Sophie. I've further experimented with the sync delay value and it seems to be working as expected now
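
For anyone finding this later: the delay can be changed on the existing transform with the update API rather than by recreating it. This is a sketch only. The "10m" value is purely illustrative and is not the value that was settled on in this thread.

```console
# Raise sync.time.delay on the existing transform.
# "10m" is an illustrative value; pick one longer than the gap between
# a document's @timestamp and the time it actually becomes searchable.
POST _transform/virtualisation-latest-vm-vsphere/_update
{
  "sync": {
    "time": {
      "field": "@timestamp",
      "delay": "10m"
    }
  }
}
```

As far as I know, the new delay takes effect from the transform's next checkpoint.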

system · June 27, 2022, 9:38am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
