"latest" continuous transform destination index does not have all unique key docs consistently/continually

I have the following latest continuous transform running.
The source index has 10 unique key values. The destination index, at various times, will have between 1 and 10 docs in it when I view (and repeatedly refresh) the docs in Kibana's Discover view.
I want to be able to query the destination index and reliably get one result per unique key, 10 in total.
I have tried adding and removing the max search size setting to the transform, but to no avail.
I have the transform re-running/updating the destination at a fast frequency, and during busy times the source index will have new docs at each checkpoint.
Presumably, while the destination index docs are being updated, they are unavailable to query for some brief period of time. Is this right?
Is there any way to reliably always be able to query 10 docs in the destination index?

Many thanks in advance!

{
  "id": "latest_trades",
  "version": "7.15.2",
  "create_time": 1661353468907,
  "source": {
    "index": [
      "trades*"
    ],
    "query": {
      "match_all": {}
    }
  },
  "dest": {
    "index": "latest_trade"
  },
  "frequency": "5s",
  "sync": {
    "time": {
      "field": "ingest_timestamp",
      "delay": "1s"
    }
  },
  "latest": {
    "unique_key": [
      "instrument.keyword"
    ],
    "sort": "exchange_timestamp"
  },
  "settings": {}
}

here are transform stats:

{
  "count" : 1,
  "transforms" : [
    {
      "id" : "latest_trades",
      "state" : "started",
      "node" : {
        "id" : "FPldpXAJT6-2NPam1fZgTw",
        "name" : "df-elastic-es-data-nodes-2-2",
        "ephemeral_id" : "Ri-GYog2Q5OqhDa8orFsJQ",
        "transport_address" : "[blah]",
        "attributes" : { }
      },
      "stats" : {
        "pages_processed" : 188046,
        "documents_processed" : 105269435,
        "documents_indexed" : 680127,
        "documents_deleted" : 0,
        "trigger_count" : 356856,
        "index_time_in_ms" : 877475,
        "index_total" : 94023,
        "index_failures" : 0,
        "search_time_in_ms" : 992489,
        "search_total" : 188046,
        "search_failures" : 0,
        "processing_time_in_ms" : 31,
        "processing_total" : 188046,
        "delete_time_in_ms" : 0,
        "exponential_avg_checkpoint_duration_ms" : 80.4437196366004,
        "exponential_avg_documents_indexed" : 9.794908937594673,
        "exponential_avg_documents_processed" : 154.311742512512
      },
      "checkpointing" : {
        "last" : {
          "checkpoint" : 94023,
          "timestamp_millis" : 1663163734428,
          "time_upper_bound_millis" : 1663163733428
        },
        "operations_behind" : 296,
        "changes_last_detected_at" : 1663163734421,
        "last_search_time" : 1663163734421
      }
    }
  ]
}

Hi,

some questions about this use case:

There should always be 10 results. It might be that these results aren't recent if the transform isn't able to update them quickly. If you don't get 10 results, I rather suspect that you are running into a search failure. Did you create the destination index yourself, or was it created by the transform? Does it have more than 1 shard?

I suggest running the query via the REST API and checking whether you are getting search errors.
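A minimal sketch of what such a check could look like, assuming the JSON body of a search response has already been parsed into a Python dict. The helper names `summarize_search` and `is_complete` and both sample responses are hypothetical; `timed_out`, `_shards.failed`, `_shards.failures` and `hits.total.value` are the fields of the search response that reveal a partial result.

```python
def summarize_search(response: dict) -> dict:
    """Pull out the response fields that reveal a partial or failed search."""
    shards = response.get("_shards", {})
    hits = response.get("hits", {})
    total = hits.get("total", {})
    # In 7.x, hits.total is an object such as {"value": 10, "relation": "eq"}.
    total_value = total.get("value") if isinstance(total, dict) else total
    return {
        "timed_out": response.get("timed_out", False),
        "shards_total": shards.get("total", 0),
        "shards_failed": shards.get("failed", 0),
        "failures": shards.get("failures", []),
        "total_hits": total_value,
        "returned_hits": len(hits.get("hits", [])),
    }


def is_complete(response: dict, expected_docs: int) -> bool:
    """True if no shard failed, nothing timed out, and all expected docs matched."""
    s = summarize_search(response)
    return (
        not s["timed_out"]
        and s["shards_failed"] == 0
        and s["total_hits"] == expected_docs
    )


# Hypothetical responses, for illustration only.
sample_ok = {
    "timed_out": False,
    "_shards": {"total": 1, "successful": 1, "skipped": 0, "failed": 0},
    "hits": {
        "total": {"value": 10, "relation": "eq"},
        "hits": [{"_id": str(i)} for i in range(10)],
    },
}
sample_partial = {
    "timed_out": False,
    "_shards": {
        "total": 2, "successful": 1, "skipped": 0, "failed": 1,
        "failures": [{"shard": 1, "reason": {"type": "hypothetical_failure"}}],
    },
    "hits": {
        "total": {"value": 4, "relation": "eq"},
        "hits": [{"_id": str(i)} for i in range(4)],
    },
}

print(is_complete(sample_ok, 10))
print(is_complete(sample_partial, 10))
print(summarize_search(sample_partial)["failures"])
```

A response with a non-zero `_shards.failed` can still come back with HTTP 200, which is why it is worth looking at the body rather than only at the status code.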

Another possible reason you don't see all docs in Discover is the time picker: it internally runs a range query, so a doc won't be returned if its timestamp is outside of the range. Can you expand the time range and see if that helps? As the destination index has only 10 docs, you actually don't need a range query.
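To illustrate the effect, here is a small simulation of a time range filter, not the actual query Discover sends. It assumes the index pattern's time field is `exchange_timestamp`, which may not match your setup; the instrument names, the timestamps and the helper `visible_in_window` are made up.

```python
from datetime import datetime, timedelta, timezone


def visible_in_window(docs, field, now, window):
    """Return the docs whose `field` lies in [now - window, now], like a range filter."""
    start = now - window
    return [d for d in docs if start <= d[field] <= now]


# Ten hypothetical destination docs, one per instrument. Each instrument
# last traded 3 minutes earlier than the previous one.
now = datetime(2022, 9, 14, 14, 0, 0, tzinfo=timezone.utc)
docs = [
    {"instrument": f"INST{i}", "exchange_timestamp": now - timedelta(minutes=3 * i)}
    for i in range(10)
]

last_15_minutes = visible_in_window(docs, "exchange_timestamp", now, timedelta(minutes=15))
last_hour = visible_in_window(docs, "exchange_timestamp", now, timedelta(hours=1))

print(len(last_15_minutes))  # instruments that traded within the last 15 minutes
print(len(last_hour))        # all instruments fall inside the wider window
```

All ten docs exist in both cases. The narrow window simply hides the instruments that have not traded recently, and the number of visible docs changes as the window slides forward with each refresh.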

Because you use ingest_timestamp for sync but exchange_timestamp for latest, I wonder if you have some out of order problem in your data.
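One way to check for this is to walk the source docs in ingest order and flag any doc whose exchange_timestamp is older than one already seen for the same instrument. This is a rough sketch over docs already loaded into memory: the helper `find_out_of_order`, the flattened field `instrument` and the sample values are hypothetical, and the timestamps are plain integers for brevity.

```python
def find_out_of_order(docs):
    """Return docs that arrived after a newer trade for the same instrument."""
    newest_seen = {}  # instrument -> highest exchange_timestamp seen so far
    late = []
    for doc in sorted(docs, key=lambda d: d["ingest_timestamp"]):
        key = doc["instrument"]
        if key in newest_seen and doc["exchange_timestamp"] < newest_seen[key]:
            late.append(doc)
        else:
            newest_seen[key] = doc["exchange_timestamp"]
    return late


# Hypothetical source docs, for illustration only.
docs = [
    {"instrument": "AAA", "ingest_timestamp": 1, "exchange_timestamp": 100},
    {"instrument": "AAA", "ingest_timestamp": 2, "exchange_timestamp": 90},
    {"instrument": "BBB", "ingest_timestamp": 3, "exchange_timestamp": 95},
    {"instrument": "AAA", "ingest_timestamp": 4, "exchange_timestamp": 110},
]

late = find_out_of_order(docs)
print(late)
```

If this returns anything for real data, the two timestamps disagree about ordering for at least some instruments, which would be worth ruling out.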

The max search size setting (max_page_search_size) controls the number of search results the transform gets per call. I don't understand the motivation to tweak it for this use case.

You should always get results. There is no unavailability, but you might see an old version of a document if the latest changes aren't visible to search yet. The transform refreshes the destination index after every checkpoint, so you should at least see the latest document as of the last finished checkpoint.

^We definitely do not see all 10 so I'll try your suggestions above to try to resolve.

It is actually an old index (created by an older transform that used scripted metrics prior to the "latest" transform functionality being released by ELK).

Thanks @Hendrik_Muhs !

I have been repeatedly running the query using POST /_transform/_preview and results are always returned without errors.

Sorry, I should have been more precise. With "query via the REST API" I meant search on the destination index:

GET latest_trade/_search

This should always return 10 results.

That makes more sense!

Unfortunately, I didn't get a chance to run that on the original transform and destination index.

This morning I deleted the transform and destination index, then recreated the transform and created the index and its mapping manually with PUT index and PUT index/_mapping (dynamic mapping didn't work for me this time for some fields).

The other thing to note is that I had a pipeline on the transform destination index, although I wouldn't expect this to have caused any issue.
I have eliminated that pipeline. It did two things. First, it set a new field based on an existing field's value; I now use an alias of the existing field instead. Second, it set an ingest timestamp when transformed docs were indexed into the destination; I now simply use the ingest_timestamp value that already exists in the source doc, so I am no longer overwriting it with an updated value.

Running the following is working reliably now. Happy days! I will continue to run and test it through the day, as this issue appeared to occur more often later in the afternoon, when our ingest rate increases.

GET latest_trade/_search

Thanks @Hendrik_Muhs I'll post here if I have any further issues.
