Latest unique transform missing documents

We have what I believe to be a straightforward transform that picks out unique documents, ordered by an ingest pipeline timestamp. We apply it to the index data of several programs (customers), but on larger indices it misses documents. With 5,000 or fewer documents in an index it seems to work fine, but with 23,000 documents it misses roughly 500 to 1,000 of them.

Furthermore, I have noticed that when the timestamps are close together, it does not pick the latest document. It picks the earliest one.

Here is the transform:

PUT _transform/phils_test_unique
{
  "source": {
    "index": "axp_marketplace_search_catalog_1_1728534453"
  },
  "dest": {
    "index": "phils_test_unique",
    "pipeline": "axp_marketplace_event_ingested_ingest_pipeline"
  },
  "latest": {
    "unique_key": ["product_pk", "catalog_type"],
    "sort": "event.ingested"
  },
  "description": "",
  "frequency": "5m",
  "sync": {
    "time": {
      "field": "event.ingested",
      "delay": "60s"
    }
  }
}

Can anyone tell me why the resulting index is missing documents? It seems to be missing the same number each time the transform is run.

OK, I figured it out. I had to map catalog_type as type "keyword" in the source index, and then remove the ".keyword" suffix from the unique_key in the transform. Now all the documents are in the output index.
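For anyone hitting the same thing, here is a minimal sketch of that mapping change. It rests on a few assumptions: the type of an existing field cannot be changed in place, so the sketch creates a new source index and reindexes into it, and the index name my_source_index_v2 is hypothetical. Only catalog_type is shown, since that is the field the fix above concerns; other fields are left to dynamic mapping.

```
# Create a new source index with catalog_type mapped as keyword.
# "my_source_index_v2" is a hypothetical name used for illustration.
PUT my_source_index_v2
{
  "mappings": {
    "properties": {
      "catalog_type": { "type": "keyword" }
    }
  }
}

# Copy the existing documents into the new index.
POST _reindex
{
  "source": { "index": "axp_marketplace_search_catalog_1_1728534453" },
  "dest": { "index": "my_source_index_v2" }
}
```

The transform's source index would then point at the new index, with unique_key listing the field as plain catalog_type.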