We have what I believe to be a straightforward transform that picks out unique documents, keeping the latest one per key as ordered by an ingest pipeline timestamp. We apply it to the index data of several programs (customers), but with larger data sets it is missing documents. With 5,000 or fewer documents in an index it seems to work fine, but with 23,000 documents the destination index is missing roughly 500 to 1,000 documents.
Furthermore, I have noticed that when the timestamps of documents sharing the same key are close together, the transform does not pick the latest one. It picks the earliest one.
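To make the expected behavior concrete, here is a minimal Python sketch of what I understand a "latest" transform should do: for each unique key, keep only the document with the greatest sort value. The function name latest_per_key and the sample documents are made up for illustration, and the flat "event.ingested" dictionary key is a simplification of the nested field. This is not how Elasticsearch implements it, only the result I expect to see.

```python
def latest_per_key(docs, key_fields, sort_field):
    """For each unique key, keep the document with the greatest sort_field value."""
    latest = {}
    for doc in docs:
        key = tuple(doc[f] for f in key_fields)
        current = latest.get(key)
        # Timestamps here are fixed-width ISO-8601 UTC strings,
        # so plain string comparison orders them chronologically.
        if current is None or doc[sort_field] > current[sort_field]:
            latest[key] = doc
    return list(latest.values())


# Hypothetical sample data: two documents share a key and have close timestamps.
docs = [
    {"product_pk": 1, "catalog_type": "A", "event.ingested": "2024-10-10T04:27:33.100Z"},
    {"product_pk": 1, "catalog_type": "A", "event.ingested": "2024-10-10T04:27:33.900Z"},
    {"product_pk": 2, "catalog_type": "A", "event.ingested": "2024-10-10T04:27:34.000Z"},
]

result = latest_per_key(docs, ["product_pk", "catalog_type"], "event.ingested")
for doc in result:
    print(doc)
```

With this sample I would expect two documents in the output, and for product_pk 1 the one ingested at 04:27:33.900Z, not the earlier one.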
Here is the transform:
PUT _transform/phils_test_unique
{
  "source": {
    "index": "axp_marketplace_search_catalog_1_1728534453"
  },
  "dest": {
    "index": "phils_test_unique",
    "pipeline": "axp_marketplace_event_ingested_ingest_pipeline"
  },
  "latest": {
    "unique_key": ["product_pk", "catalog_type"],
    "sort": "event.ingested"
  },
  "description": "",
  "frequency": "5m",
  "sync": {
    "time": {
      "field": "event.ingested",
      "delay": "60s"
    }
  }
}
Can anyone tell me why the resulting index is missing documents? It seems to be missing the same number of documents each time the transform is run.