I have a problem with my transform. I have already tried many different versions, but nothing helped.
The documents in my source index look like this:
{
  "content" : {
    "creationTime" : "2022-07-25 16:00:49 +02:00",
    "orderId" : "552313",
    "state" : "SUCCESS",
    "category" : "catA"
  },
  "latestUpdate" : "2022-11-29 18:42:08 +01:00"
}
I want to use a transform to put the orders into time buckets of 1 minute and to filter out orders with the state ERROR. (I originally wanted 30-minute buckets, but as far as I can tell only 1 minute or 1 hour is supported and nothing in between, right?)
Every time the data gets updated, the timestamp "latestUpdate" also changes to the current time.
My source index is updated every 10 minutes, and each update overwrites some documents from the last week because their content.state field changes.
My transform request looks like this:
PUT _transform/transform_my_index001
{
  "source" : {
    "index" : "my_index",
    "query" : {
      "bool" : {
        "must_not" : [
          { "term" : { "content.state" : "ERROR" } }
        ]
      }
    }
  },
  "pivot" : {
    "group_by" : {
      "creation_time" : {
        "date_histogram" : {
          "field" : "content.creationTime",
          "calendar_interval" : "1m",
          "time_zone" : "Europe/Berlin"
        }
      },
      "portal" : {
        "terms" : {
          "field" : "content.portal"
        }
      },
      "category" : {
        "terms" : {
          "field" : "content.category"
        }
      }
    },
    "aggregations" : {
      "order_count" : {
        "value_count" : {
          "field" : "content.state"
        }
      }
    }
  },
  "description" : "Transform pipeline to put orders into a new index.",
  "dest" : {
    "index" : "aggregated_orders001"
  },
  "settings" : {
    "align_checkpoints" : false
  },
  "frequency" : "5m",
  "sync" : {
    "time" : {
      "field" : "latestUpdate",
      "delay" : "60s"
    }
  }
}
The transform works for both old documents and new incoming documents, but not for updates. If the state of a document changes to ERROR, the document is still counted in the destination index. I thought this would be the perfect case for continuous mode with my "latestUpdate" field, but apparently I am doing something wrong. I found the "align_checkpoints" setting and hoped it would help, but the counts are still not updated correctly. Do I maybe need to tell Elasticsearch the date format of the latestUpdate field, or its time zone?
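To make the expected behavior concrete, here is a minimal Python sketch of the counts I expect before and after an update. It is plain Python, not Elasticsearch code. The function name count_orders and the second order (orderId 552314) are made up for illustration, and the grouping by portal is left out to keep it short.

```python
from collections import Counter
from datetime import datetime

# Matches timestamps such as "2022-07-25 16:00:49 +02:00" (Python 3.7+).
TS_FORMAT = "%Y-%m-%d %H:%M:%S %z"


def count_orders(docs):
    """Count non-ERROR orders per (1-minute bucket, category)."""
    counts = Counter()
    for doc in docs:
        content = doc["content"]
        if content["state"] == "ERROR":
            continue  # mirrors the must_not filter in the transform
        created = datetime.strptime(content["creationTime"], TS_FORMAT)
        # Truncate to the start of the minute, like a 1m date_histogram.
        bucket = created.replace(second=0, microsecond=0)
        counts[(bucket.isoformat(), content["category"])] += 1
    return counts


docs = [
    {
        "content": {
            "creationTime": "2022-07-25 16:00:49 +02:00",
            "orderId": "552313",
            "state": "SUCCESS",
            "category": "catA",
        },
        "latestUpdate": "2022-11-29 18:42:08 +01:00",
    },
    {
        # Hypothetical second order in the same minute and category.
        "content": {
            "creationTime": "2022-07-25 16:00:10 +02:00",
            "orderId": "552314",
            "state": "SUCCESS",
            "category": "catA",
        },
        "latestUpdate": "2022-11-29 18:42:08 +01:00",
    },
]

key = ("2022-07-25T16:00:00+02:00", "catA")
before = count_orders(docs)

# Simulate an update: the state flips to ERROR and latestUpdate moves forward.
docs[1]["content"]["state"] = "ERROR"
docs[1]["latestUpdate"] = "2022-11-29 18:52:08 +01:00"
after = count_orders(docs)

print(before[key], after[key])
```

In this sketch the bucket count drops once the state flips to ERROR. That drop is what I expect to see in aggregated_orders001 after the next checkpoint, but the count there stays the same.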
Also: is there a better way to count the documents in each group than picking some field like content.state and running a value_count on it?
Any help would be very welcome and appreciated.