Everything looks good in the logs.
The date_time field is not the ingest timestamp; it just specifies when an action occurred (the original transform has more fields, but I'm investigating the problem with fewer fields, as specified above, just to keep things simple).
I suspected that the problem was caused by the 3min delay, but I ran other transforms with 15min and 20min delays and got the same results.
However, I noticed strange behavior with this transform (same source and destination as above, but with different names):
{
  "id": "test_3",
  "source": {
    "index": [
      "alias_all_hits*"
    ],
    "query": {
      "range": {
        "date_time": {
          "gte": "2020-06-25T08:30:53"
        }
      }
    }
  },
  "dest": {
    "index": "test_3_session"
  },
  "sync": {
    "time": {
      "field": "date_time",
      "delay": "15m"
    }
  },
  "pivot": {
    "group_by": {
      "wz_session": {
        "terms": {
          "field": "wz_session"
        }
      },
      "domain": {
        "terms": {
          "field": "domain"
        }
      }
    },
    "aggregations": {
      "sessionPageViewsCount": {
        "value_count": {
          "field": "page"
        }
      },
      "minDate": {
        "min": {
          "field": "date_time"
        }
      }
    }
  },
  "version": "7.7.0",
  "create_time": 1593092653859
}
I ran it and, after an hour, checked whether I still get the count mismatches using these two queries.
The first query gives how many sessions the source index has:
GET alias_all_hits*/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "date_time": {
              "gte": "2020-06-25T08:30:53",
              "lte": "2020-06-25T08:40:53"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "sessions": {
      "cardinality": {
        "field": "wz_session",
        "precision_threshold": 40000
      }
    }
  }
}
The second query gives how many sessions the destination index has:
GET test_3_session/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "minDate": {
              "gte": "2020-06-25T08:30:53",
              "lte": "2020-06-25T08:40:53"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "sessions": {
      "cardinality": {
        "field": "wz_session",
        "precision_threshold": 40000
      }
    }
  }
}
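Since the two queries have to use exactly the same time window for the comparison to mean anything, here is a minimal Python sketch that builds both request bodies from one gte/lte pair. The function names (session_count_query, build_comparison) are made up for illustration; the index and field names are the ones from the queries above. It only builds the JSON bodies and does not call Elasticsearch.

```python
def session_count_query(date_field, gte, lte, precision_threshold=40000):
    """Build a search body that counts distinct wz_session values in a window.

    date_field is "date_time" for the source index and "minDate" for the
    destination index; everything else is identical in both queries.
    """
    return {
        "size": 0,
        "query": {
            "bool": {
                "filter": [
                    {"range": {date_field: {"gte": gte, "lte": lte}}}
                ]
            }
        },
        "aggs": {
            "sessions": {
                "cardinality": {
                    "field": "wz_session",
                    "precision_threshold": precision_threshold,
                }
            }
        },
    }


def build_comparison(gte, lte):
    """Return {index name: search body} for source and destination, same window."""
    return {
        "alias_all_hits*": session_count_query("date_time", gte, lte),
        "test_3_session": session_count_query("minDate", gte, lte),
    }


if __name__ == "__main__":
    import json

    bodies = build_comparison("2020-06-25T08:30:53", "2020-06-25T08:40:53")
    for index, body in bodies.items():
        print(f"GET {index}/_search")
        print(json.dumps(body, indent=2))
```

The printed output can be pasted into the Kibana console, so changing the window in one place updates both queries.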
Now notice that in the second query, the date time in the gte is exactly the same as the transform's gte, and in this case I do NOT get count mismatches, no matter what the lte is.
But if I add even one second to the gte in the second query (and of course I update the gte in the first query as well), I get count mismatches.
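One thing that might be worth ruling out is that the two queries do not count quite the same thing: the first one counts sessions that have any hit inside the window, while the second one counts sessions whose minDate (their first hit seen by the transform) is inside the window. The toy sketch below, in Python with made-up hits, mimics that difference. It is only a simplified model under my own assumptions (it ignores the domain group_by, the sync delay, and the approximate nature of cardinality), not a claim about what the transform actually does internally.

```python
from datetime import datetime, timedelta

# Made-up hits for illustration only: (wz_session, date_time).
HITS = [
    ("s1", datetime(2020, 6, 25, 8, 30, 53)),
    ("s1", datetime(2020, 6, 25, 8, 35, 0)),
    ("s2", datetime(2020, 6, 25, 8, 31, 10)),
    ("s3", datetime(2020, 6, 25, 8, 45, 0)),
]

TRANSFORM_GTE = datetime(2020, 6, 25, 8, 30, 53)


def pivot_min_date(hits, transform_gte):
    """Mimic the pivot: one minDate per session over hits >= transform_gte."""
    min_dates = {}
    for session, ts in hits:
        if ts < transform_gte:
            continue  # filtered out by the transform's source query
        if session not in min_dates or ts < min_dates[session]:
            min_dates[session] = ts
    return min_dates


def source_sessions(hits, gte, lte):
    """Sessions with at least one hit inside [gte, lte] (first query)."""
    return {s for s, ts in hits if gte <= ts <= lte}


def dest_sessions(min_dates, gte, lte):
    """Sessions whose minDate is inside [gte, lte] (second query)."""
    return {s for s, ts in min_dates.items() if gte <= ts <= lte}


if __name__ == "__main__":
    min_dates = pivot_min_date(HITS, TRANSFORM_GTE)
    lte = datetime(2020, 6, 25, 8, 40, 53)
    for gte in (TRANSFORM_GTE, TRANSFORM_GTE + timedelta(seconds=1)):
        src = source_sessions(HITS, gte, lte)
        dst = dest_sessions(min_dates, gte, lte)
        print(gte.isoformat(), "source:", len(src), "dest:", len(dst))
```

In this toy data the counts agree when the window starts at the transform's gte, and differ once the window starts one second later, because session s1 still has a hit inside the window but its minDate now falls before it. If the real data behaves the same way, that would match what I am seeing, but I have not confirmed it.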
I also tried running another identical transform in continuous mode, replaced the gte range query with match_all by calling the transform _update API, and I got count mismatches.
I will continue to do more tests, and will update you.
BTW, in the logs I see only regular info and debug messages, and most of them are very large queries (partial, so I can't run them), but if any of these logs can help you understand what is happening, I can certainly paste them here.