Occasionally missing data when making both gt + lte queries with a single isoformatted date

Use case:
I'm trying to build a realtime logging client that allows the user to search old logs as well as see new logs come in realtime. So far everything mostly functions as expected except when the first queries are made.

To get the first set of "old" logs (the latest 100 logs up to that point), I make a range query like this:

{
    "sort": [
        {
            "@timestamp": {
                "order": "desc",
                "format": "strict_date_optional_time_nanos",
                "numeric_type": "date_nanos",
            }
        },
    ],
    "_source": ["@timestamp", "message"],
    "from": from_index,  # 0
    "size": size,  # 100
    "query": {
        "bool": {
            "must": [
                ...,
                {
                    "range": {
                        "@timestamp": {
                            "lte": starting_time,
                        }
                    }
                },
            ]
        },
    },
}

For my realtime logging, I poll the endpoint with the above query, after modifying the range clause as shown below, and update starting_time with the timestamp of the latest new log.

"range": {
    "@timestamp": {
        "gt": starting_time,
    }
}
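For reference, both range clauses can be produced from a single boundary timestamp. A minimal sketch (build_range_clause is a hypothetical helper name, not part of any client library):

```python
def build_range_clause(starting_time: str, realtime: bool) -> dict:
    """Build the range clause for either the backfill ("old logs")
    query (lte) or the realtime polling query (gt)."""
    operator = "gt" if realtime else "lte"
    return {"range": {"@timestamp": {operator: starting_time}}}


# The backfill query takes everything up to and including the boundary...
old_clause = build_range_clause("2023-01-17T01:24:55.723Z", realtime=False)
# ...and the polling query takes everything strictly after it.
new_clause = build_range_clause("2023-01-17T01:24:55.723Z", realtime=True)

print(old_clause)  # {'range': {'@timestamp': {'lte': '2023-01-17T01:24:55.723Z'}}}
print(new_clause)  # {'range': {'@timestamp': {'gt': '2023-01-17T01:24:55.723Z'}}}
```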

In the beginning, starting_time is the same for both the "old" and realtime queries, so together they should cover all available logs; starting_time in the realtime query is only ever updated when a new log is actually retrieved.

The problem:
When the above queries are first made (the user enters the page and the first set of old logs plus the realtime polling logs are retrieved), the realtime polling occasionally skips an entry or two, as shown in the example below.

2023-01-17T01:24:52.250619185Z: INFO:root:sleeping for 1 second at 186
2023-01-17T01:24:53.251800193Z: INFO:root:sleeping for 1 second at 187
# Above is the first set of "old logs" before time "2023-01-17T01:24:55.723Z"
------------------------------------------------------------------------------------------
# Below is the realtime logs after time "2023-01-17T01:24:55.723Z"
2023-01-17T01:24:56.255059829Z: INFO:root:sleeping for 1 second at 190
2023-01-17T01:24:57.256236468Z: INFO:root:sleeping for 1 second at 191

I thought it might be a race condition between inserting and retrieving the logs around starting_time, but even after making my process wait a few seconds before retrieving the first log after starting_time, the behavior is still the same.

I confirmed that the missing entries do exist (if I refresh the page, they appear in the set of "old logs").

@warkolm thank you for making the tag correction! :slightly_smiling_face:

Have you taken the refresh interval into account?

I have not. I'll give this a shot.

But given that it defaults to 1 second, if I make my process wait 2 seconds before attempting the first query, the index should already have refreshed according to that description. I might be missing some understanding on my side, so I'll try this out regardless.

I've set the refresh_interval to 1s and it still doesn't seem to fix it.

Additional notes:
The starting timestamp for the realtime-log query doesn't change unless a query actually comes back with new logs, because it gets updated like so:

current_starting_timestamp = start_time
while True:
    logs = make_query(..., start_time=current_starting_timestamp)

    has_new_logs = bool(logs.get("hits", {}).get("hits"))

    if has_new_logs:
        # Update the current starting timestamp for the next query
        # (results are already sorted newest-first)
        current_starting_timestamp = logs["hits"]["hits"][0]["_source"]["@timestamp"]

    await asyncio.sleep(2)

From this, we know that when it makes the query again, it still misses the first log after the initial starting_time.
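That update step can be exercised in isolation with fake hits (hypothetical data, just to show the bookkeeping, assuming the newest-first sort from the query above):

```python
def advance_timestamp(current: str, response: dict) -> str:
    """Return the new starting timestamp: the newest hit's @timestamp
    if the response has hits, otherwise the current one unchanged."""
    hits = response.get("hits", {}).get("hits", [])
    if hits:
        # Results are sorted newest-first, so hits[0] is the latest.
        return hits[0]["_source"]["@timestamp"]
    return current


empty = {"hits": {"hits": []}}
fresh = {"hits": {"hits": [
    {"_source": {"@timestamp": "2023-01-17T01:24:57.256236468Z"}},
    {"_source": {"@timestamp": "2023-01-17T01:24:56.255059829Z"}},
]}}

ts = "2023-01-17T01:24:55.723Z"
ts = advance_timestamp(ts, empty)   # no hits: unchanged
ts = advance_timestamp(ts, fresh)   # advances to the newest hit
print(ts)  # 2023-01-17T01:24:57.256236468Z
```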

My thought was that perhaps the gt and lte comparisons fail due to some floating-point error, which led me to use date_nanos (although that didn't solve it either).

Date math is not floating point; it is integer based.
All dates are stored as UTC milliseconds (or nanoseconds) since the epoch, so range comparison is integer based, with no floating point... a record either IS or IS NOT in the range.
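To illustrate the point, here is a pure-Python sketch of that integer comparison (my own parsing helper, not how Elasticsearch is implemented internally): converting the ISO strings to integer nanoseconds since the epoch makes range membership an exact integer test.

```python
from datetime import datetime, timezone


def iso_to_epoch_nanos(ts: str) -> int:
    """Parse an ISO-8601 UTC timestamp (up to 9 fractional digits)
    into integer nanoseconds since the epoch."""
    ts = ts.rstrip("Z")
    if "." in ts:
        whole, frac = ts.split(".")
    else:
        whole, frac = ts, ""
    frac = (frac + "000000000")[:9]  # right-pad fraction to nanoseconds
    seconds = int(datetime.fromisoformat(whole)
                  .replace(tzinfo=timezone.utc).timestamp())
    return seconds * 1_000_000_000 + int(frac)


boundary = iso_to_epoch_nanos("2023-01-17T01:24:55.723Z")
log_ts = iso_to_epoch_nanos("2023-01-17T01:24:56.255059829Z")

# A record either IS or IS NOT in the "gt" range; no floating point involved.
print(log_ts > boundary)  # True
```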

Date range filters, searchability, and index refresh are core Elasticsearch functionality, so it is highly improbable that this is a bug... possible, but not likely.

More likely there is something else going on with the testing methodology: the logs are not getting ingested when you think, some assumption is off, etc. I am not completely following your test case/test harness...

Exactly what date are you comparing?
The date in your logs may not be the actual @timestamp.
I am not sure what your query looks like, or what date is being used in the range query.
Is there an ingest pipeline setting the @timestamp? How is the @timestamp being set?

Keep looking...

Thanks for the descriptive reply.

I believe the ingest pipeline is handled by the default configuration of fluent-bit. Time_Key defaults to @timestamp, and I enabled Time_Key_Nanos to support nanosecond timestamps.

One thing to note is that the start_time date string comes from an ISO-formatted version of a JavaScript Date object, which is passed in at the start of the queries. I'm not sure if this creates some weird interaction (although it isn't precise down to nanoseconds, so some rounding might be occurring?).

const currentTimeIso = new Date().toISOString();
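Note that Date.prototype.toISOString() only carries millisecond precision, while the ingested logs carry nanoseconds. A pure-Python sketch of how the two precisions compare once the fractional part is zero-padded to nine digits (a sketch of the comparison at full precision, not a confirmed Elasticsearch internal; Elasticsearch may round less-precise bounds differently):

```python
def pad_fraction(ts: str) -> str:
    """Zero-pad the fractional seconds of an ISO UTC timestamp to
    nine digits so the strings compare correctly lexicographically."""
    base, _, rest = ts.rstrip("Z").partition(".")
    return f"{base}.{(rest + '000000000')[:9]}Z"


# Millisecond-precision boundary from JavaScript's toISOString()...
boundary = pad_fraction("2023-01-17T01:24:55.723Z")
# ...versus a nanosecond log timestamp inside the same millisecond.
log_in_same_ms = pad_fraction("2023-01-17T01:24:55.723456789Z")

# At full precision the log sorts strictly after the truncated boundary,
# so a plain "gt" comparison would still include it.
print(log_in_same_ms > boundary)  # True
print(boundary)  # 2023-01-17T01:24:55.723000000Z
```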

In the meantime, I'll try to put together a simplified version of the logic.

Ohh, we never welcomed you to the community... so welcome!

And Good hunting...


I believe I found the solution for this and it was pretty simple.

range also happens to have a relation option, which defaults to intersects. After changing it to within, I no longer seem to see the problem.

However, I'm not quite understanding the difference between intersects and within. With a single point in time, if it is within a time range, shouldn't it also "intersect"?

Conclusion: this seems to solve it. Example below for future reference.

"range": {
    "@timestamp": {
        "gt": starting_time,
        "relation": "within",
    }
}
and
"range": {
    "@timestamp": {
        "lte": starting_time,
        "relation": "within"
    }
}

[EDIT]: Not the solution. I still occasionally see it, but this somehow helped a little.