Latency in seeing a document

No, there is not. The Discover tab applies a timestamp filter by default. Does the document you inserted fall within that time range (accounting for timezone differences)?

A GET by ID is realtime.
A Search is not.

You need to refresh, or wait for the next refresh, before your documents show up in a search request, which is what Discover is running.

I have
"refresh_interval": "1s"
But it has been about an hour and I still can't see a few docs I inserted.

Could you share a screenshot please?

GET /_search
{
  "query": {
    "ids" : {
      "values" : ["g12"]
    }
  }
}
{
  "took" : 317,
  "timed_out" : false,
  "_shards" : {
    "total" : 244,
    "successful" : 244,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "myindex-v2-2024.07.17",
        "_type" : "logs",
        "_id" : "g12",
        "_score" : 1.0,
        "_source" : {
          "log" : {
            "sof" : 334,
            "dd" : {
              "as" : "asdfjhg aslkf gaskf g",
              "l" : 3939
            }
          }
        }
      }
    ]
  }
}

I can't see the index name but this document is only available in index myindex-v2-2024.07.17.

If you need more help, please share the data view configuration.

Pattern is myindex*

What is data view configuration?

Note that the query you are running on Kibana _id:g12 is not the same query as you are running within dev tools.

Could you share a screen capture of the index pattern configuration?

Also what is your version?

Version is 7.17. Not sure where index pattern configuration is. There is Kibana>Index Patterns which have myindex* in it.

Your test document does not have the @timestamp field set. Since your index pattern was created for data that has this field, and Discover applies a time filter on it (top right corner), this test document will never show up in the results.

Yes. This is the screen capture I wanted to see.

... and I fear we are now on a different issue than the one first reported.

IF the original issue was with documents missing a timestamp field, they'd NEVER appear in a kibana search on the relevant index, never mind after a few seconds/minutes/hours.

siakc - Data Views were introduced in v8. v7.17 is 2+ years old, not itself an issue.

But as I wrote above this thread has all the hallmarks of a lack of understanding issue. Anyways, any fake / dummy document needs a timestamp, on assumption your real data also has a timestamp - see my silly insert-a-doc script above, be careful about formats.

PUT /myindex-v2-2024.07.21/_create/g2210
{
  "timestamp": "2024-07-21T06:13:14.114Z",
  "log": {
    "timestamp": "2024-07-21T06:13:14.114Z",
    "sof": 334,
    "dd": { "as": "asdfjhg aslkf gaskf g", "l": 3939 }
  }
}

Did this but still can't see the doc in search. Also used @timestamp. Sorry, where did Data Views come from?

Of course this is not what the thread is about. Production logs all have a timestamp.

You've set fields called timestamp, not @timestamp. The difference is important.

I meant I have set @timestamp also.

Rather than monitor by inserting dummy documents, which will pollute the index, I generally prefer adding an ingest timestamp through an ingest pipeline to each document. The ingest timestamp will reflect when Elasticsearch processed the document and be in UTC based on server time. With this I can run a query or create a visualisation showing how data comes in over time. It also allows me to calculate and visualize ingest delay by calculating the difference between the original timestamp and the ingest timestamp and also break it down by source, log type etc.
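To make the delay calculation concrete, here is a minimal sketch of the arithmetic only. It assumes each document already carries both its original @timestamp and an ingest timestamp; the field names `ingested_at` and `source` are hypothetical and chosen purely for illustration, and the sample documents are made up.

```python
from collections import defaultdict
from datetime import datetime


def parse_iso(text):
    """Parse an ISO 8601 string; a trailing 'Z' is treated as UTC."""
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    return datetime.fromisoformat(text)


def ingest_delays_by_source(docs):
    """Return {source: [delay in seconds, ...]} for each document.

    'ingested_at' and 'source' are hypothetical field names.
    """
    delays = defaultdict(list)
    for doc in docs:
        event_time = parse_iso(doc["@timestamp"])
        ingest_time = parse_iso(doc["ingested_at"])
        delays[doc["source"]].append((ingest_time - event_time).total_seconds())
    return dict(delays)


# Made-up sample documents; both timestamps carry an explicit offset.
docs = [
    {"source": "app-a",
     "@timestamp": "2024-07-21T06:13:14.114Z",
     "ingested_at": "2024-07-21T06:13:16.114Z"},
    {"source": "app-b",
     "@timestamp": "2024-07-21T08:13:14+02:00",
     "ingested_at": "2024-07-21T06:14:14Z"},
]

delays = ingest_delays_by_source(docs)
```

Because both timestamps include an offset, the subtraction is correct even when the original timestamp was written in a different timezone than the server's UTC ingest time.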

Sure, an ingest timestamp is a good idea generally, but it's pretty clear to me that here specifically we have a lack of understanding from @siakc around some basic concepts (sorry if that's harsh, but read the whole thread and tell me I'm wrong).

Like I said in an earlier post, we're now more at the "introduction to elasticsearch" sort of level, rather than troubleshooting the original delay/latency, which I think we agree is unlikely to be at the elasticsearch part of this overall "solution".

My advice to @siakc would be dependent on the business value here - if this is a critical part of his company's operations/solution, he likely needs to start looking at supported solutions, so he can ask paid-for-engineering-teams for help going forward, and/or up-skilling the internal support team(s).

I should have applied a timezone shift in my query, apparently. So now I can see that the inserted log is immediately searchable.

I still don't know why there is a delay. I guess I have my answer to my original question. There is no delay in searchability. Though I did my test in a test environment. May also do it in production.

I do not think I have ever seen an issue where indexed documents take a long time to show up as searchable unless the refresh interval has been incorrectly set, so was expecting this simple test to show that the issue likely is client side.

I have seen issues where users have tried to index documents individually at high throughput rates instead of in bulk, and the overhead associated with this has prevented Elasticsearch from keeping up. As far as I remember, this often shows up as failed indexing requests on the client, so it should be noticeable.

@stephenb wrote 8 days ago: "Show us the complete document."

which would have helped, as it would have shown the timestamp (maybe @timestamp) field.

".. are you accounting for timezone?"

I understand "no" is the answer.

But

@siakc

I should have applied timezone shift in my query apparently. ... I still don't know why there is a delay ... There is no delay in searchability

Not really following that. There is no delay, and there likely never was one.

Kibana has a setting for which timezone queries are related to, e.g. UTC, Europe/Berlin, whatever. So, e.g., "last 15 minutes" is translated into 2 timestamps, "15 minutes ago" and "now" correctly.
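As a rough illustration of that translation (this is not Kibana's actual code, just a sketch of the idea), a relative range can be turned into two absolute UTC timestamps. The function name `last_minutes_range` is made up for this example.

```python
from datetime import datetime, timedelta, timezone


def last_minutes_range(now, minutes):
    """Turn 'last N minutes' into (start, end) ISO 8601 strings in UTC.

    'now' must be timezone-aware, otherwise the instant is ambiguous.
    """
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    end = now.astimezone(timezone.utc)
    start = end - timedelta(minutes=minutes)
    return start.isoformat(), end.isoformat()


# The same instant, expressed on a Berlin (UTC+2 in summer) clock and a UTC clock.
berlin_now = datetime(2024, 7, 22, 0, 39, 36, tzinfo=timezone(timedelta(hours=2)))
utc_now = datetime(2024, 7, 21, 22, 39, 36, tzinfo=timezone.utc)

range_from_berlin = last_minutes_range(berlin_now, 15)
range_from_utc = last_minutes_range(utc_now, 15)
```

Both calls yield the same pair of timestamps, since a relative range is anchored to an instant rather than to a wall-clock reading.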

Now elasticsearch is clever enough to convert things on the fly, on the assumption that the timestamp field of each document is supplied with a timezone, usually by using a standard format such as ISO 8601, e.g.

2024-07-22T00:39:36+02:00
2024-07-21T22:39:36Z

which both represent the same actual time.

But if any pipelines are ingesting with ONLY timestamps (i.e. without timezone, or with inconsistent timezones) then you may get problems

e.g. the 2 timestamps above are identical, but

2024-07-22T00:39:36+02:00 (Europe/Berlin time)
2024-07-21T22:39:36Z (means UTC)
2024-07-21T22:39:36

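These 3 are potentially not identical: the third one has no timezone, so it is ONLY the same actual time if it is interpreted as UTC (which is what elasticsearch assumes by default when no offset is given). Interpreted as local time in Europe/Berlin, or anywhere else in the world, it represents a different actual time.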
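A small sketch of the comparison above, using only the three example timestamps. It shows how the meaning of the offset-less value depends entirely on which zone the reader assumes for it. The fixed +02:00 offset stands in for Europe/Berlin summer time.

```python
from datetime import datetime, timedelta, timezone


def parse_iso(text):
    """Parse an ISO 8601 string; a trailing 'Z' is treated as UTC."""
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    return datetime.fromisoformat(text)


with_offset = parse_iso("2024-07-22T00:39:36+02:00")
with_z = parse_iso("2024-07-21T22:39:36Z")
no_zone = parse_iso("2024-07-21T22:39:36")

# The first two carry an offset and denote the same instant.
same_instant = with_offset == with_z

# The third has no tzinfo at all, so it is not an instant until a zone is assumed.
is_naive = no_zone.tzinfo is None

# Assumption 1: read the offset-less value as UTC.
read_as_utc = no_zone.replace(tzinfo=timezone.utc)

# Assumption 2: read it as local time at UTC+02:00.
read_as_plus2 = no_zone.replace(tzinfo=timezone(timedelta(hours=2)))

# How far each reading is from the instant the first two timestamps denote.
gap_if_utc = with_z - read_as_utc
gap_if_plus2 = with_z - read_as_plus2
```

Read as UTC the gap is zero; read as +02:00 local time the same string lands two hours earlier, which is exactly the kind of shift that pushes documents outside a Discover time filter.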