Making documents searchable in Elasticsearch is quite an expensive operation, so data is batched up and published periodically. This is described in the docs.
The default refresh interval is 1 second, so that should be roughly the maximum delay. You can check the index settings to see the refresh interval. Note that this covers the delay between a document being indexed into Elasticsearch and it becoming available for search. Is that what you are referring to, or are you looking at the delay of the full indexing pipeline, e.g. including Filebeat, Logstash etc.?
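For reference, one way to check it with curl (assuming a local cluster and a hypothetical index named my-logs-index):

# Show the effective refresh interval; include_defaults=true is needed because
# the setting only appears under "defaults" when it has never been overridden.
curl -s 'http://localhost:9200/my-logs-index/_settings?include_defaults=true&filter_path=**.refresh_interval'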
The numbers in your first post indicate that each document takes ~0.5 ms to actually index into Elasticsearch, so the longest delay between a document being ready to index and it showing up in a search is the 1 s refresh interval + ~0.5 ms (so about 1 s).
Therefore, you most likely have a delay in your ingest architecture, i.e. between when the logs are produced, scraped, shipped, transformed and when they arrive at Elasticsearch.
Delay in ingestion is what I thought about too, but these are direct calls from a JavaScript app to the ES API; there is nothing in between except the network. The delays I see are on the order of an hour.
I run a query in the database and the same one in Kibana. They don't match for today, but for yesterday they do.
Besides that, I run a Kibana query for some time span and count the hits. When I rerun it a while later, the hit count has increased.
The document rate is fairly high: more than 1000 per minute, and they are nested JSON documents. There is a JS client that calls the ES API; to be precise, a library (bunyan-elasticsearch) is calling the ES JS client. But they do so in real time, with no buffering/processing in the middle.
1000/minute is not massive, but it's enough that you should be using the bulk API. I looked at the bunyan-elasticsearch code and I don't think it does. That doesn't directly affect document visibility, but it could mean you are building up a large client-side backlog of indexing requests, which would explain a delay. The bunyan-elasticsearch API supports a callback that is called when the write completes. Are you using this callback to check that writes are happening as fast as you think?
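For comparison, this is roughly what a bulk request looks like at the HTTP level; the index name my-logs-index and the documents are just placeholders, not bunyan-elasticsearch output:

# Write the documents as NDJSON: one action line plus one source line per
# document, with a trailing newline (required by the bulk API).
cat > bulk.ndjson <<'EOF'
{"index":{"_index":"my-logs-index"}}
{"@timestamp":"2024-01-01T00:00:00Z","msg":"first log line"}
{"index":{"_index":"my-logs-index"}}
{"@timestamp":"2024-01-01T00:00:01Z","msg":"second log line"}
EOF

curl -s -H 'Content-Type: application/x-ndjson' \
  -X POST 'http://localhost:9200/_bulk' \
  --data-binary @bulk.ndjson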
I don't think that callback is assignable by me. The only place logs may get piled up is the stream buffer, but that would also happen if the ES API responded slowly.
You can run the test I outlined earlier, where you bypass your client, to check how long it takes Elasticsearch to respond. Run it from Kibana or a separate script/process. If that is fast, the issue is most likely buffering in the client.
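A minimal sketch of such a test with curl, assuming a local cluster and a hypothetical index named delay-test; curl's -w flag prints how long the request took:

# 1) Index one document directly and print how long Elasticsearch took to respond.
curl -s -o /dev/null -w 'index call took %{time_total}s\n' \
  -H 'Content-Type: application/json' \
  -X POST 'http://localhost:9200/delay-test/_doc' \
  -d '{"msg":"timing test"}'

# 2) Wait out the refresh interval (default 1 s), then confirm the doc is searchable.
sleep 2
curl -s 'http://localhost:9200/delay-test/_search?q=msg:timing&filter_path=hits.total'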
Curious if any answers/explanations were found here. This is the sort of issue that often gets chucked my way; "more than minutes of delay" often means a lack of understanding rather than an actual bug.
Another idea (though take all ideas under advisement) is to, say, inject the same dummy document (the only differences being the timestamp and _id, of course) into the exact same index every X seconds, even X=1, using a completely different client, maybe directly with a PUT or POST call in a simple shell script. Log the precise time of the acknowledgement received, checking it is a success, of course.
Then search for that document in Kibana; you should see one per second with a short indexing delay. If you do, the "problem" is elsewhere. If you don't, then there is some other issue with your Elasticsearch/Kibana setup.
We have had lots of confused people: say a log entry appears in a /var/log/blabla.log file and only shows up in Kibana 30-60 seconds later. Yes, it went through Logstash/Kinesis/whatever pipelines during those 30-60 seconds; it's not a bug.
Here is such a silly little shell script, assuming Elasticsearch is running locally; adjust accordingly.
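Something along these lines, with a hypothetical index name delay-test and one dummy document per invocation:

#!/bin/sh
# Index one dummy document stamped with the current time and log the response,
# so the acknowledged index time can be compared with when it shows up in Kibana.
NOW=$(date -u +%Y-%m-%dT%H:%M:%SZ)
RESPONSE=$(curl -s -H 'Content-Type: application/json' \
  -X POST 'http://localhost:9200/delay-test/_doc' \
  -d "{\"@timestamp\":\"$NOW\",\"msg\":\"delay probe\"}")
# Expect "result":"created" in the response if the write succeeded.
echo "$NOW $RESPONSE" >> /tmp/delay-test.log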
I run it via something like:
watch -n 1 ./silly.sh
to run every second.
In my Kibana, the "delay" between the indexing command returning and the result appearing in a Kibana search is so short it is hard to measure; around 1 s seems about right.
I can immediately see the doc I insert in one of the indices, but for some reason I can't see it in the Discover tab in Kibana. Is there a buffer or something between Kibana and ES?
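One way to cross-check is to run an equivalent query directly against the index, from Kibana Dev Tools or curl, with a time filter similar to what Discover applies; a sketch using the hypothetical delay-test index from above:

# Count documents from the last 15 minutes and compare with what Discover
# shows for the same index pattern and time range.
curl -s -H 'Content-Type: application/json' \
  'http://localhost:9200/delay-test/_count' \
  -d '{"query":{"range":{"@timestamp":{"gte":"now-15m"}}}}'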