Does the index API block responses until the indexing process is through?

Courtan · May 16, 2021, 8:34pm

I am wondering whether the index API waits to return a response until the whole indexing process is complete.

More specifically: when I send a document to the API for indexing, will ES only check whether the request, JSON, etc. is valid, then send back the response and proceed with the indexing process without making the client wait?

The context of my question is using ES to log content clicks in a web app. My idea is to simply call an indexing service on specific routes. But I am worried that over time, as the indices grow, the response time on those routes will go up because the indexing process blocks the response.

I was thinking that I could implement some sort of pipeline that the clicked content could be sent to for logging. But I am not sure whether that is overcomplicated and unnecessary.
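
One way such a pipeline could look, as a rough sketch only: an in-process queue that the route handler pushes to, with a background thread draining it. The class name `ClickPipeline` and the `sink` callable (whatever eventually sends the event to ES) are hypothetical and not something described in this thread.

```python
import queue
import threading


class ClickPipeline:
    """Hypothetical in-process pipeline: routes enqueue, a worker thread delivers."""

    def __init__(self, sink, maxsize=10000):
        # `sink` is an assumed callable that ships one event (e.g. to ES).
        self._queue = queue.Queue(maxsize=maxsize)
        self._sink = sink
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, event):
        """Called from the route handler; never blocks, returns False if the queue is full."""
        try:
            self._queue.put_nowait(event)
            return True
        except queue.Full:
            return False

    def _run(self):
        while True:
            event = self._queue.get()
            try:
                self._sink(event)
            except Exception:
                # A real implementation would log the failure; here the event is dropped.
                pass
            finally:
                self._queue.task_done()

    def flush(self):
        """Block until every queued event has been handed to the sink."""
        self._queue.join()
```

With this shape the route only pays for a queue insert, at the cost of losing queued events if the web process dies.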

Thanks for reading

nik9000 · May 16, 2021, 8:59pm

It blocks until we've fsynced the translog on all shard copies. The document might not be visible for search yet.

Generally folks write to disk and use a second process to stream them into ES. That can help batch the fsyncs at the cost of some delay. Filebeat is what I used to use for this sort of thing. I'm likely out of date on that one now.
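
A minimal sketch of that write-to-disk pattern, assuming a JSON-lines spool file. The function names, the file layout, and the event fields are made up for illustration; a tool like Filebeat would replace the reader half entirely.

```python
import json
import time


def log_click(spool_path, content_id, user_id):
    """Append one click event as a JSON line; this is all the web request does."""
    event = {"content_id": content_id, "user_id": user_id, "ts": time.time()}
    with open(spool_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")


def read_batches(spool_path, batch_size=500):
    """Yield lists of events; a separate shipper process would send each list to ES."""
    batch = []
    with open(spool_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            batch.append(json.loads(line))
            if len(batch) >= batch_size:
                yield batch
                batch = []
    if batch:
        yield batch  # final, possibly smaller batch
```

The sketch leaves out rotation and tracking of what has already been shipped, which is the part a dedicated shipper handles for you.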

Courtan · May 16, 2021, 9:51pm

Hey Nik, thank you for your answer.

So far we have never used more than one shard per index and only one node (just a medium-scale monolithic web app).

Does this still apply in such a simple use case?

I like the idea of simply writing everything to some file and then just cleaning up using a cron job.
The visibility would not be a problem since the data is only accessed by backend users for reports now and then.

But for me it's hard to judge whether this is trying to over-optimize. Currently we only have about 50,000 log entries coming in each day. Most of the topics I read here talk about GBs or even TBs of data coming in each day.

Edit

@nik9000 I've just done some reading in the translog docs to better understand your answer.
From what I could understand, setting index.translog.durability to async would achieve what I am looking for, at the cost of losing some log entries in case of a crash. Did I understand this correctly?
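
For reference, a sketch of what changing that setting could look like through the index settings endpoint. The helper names, the `clicks` index name, and the `http://localhost:9200` base URL are assumptions; the code only builds the request and does not send it.

```python
import json


def translog_async_settings():
    """Settings body that switches translog durability from `request` to `async`."""
    return {"index": {"translog": {"durability": "async"}}}


def settings_request(index_name, base_url="http://localhost:9200"):
    """Return (method, url, body) for a PUT /<index>/_settings call (not sent here)."""
    body = json.dumps(translog_async_settings())
    return "PUT", f"{base_url}/{index_name}/_settings", body


method, url, body = settings_request("clicks")  # "clicks" is a hypothetical index name
```

With `async`, the translog is fsynced on an interval (`index.translog.sync_interval`) rather than on every request, so writes acknowledged since the last sync can be lost in a crash.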

warkolm · May 17, 2021, 12:19am

What's the reason behind you wanting to do this? Are there performance issues you are having?

Courtan · May 17, 2021, 6:36am

Thank you for your answer.

The current logging solution is horribly slow when querying data. Also, I would like to take advantage of ES's awesome aggregations and learn more about it.

I am trying to understand whether I might run into performance issues like the ones described. But I guess I might be over-optimizing.

Do you have any feedback on setting index.translog.durability to async in my use case?

warkolm · May 17, 2021, 6:38am

Is that Elasticsearch? Or something else?

Courtan · May 17, 2021, 6:40am

No, someone implemented it using an RDBMS in the past, not anticipating future traffic. And it's also not implemented very well.

warkolm · May 17, 2021, 6:44am

Ok, I think you're worrying about the wrong thing with this approach to be honest, totally different beasts so to speak.

We've seen tera/petabyte scale clusters with standard translog settings, there's no need to dig into this level.

Christian_Dahlqvist · May 17, 2021, 6:53am

In order to optimize indexing speed, have a look at the guidelines available in the docs. The most important one is probably to index using bulk requests if you are not already doing that.
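
As a rough illustration of what a bulk request carries: the `_bulk` endpoint takes newline-delimited JSON, with an action line followed by the document source for each item. The helper name `build_bulk_body` and the `clicks` index name below are hypothetical; the official clients ship their own bulk helpers, which would normally be used instead.

```python
import json


def build_bulk_body(index_name, docs):
    """Build an NDJSON payload for POST /_bulk with one `index` action per document."""
    if not docs:
        return ""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index_name}}))  # action line
        lines.append(json.dumps(doc))  # document source line
    # The bulk body must end with a trailing newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body("clicks", [{"content_id": 1}, {"content_id": 2}])
```

The payload would be sent with `Content-Type: application/x-ndjson`, so many click events share one round trip.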

Courtan · May 17, 2021, 6:54am

Haha, I guess that answers my question. But I learned some things about logging with ES, so I consider this a success.

