Does the index API block responses until the indexing process is through?

I am wondering whether the index API waits to return a response until the whole indexing process is complete.

More specifically: when I send a document to the API for indexing, will ES only check whether the request, JSON, etc. are valid, then send back the response and proceed with the indexing process without making the client wait?

The context of my question is using ES to log content clicks in a web app. My idea is to simply call an indexing service on specific routes. But I am worried that over time as the indices grow the response time on those routes will go up due to the indexing process blocking the response.

I was thinking that I could implement some sort of pipeline the clicked content could be sent to for logging. But I am not sure if that is overcomplicated and unnecessary.

Thanks for reading :slight_smile:

It blocks until we've fsynced the translog on all shard copies. The document might not be visible for search yet.

Generally folks write to disk and use a second process to stream them into ES. That helps with batching the fsyncs at the cost of some delay. Filebeat is what I used to use for this sort of thing; I'm likely out of date on that one now.
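A minimal sketch of the write-to-disk side of that approach (file path, field names, and the `log_click` helper are just placeholders, not anything from Filebeat itself):

```python
import json
import time

def log_click(path, user_id, content_id):
    """Append one click event as a JSON line. A shipper such as
    Filebeat can tail this file and bulk-index it into ES later."""
    event = {
        "@timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "user_id": user_id,
        "content_id": content_id,
    }
    # A plain local append is cheap and never blocks on Elasticsearch,
    # so the web request returns immediately.
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")
```

The web route only pays for a local file append; the shipper handles batching, retries, and the actual indexing out of band.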


Hey Nik, thank you for your answer.

So far we have never used more than one shard per index and only one node (just a medium-scale monolithic web app).

Does this still apply in such a simple use case?

I like the idea of simply writing everything to some file and then just cleaning up using a cron job :+1:
The visibility would not be a problem since the data is only accessed by backend users for reports now and then.

But for me it's hard to judge whether this is over-optimizing. Currently we only have about 50,000 log entries coming in each day. Most of the topics I read here talk about GBs or even TBs of data coming in each day.


@nik9000 I've just done some reading of the translog docs to better understand your answer.
From what I could understand, setting index.translog.durability to async would achieve what I am looking for, at the cost of losing some log entries in case of a crash. Did I understand this correctly?

What's the reason behind wanting to do this? Are there performance issues you are having?

Thank you for your answer.

The current logging solution is horribly slow when querying data. I would also like to take advantage of ES's awesome aggregations and learn more about them.

I am trying to understand whether I might run into performance issues like those described. But I guess I might be over-optimizing.

Do you have any feedback on setting index.translog.durability to async in my use case?

Is that Elasticsearch? Or something else?

No, someone implemented it using an RDBMS in the past, not anticipating future traffic. And it's also not implemented very well.

Ok, to be honest I think you're worrying about the wrong thing with this approach; they're totally different beasts, so to speak.

We've seen tera/petabyte-scale clusters with standard translog settings; there's no need to dig into this level.


In order to optimize indexing speed, have a look at the guidelines available in the docs. The most important one is probably to index using bulk requests, if you are not already doing that.
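A minimal sketch of what a `_bulk` request body looks like (the index name `clicks` and the field names are just placeholders; in practice elasticsearch-py's `helpers.bulk` builds and sends this for you):

```python
import json

def build_bulk_body(index, docs):
    """Build the NDJSON body for a _bulk request: an action line
    followed by the document source, one pair per document."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    # The _bulk API requires a trailing newline.
    return "\n".join(lines) + "\n"
```

The resulting body is sent as one `POST _bulk` request with the `application/x-ndjson` content type, so many documents share a single round trip and translog fsync instead of one each.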


Haha, I guess that answers my question. But I learned some things about logging with ES, so I consider this a success :grinning_face_with_smiling_eyes:


This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.