Hi,
Please help me with Elasticsearch write performance for real-time indexing, and with how soon a document becomes available for search. To measure this in my JMeter performance script, I apply a fixed delay of 300 ms or more after indexing each document before searching for it, but indexing does not scale. How can I scale up to 2000+ documents per second?
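A back-of-the-envelope check of that test design may help explain the ceiling. If each JMeter thread indexes one document and then waits the fixed 300 ms before searching, one thread can index at most about 3.3 documents per second, so 2000 docs/s needs roughly 600 concurrent threads, each sending its own single-document request. The sketch below only does that arithmetic; the 20 ms and 10 ms request latencies in the second estimate are hypothetical placeholders, not measurements.

```python
# Capacity estimate for a closed-loop load test in which each thread
# indexes one document, sleeps a fixed delay, then searches for it.
# The 300 ms delay and 2000 docs/s target come from the question; the
# index/search latencies in the second estimate are hypothetical.

DELAY_MS = 300            # fixed wait between index and search
TARGET_DOCS_PER_S = 2000  # desired indexing rate


def docs_per_thread(cycle_ms: int) -> float:
    """Upper bound on documents one thread can index per second."""
    return 1000 / cycle_ms


def threads_needed(target_per_s: int, cycle_ms: int) -> int:
    """Concurrent threads needed to sustain target_per_s (integer ceiling)."""
    return (target_per_s * cycle_ms + 999) // 1000


if __name__ == "__main__":
    # Best case: the index and search calls themselves take no time.
    print(docs_per_thread(DELAY_MS), threads_needed(TARGET_DOCS_PER_S, DELAY_MS))
    # Hypothetical 20 ms index call + 10 ms search call per cycle.
    cycle = DELAY_MS + 20 + 10
    print(docs_per_thread(cycle), threads_needed(TARGET_DOCS_PER_S, cycle))
```

Any real request latency only pushes the required thread count higher, which is why adding per-document waits to the load test tends to hide rather than measure indexing throughput.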
By default, indexed documents are made searchable every second. This refresh is an expensive operation, and how often it happens can be controlled through the refresh interval setting (index.refresh_interval).
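To see why a fixed 300 ms wait interacts badly with a one-second refresh, here is a small simulation under simplifying assumptions (not a model of Elasticsearch internals): refreshes happen exactly every interval starting at t = 0, take no time, and expose every document indexed strictly before them. A document indexed just after a refresh waits almost the full interval; on average the wait is about half the interval.

```python
# Simulate when documents become searchable under a periodic refresh.
# Assumptions (illustrative only): refreshes fire exactly every
# refresh_interval_s seconds from t = 0, take no time themselves, and
# expose every document indexed strictly before the refresh.
import math


def visible_at(indexed_at_s: float, refresh_interval_s: float) -> float:
    """Time of the first refresh strictly after the document was indexed."""
    return (math.floor(indexed_at_s / refresh_interval_s) + 1) * refresh_interval_s


def search_delays(arrivals_s, refresh_interval_s):
    """Delay between indexing and searchability for each document."""
    return [visible_at(t, refresh_interval_s) - t for t in arrivals_s]


if __name__ == "__main__":
    # 2000 documents spread evenly over one second (0.5 ms apart).
    arrivals = [i * 0.0005 for i in range(2000)]
    delays = search_delays(arrivals, refresh_interval_s=1.0)
    print(f"max delay {max(delays):.3f}s, mean delay {sum(delays) / len(delays):.3f}s")
```

Under this model, only documents indexed in the last 300 ms before a refresh would be visible after a 300 ms wait, which is presumably why a per-document explicit refresh was added to the test.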
I trigger a refresh after every document I index, but the problem is that indexing doesn't scale.
Indexing individual documents rather than using bulk requests results in a lot of overhead and significantly worse indexing performance and throughput. Refreshes are even more expensive operations, so triggering one for every document adds even more overhead. Indexing that way is expected to perform very badly and will not scale, as you are basically doing the opposite of these guidelines for optimizing indexing performance.
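For reference, a bulk request packs many documents into one newline-delimited JSON (NDJSON) body: an action line followed by a source line for each document, ending with a trailing newline. The sketch below only builds such a body; the index name "my-index" and the document fields are hypothetical, and actually sending it (POST /_bulk with Content-Type: application/x-ndjson) is left out.

```python
# Build a _bulk request body: newline-delimited JSON with one action line
# followed by one source line per document, plus a trailing newline.
# The index name "my-index" and the document fields are hypothetical.
import json


def bulk_body(index_name, docs):
    """Return an NDJSON payload that indexes every (id, source) pair in one request."""
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        lines.append(json.dumps(source))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    docs = [(str(i), {"order_id": i, "status": "updated"}) for i in range(3)]
    print(bulk_body("my-index", docs))
```

One such request carries hundreds or thousands of documents, so the per-request overhead is paid once per batch instead of once per document.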
Indexing this way is likely to cause a lot of small disk I/O, so it may be useful to look at disk utilization, iowait and IOPS to see what that looks like.
Thanks! But my use case is real-time indexing, i.e. as soon as any update happens in the system, it must be indexed and be available for search and aggregation operations. Could you please suggest something, or is ES not used for real-time indexing use cases?
Elasticsearch is not optimised for that use case.
Thanks again. Do the Elastic advocates say the same thing? @dadoonet @Christian_Dahlqvist can you confirm this?
You can definitely trust what @Christian_Dahlqvist says. Elasticsearch is a Near Real Time Search engine.
Thanks! But with a one-second refresh interval we are not able to reach the maximum indexing throughput. Is there anything else we can do about this?
Follow the guidance on optimising indexing performance that I linked to. The change that often makes the biggest difference is using the bulk API instead of indexing individual documents. Making sure you have fast storage is also very important.
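If updates arrive one at a time, one way to apply the bulk advice is a client-side buffer that collects updates and flushes them as a batch when either a size limit or an age limit is hit, keeping the extra latency bounded. This is a hypothetical sketch, not part of any Elasticsearch client: the flush callback stands in for whatever sends the _bulk request (the official clients also ship bulk helpers, such as elasticsearch.helpers.bulk in the Python client), and a real deployment would also need a timer calling maybe_flush() when no new updates arrive.

```python
# Hypothetical client-side buffer that turns a stream of single-document
# updates into bulk-sized batches. It flushes when max_docs is reached or
# when the oldest buffered update has waited max_wait_s, bounding the
# latency added on top of the refresh interval. The flush callback is a
# stand-in for code that sends a _bulk request; it is not a real client API.
import time
from typing import Callable, List, Optional


class BulkBuffer:
    def __init__(self, flush: Callable[[List[dict]], None],
                 max_docs: int = 1000, max_wait_s: float = 0.2,
                 clock: Callable[[], float] = time.monotonic):
        self._flush = flush
        self._max_docs = max_docs
        self._max_wait_s = max_wait_s
        self._clock = clock
        self._docs: List[dict] = []
        self._oldest: Optional[float] = None

    def add(self, doc: dict) -> None:
        """Buffer one update, flushing if a limit has been reached."""
        if not self._docs:
            self._oldest = self._clock()
        self._docs.append(doc)
        self.maybe_flush()

    def maybe_flush(self) -> None:
        """Flush if the batch is full or old enough; also call this from a timer."""
        if not self._docs:
            return
        too_big = len(self._docs) >= self._max_docs
        too_old = self._clock() - self._oldest >= self._max_wait_s
        if too_big or too_old:
            self.flush()

    def flush(self) -> None:
        """Hand the buffered batch to the callback and start a new one."""
        if self._docs:
            batch, self._docs, self._oldest = self._docs, [], None
            self._flush(batch)
```

The two limits are the trade-off knobs: a larger max_docs improves throughput, while a smaller max_wait_s keeps documents closer to real time.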