Hi,
The ES documentation says: "By default, Elasticsearch uses heuristics in order to automatically trigger flushes as required."
Does anyone know when ES triggers a flush to disk, and whether there is a way to configure it?
We saw that when running a heavy indexing process, as RAM fills up, a flush to disk happens every few seconds, and that slows our indexing speed.
With Solr you can control when this flush runs (it is called a hard commit).
Any ideas what we can do to control it?
Elasticsearch allows you to specify the refresh_interval for an index. Setting this to -1 is something I do before bulk indexing or reindexing to speed things up.
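As a rough sketch of the refresh_interval change described above, the request can be built with the Python standard library like this. The host `http://localhost:9200`, the index name `my-index`, and the helper name `refresh_interval_request` are assumptions for illustration only; the request is built but not sent.

```python
import json
import urllib.request


def refresh_interval_request(base_url, index, interval):
    """Build (but do not send) a PUT /<index>/_settings request.

    base_url and index are placeholders; adjust them to your cluster.
    """
    body = json.dumps({"index": {"refresh_interval": interval}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Disable refresh before the bulk load, then restore it afterwards.
# "1s" is used here as the value to restore; use whatever your index had before.
disable = refresh_interval_request("http://localhost:9200", "my-index", "-1")
restore = refresh_interval_request("http://localhost:9200", "my-index", "1s")

# To actually apply a change, send the request, for example:
# urllib.request.urlopen(disable)
```

Remember to send the `restore` request once indexing is done, otherwise new documents will not become searchable until a refresh is triggered some other way.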
Hi, we know about refresh_interval, and setting it to -1 is the default configuration for us in tests. The scenario I described above already has this set.
this affect the
But coming back to my original question: do you know when ES triggers a flush to disk, and how can we control it?
No, I'm sorry. I only know what the documentation says, that ES uses heuristics to trigger flushes and that the user can trigger this manually in order to reduce the recovery time when restarting nodes.
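For the manual trigger mentioned above, a minimal sketch of a flush request could look like the following. It only covers flushing by hand, not the heuristics ES uses on its own. The host, the index name `my-index`, and the helper name `flush_request` are assumptions for illustration; the request is built but not sent.

```python
import urllib.request


def flush_request(base_url, index):
    """Build (but do not send) a POST /<index>/_flush request."""
    return urllib.request.Request(
        url=f"{base_url}/{index}/_flush",
        method="POST",
    )


manual_flush = flush_request("http://localhost:9200", "my-index")

# To actually trigger the flush, send the request, for example:
# urllib.request.urlopen(manual_flush)
```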
But are you sure flushing is the cause of the slowness you experience every few seconds? Even with refreshing turned off, it could still be the garbage collector kicking in. Perhaps you could check that?
I regularly re-index hundreds of millions of documents and terabytes of data, and the Reindex API uses bulk indexing underneath, but I've never experienced the problem you mention. All I ever do is turn off index refreshing and shard replication; after that it's all smooth sailing.
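To make the "turn off refreshing and replication" step concrete, here is a small sketch of the two settings bodies one might PUT to `/<index>/_settings` before and after a bulk load. It assumes that "turning off shard replication" means setting `number_of_replicas` to 0. The helper name `bulk_load_settings` and the restore values (1 replica, "1s" refresh) are placeholders, not the poster's actual configuration.

```python
import json


def bulk_load_settings(replicas_after=1, refresh_after="1s"):
    """Return (before, after) settings bodies for a bulk indexing run.

    'before' disables refresh and replicas; 'after' restores them.
    The restore values are assumptions; use your index's real values.
    """
    before = {"index": {"refresh_interval": "-1", "number_of_replicas": 0}}
    after = {
        "index": {
            "refresh_interval": refresh_after,
            "number_of_replicas": replicas_after,
        }
    }
    return before, after


before, after = bulk_load_settings()
print(json.dumps(before))  # body to apply before the bulk load
print(json.dumps(after))   # body to apply once the load has finished
```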
I hope you can figure it out or that someone else, who knows more about the flush mechanism, can help you.