Hi all,
We are running ES on AWS EC2 instances and we want to reduce our JVM heap usage. The most common solution is to use doc values, which reduces the in-memory field data. When we tried this on a test system, it resulted in high I/O consumption and very large storage usage.
We want to keep the doc values on ephemeral storage, but when I looked into it, the doc values files are stored together with all the other index data.
Is there a way to store only the doc values on a separate filesystem?
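For anyone following along, here is a minimal sketch of the kind of mapping change this refers to. It assumes a 1.x-style mapping where doc values are enabled per field with `"doc_values": true` and string fields must be `not_analyzed`; the type name `logs` and field name `host` are hypothetical, not our real mapping.

```python
import json


def doc_values_mapping(doc_type, field, field_type="string"):
    """Build a mapping body that keeps `field` in on-disk doc values
    instead of loading it into in-heap field data."""
    props = {"type": field_type, "doc_values": True}
    if field_type == "string":
        # Assumption: 1.x-style mappings, where doc values are only
        # available for strings that are not analyzed.
        props["index"] = "not_analyzed"
    return {doc_type: {"properties": {field: props}}}


# Hypothetical type and field names, for illustration only.
mapping = doc_values_mapping("logs", "host")
print(json.dumps(mapping, indent=2))
```

The resulting JSON would be sent as the body of a put-mapping request for a new index; existing data has to be reindexed before it is stored as doc values.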
The number of files wasn't high; the I/O issues were caused by read I/O, not write I/O.
This happened when we were receiving about 1 million documents per minute and then ran a query for a few hours through Kibana. Before the change to doc values, the search crashed with an OutOfMemoryError. After we switched to doc values, the query could complete, but only after a very long time, and the read I/O on the instance was very high during the search.
We thought we could put the doc values files on an SSD with high IOPS, but as you confirmed, it can't be done.
Can you suggest any other way to handle a very large amount of data without hurting search performance?
We don't have graphs showing the described behavior.
I would like to emphasize that we have a few TB of data in a write-intensive cluster, and the total heap size in the cluster isn't big enough to hold the field data cache needed for searches.
To work around this, we manually clear the field data cache every 20 minutes, so it is regenerated from scratch each time a user sorts on the data.
Putting all of the data on ephemeral storage isn't an option (it would multiply the total cost of the cluster).
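To make the workaround concrete, this is roughly the request such a cleanup job would issue. It is only a sketch: it assumes the clear cache API at `/_cache/clear` with the `fielddata=true` parameter, the host `http://localhost:9200` is a placeholder, and the 20-minute scheduling (for example via cron) is left out.

```python
import urllib.request


def clear_fielddata_request(base_url, index=None):
    """Build (but do not send) a POST request that clears only the
    field data cache, for one index or for the whole cluster."""
    path = f"/{index}/_cache/clear" if index else "/_cache/clear"
    # Assumption: the clear cache API accepts `fielddata=true` to
    # restrict the clear to the field data cache.
    url = base_url.rstrip("/") + path + "?fielddata=true"
    return urllib.request.Request(url, method="POST")


# Placeholder host; replace with a real node address.
req = clear_fielddata_request("http://localhost:9200")
print(req.get_method(), req.full_url)

# To actually send it against a live cluster:
# urllib.request.urlopen(req)
```

Passing an index name, e.g. `clear_fielddata_request("http://localhost:9200", "logs-2015")`, limits the clear to that index.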
A few questions:
Is it a Lucene restriction or an Elasticsearch restriction that doc values cannot be stored on a different FS (storage)?
Do you plan to support such a feature in the future?