I assume you mean query performance when you mention "great read performance".
It depends on the queries. There are several factors to consider:
- data characteristics (number of fields, number of terms, number of docs, immutable/mutable data, change rate, ...)
- query types/complexity (simple term queries, boolean queries, filters, caching, aggregations, ...)
- query load (how many searches must be performed in a given time interval)
Regarding an index size in the terabyte range, I think it would be best to impose a logical ordering on the data and split the index into many indices, e.g. time-series indices or user-ID-based indices, and combine the indices into a logical unit with an index alias. Then sliding-window techniques can be used to focus the search only on the indices that are relevant for the expected responses. This lets you control hot spots in your data, and it can save enormous amounts of resources for replica shards.
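A minimal sketch of this pattern, assuming the official elasticsearch-py client and hypothetical daily indices named logs-YYYY.MM.DD; the alias name, index pattern, and user_id field are illustrative only, and client versions differ slightly in how parameters are passed:

```python
from datetime import date, timedelta
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Combine all time-based indices under one alias so that
# clients can address them as a single logical unit.
es.indices.put_alias(index="logs-*", name="logs")

# Sliding window: name only the daily indices covering the
# last 7 days, so the search never touches older indices.
window = ",".join(
    "logs-%s" % (date.today() - timedelta(days=d)).strftime("%Y.%m.%d")
    for d in range(7)
)
result = es.search(
    index=window,
    body={"query": {"term": {"user_id": 42}}},
    ignore_unavailable=True,  # tolerate days with no index yet
)
print(result["hits"]["total"])
```

Because searches name only the hot indices explicitly, the older indices can be kept with fewer (or zero) replicas, which is where the resource savings for replica shards come from.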
Hi Jörg, thank you for your answer, but you did not answer my question.
Is there a general rule of thumb for sizing a machine's RAM when I expect a fixed index size and want great query performance?
And also, is there an upper size beyond which more RAM does not make sense?
Best regards
Steffen
No, there is no general rule of thumb, since there is no direct correlation between a fixed index size and the demand for RAM.
There is no upper size limit for RAM imposed by Elasticsearch. JVM implementations impose some restrictions: for heaps larger than 4g, garbage collecting the heap objects becomes challenging. This is true for all Java server applications.
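One way to see whether the heap (rather than total RAM) is the limiting factor is to watch per-node JVM statistics; frequent, long old-generation collections together with a persistently high heap_used_percent are the usual symptom. A small sketch, again assuming the elasticsearch-py client; the field names follow the standard nodes-stats response:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Fetch per-node JVM statistics (heap usage, GC counts and times).
stats = es.nodes.stats(metric="jvm")

for node_id, node in stats["nodes"].items():
    jvm = node["jvm"]
    old_gc = jvm["gc"]["collectors"]["old"]
    print("%s: heap %d%%, old-gen GCs: %d (%d ms total)" % (
        node.get("name", node_id),
        jvm["mem"]["heap_used_percent"],
        old_gc["collection_count"],
        old_gc["collection_time_in_millis"],
    ))
```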