We are evaluating ES for production use and are trying to prepare a
capacity estimate. During that task I realized that some terms and
internal mechanics are not clear enough to me, and I wasn't able to find a
relevant blog post or SO question that would clarify them for me.
Storage size vs index size
The difference between those is clear; what is not clear is where to find
them, e.g. in the head or HQ plugin. There is just a primary and a total size,
which is clear: Total = Primary size * (1 + number of replicas). The reason I am
asking is that, if I understand correctly, the index should ideally be kept in
memory to ensure optimal performance, while storage size is fine when
offloaded to disk. For our production use we would need 40TB of
"Primary size" as the HQ plugin reports it. If we had to keep that in memory
on 68GB servers, we would end up with 40TB/68GB (roughly 600) machines in the
cluster, which would be horrible, so there is certainly a flaw in my
understanding. So the elementary question is: where can I find the index and
storage size via the REST API, plugins, etc.?
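A rough sketch of the arithmetic above, in Python. The helper names are hypothetical and not part of any Elasticsearch client. The only ES facts assumed are that `number_of_replicas` counts copies in addition to the primary, and that, in recent ES versions, `GET _cat/indices?v&h=index,pri.store.size,store.size&bytes=b` prints primary and total on-disk size per index in bytes.

```python
# Hypothetical capacity helpers; illustrative only, not an Elasticsearch API.


def total_store_bytes(primary_bytes: int, replicas: int) -> int:
    """On-disk size across the cluster: the primary plus `replicas` copies."""
    return primary_bytes * (1 + replicas)


def naive_ram_nodes(primary_bytes: int, replicas: int, ram_per_node: int) -> int:
    """Nodes needed IF every byte had to sit in RAM (the flawed assumption)."""
    total = total_store_bytes(primary_bytes, replicas)
    return -(-total // ram_per_node)  # ceiling division on integers


def parse_cat_indices(text: str) -> dict[str, tuple[int, int]]:
    """Parse `_cat/indices?v&h=index,pri.store.size,store.size&bytes=b` output
    into {index_name: (primary_bytes, total_bytes)}."""
    rows = text.strip().splitlines()
    header = rows[0].split()
    i_idx, i_pri, i_tot = (header.index(c) for c in ("index", "pri.store.size", "store.size"))
    result = {}
    for line in rows[1:]:
        cols = line.split()
        result[cols[i_idx]] = (int(cols[i_pri]), int(cols[i_tot]))
    return result


if __name__ == "__main__":
    TB, GB = 10**12, 10**9
    print(total_store_bytes(40 * TB, 1) // TB)     # 80 (TB on disk with one replica)
    print(naive_ram_nodes(40 * TB, 0, 68 * GB))    # 589 (nodes if 40TB had to fit in RAM)
```

The second number shows why "keep the whole index in RAM" cannot be the sizing rule. Feeding real `_cat/indices` output to `parse_cat_indices` gives per-index figures to plug into the same arithmetic.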
OLAP (slicing, dicing, aggregation, etc.) kind of queries
We are required to perform OLAP/analytics kinds of queries on such data.
Correct me if I am wrong, but I expect that all fields we are going to
query/aggregate on have to be indexed. Or just stored?
You don't need to keep the entire index in memory; that's not how ES works.
Regarding the second point: you index a document's fields, which allows you
to search on them; storing a field means you can also return its value when
the document is found by a search.
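To make the index-vs-store distinction concrete, here is a toy Python model. `ToyIndex`, `search` and `fetch` are hypothetical names, and the per-field `{"index": ..., "store": ...}` options only loosely mirror the boolean `index`/`store` mapping parameters of recent ES versions. It says nothing about how Lucene actually lays data out or how aggregations read values.

```python
# Toy model of "indexed = searchable, stored = retrievable". Purely
# illustrative: real Elasticsearch/Lucene internals are far more involved.
from collections import defaultdict


class ToyIndex:
    def __init__(self, mapping: dict[str, dict[str, bool]]):
        # mapping: field name -> {"index": bool, "store": bool}
        self.mapping = mapping
        self.inverted = defaultdict(set)  # (field, term) -> set of doc ids
        self.stored = {}                  # doc id -> {field: original value}

    def add(self, doc_id: str, doc: dict[str, str]) -> None:
        kept = {}
        for field, value in doc.items():
            opts = self.mapping.get(field, {})
            if opts.get("index"):
                # Naive whitespace tokenization into the inverted index.
                for term in value.lower().split():
                    self.inverted[(field, term)].add(doc_id)
            if opts.get("store"):
                kept[field] = value
        self.stored[doc_id] = kept

    def search(self, field: str, term: str) -> set[str]:
        # Only indexed fields ever reach the inverted index, so only they match.
        return set(self.inverted.get((field, term.lower()), set()))

    def fetch(self, doc_id: str, field: str):
        # Only stored fields can be returned; None otherwise.
        return self.stored.get(doc_id, {}).get(field)


if __name__ == "__main__":
    idx = ToyIndex({
        "title": {"index": True, "store": True},
        "body": {"index": True, "store": False},
        "note": {"index": False, "store": True},
    })
    idx.add("1", {"title": "Capacity plan", "body": "forty terabytes", "note": "draft"})
    print(idx.search("body", "terabytes"))  # {'1'}
    print(idx.fetch("1", "body"))           # None: indexed but not stored
    print(idx.search("note", "draft"))      # set(): stored but not indexed
    print(idx.fetch("1", "note"))           # draft
```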