I have to procure an Elasticsearch cluster, and part of that is specifying how much storage we will require in GB and how much the size can grow every day.

Is there any way I can calculate the storage needed? My use case: I have to store 5,000 documents in an index initially, and perform an edge n-gram search on one text field and a completion suggester on another text field in each document. I believe this will take extra space, because the tokens produced by the edge n-gram tokenizer for one field have to be indexed somewhere. The total number of documents may grow by 1,000 to 5,000 per day. A rough sketch of the kind of mapping I mean is below.

Is there a tool to estimate the space to allot in the cluster, or can you guide me on how to determine the space required?
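For context, this is roughly the setup I have in mind. The index name, field names, and analyzer settings here are placeholders, not my exact configuration:

```python
import requests

ES = "http://localhost:9200"   # assumption: local cluster
INDEX = "docs_test"            # hypothetical index name

body = {
    "settings": {
        "analysis": {
            "filter": {
                # edge n-gram filter: emits prefixes of each token
                "my_edge_ngram": {"type": "edge_ngram", "min_gram": 1, "max_gram": 20}
            },
            "analyzer": {
                "edge_ngram_analyzer": {
                    "tokenizer": "standard",
                    "filter": ["lowercase", "my_edge_ngram"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            # text field searched with edge n-grams
            "title": {"type": "text", "analyzer": "edge_ngram_analyzer"},
            # field used by the completion suggester
            "suggest": {"type": "completion"},
        }
    },
}

requests.put(f"{ES}/{INDEX}", json=body).raise_for_status()
```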
5000 documents a day for 365 days is still below 2 million documents, which is quite small from an Elasticsearch perspective unless the documents are huge. My guess would be a few GB at most, but you can find out by simply indexing a reasonable set of documents (maybe a month's worth) and extrapolating from there.
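As a rough sketch of that extrapolation, assuming a local cluster and a hypothetical index `docs_test` that already holds a representative sample (the 5,000/day growth figure is just the upper bound from your post), you can read the on-disk size from the index stats API and scale it:

```python
import requests

ES = "http://localhost:9200"        # assumption: local cluster
INDEX = "docs_test"                 # hypothetical index holding the sample

# Make sure everything indexed so far is visible before measuring.
requests.post(f"{ES}/{INDEX}/_refresh").raise_for_status()

stats = requests.get(f"{ES}/{INDEX}/_stats/docs,store").json()
primaries = stats["indices"][INDEX]["primaries"]
size_bytes = primaries["store"]["size_in_bytes"]
doc_count = primaries["docs"]["count"]

bytes_per_doc = size_bytes / doc_count
# 5,000 docs up front plus an assumed worst case of 5,000 new docs per day.
docs_after_a_year = 5_000 + 5_000 * 365
estimated_gb = bytes_per_doc * docs_after_a_year / 1024**3

print(f"~{bytes_per_doc:.0f} bytes/doc, ~{estimated_gb:.2f} GB after a year (primary shards only)")
```

Note this only covers primary shards; multiply by (1 + number of replicas) for the total cluster footprint.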
Ah, thanks @Christian_Dahlqvist. I have a small problem there. Right now I do not have enough data in the index to pull meaningful numbers from the stats API, if that is what you were suggesting. The index currently takes data from a test environment, where I don't have all 5,000 documents stored yet, so I can't get a figure. It would be tedious to insert each document manually.
But a follow-up question, in addition to the original one: does the index stats API take into account the storage needed for all the edge n-gram tokens? For example, if there are 5 documents in the index and the stats API reports a size of 0.0001 MB, does that include the space taken to store the edge n-gram tokens?
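One way I could presumably check this myself once I have sample data: index the same documents into two hypothetical indices, one with plain text mappings and one with the edge n-gram analyzer, and compare the store sizes the stats API reports:

```python
import requests

ES = "http://localhost:9200"   # assumption: local cluster

def primary_store_size(index: str) -> int:
    """On-disk size in bytes of an index's primary shards, from the stats API."""
    stats = requests.get(f"{ES}/{index}/_stats/store").json()
    return stats["indices"][index]["primaries"]["store"]["size_in_bytes"]

# Hypothetical indices holding the same documents: "docs_plain" with plain text
# fields, "docs_edgengram" with the edge n-gram analyzer and completion field.
# The difference would show how much the extra tokens add on disk.
plain = primary_store_size("docs_plain")
ngram = primary_store_size("docs_edgengram")
print(f"plain: {plain} B, with edge n-grams: {ngram} B, overhead: {ngram - plain} B")
```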