The hardware requirements also depend on what you want to do with it,
e.g. how much traffic you expect.
As a rule of thumb I would say that a Lucene index is a bit smaller
than the actual data. BUT it really depends on which parts of the data
are indexed, whether there are stored fields, and whether you use the
_all field or the _source field, etc. I would suggest setting up a test
index with those 100MB and seeing it in real life.
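
If it helps, here is a minimal sketch of that experiment using the
Python elasticsearch client. The local node URL, the "size-test" index
name, and the stand-in documents are my assumptions, not anything from
this thread; in practice you would load your real ~100MB JSON sample:

import json
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumption: a local test node

# Stand-in for the real ~100MB sample; load your own JSON here instead.
docs = [{"title": f"doc {i}", "body": "some representative text"}
        for i in range(1000)]
raw_bytes = sum(len(json.dumps(d).encode("utf-8")) for d in docs)

for i, doc in enumerate(docs):
    es.index(index="size-test", id=i, document=doc)

# Refresh so the documents show up in the stats, then compare the
# on-disk index size against the raw JSON size.
es.indices.refresh(index="size-test")
stats = es.indices.stats(index="size-test")
index_bytes = stats["indices"]["size-test"]["total"]["store"]["size_in_bytes"]
print(f"raw JSON: {raw_bytes} bytes, index: {index_bytes} bytes, "
      f"ratio: {index_bytes / raw_bytes:.2f}")

As long as the sample and the mappings are representative, the ratio
you measure should carry over reasonably well to larger volumes.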
Also: if you keep indexing everything into one index it will get
slower and slower, so maybe you set up some index rolling mechanism (or
play with the shard count) - especially if this is non-static data.
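
For illustration, one way such a rolling scheme could look in Python -
again just a sketch under assumptions of mine (the elasticsearch
client, a local node, and a made-up "logs-YYYY.MM.DD" naming
convention):

from datetime import date
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumption: a local test node

def rolling_index_name(prefix="logs"):
    # One index per day; old days can be dropped or archived wholesale,
    # which is much cheaper than deleting documents out of one big index.
    return f"{prefix}-{date.today():%Y.%m.%d}"

def index_event(doc):
    # New documents always go into today's index.
    es.index(index=rolling_index_name(), document=doc)

index_event({"message": "hello", "level": "info"})

# Searches can still span every daily index with a wildcard pattern:
results = es.search(index="logs-*", query={"match_all": {}})
print(results["hits"]["total"])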
On 20 Oct., 16:57, Schnyder chris.schny...@cardinal-holdings.com wrote:
We're kicking off a project that will involve indexing terabytes of data.
We're considering using ElasticSearch for the job. However, I need to
determine the hardware requirements to hold such a large index.
Are there any guidelines to help estimate the size of an index relative to
the size of the source data? For instance, if I index 100MB of new JSON data,
how much can I expect ElasticSearch's index to grow as a result?
Any advice would be GREATLY appreciated.