We're kicking off a project that will involve indexing terabytes of data. We're considering using Elasticsearch for the job. However, I need to determine the hardware requirements to hold such a large index.
Are there any guidelines to help estimate the size of an index relative to the size of the source data? For instance, if I index 100MB of new JSON data, how much can I expect Elasticsearch's index to grow as a result?
The hardware requirements also depend on what you want to do with the cluster,
e.g. how much search traffic it will serve.
As a rule of thumb I would say that a Lucene index is a bit smaller
than the actual data. But it really depends on which parts of the data
are indexed, whether there are stored fields, whether you use the _all
field or _source, etc. I would suggest setting up a test index with
those 100MB and measuring it in real life.
Also: if you keep indexing everything into a single index it will get
slower and slower, so you may want to set up some index rolling mechanism (or
play with the shard count) - especially if this is non-static data.
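For instance, a rough way to run that test (a minimal sketch, assuming a local node on localhost:9200 and a 1.x-era REST API; the index name test_sizing and the file sample.json with one JSON document per line are made up for illustration, and the exact endpoints and response shape vary by Elasticsearch version):

    # Index a ~100MB sample and compare the reported store size with the source size.
    import json
    import requests

    ES = "http://localhost:9200"
    INDEX = "test_sizing"

    # Create a small test index; the shard/replica counts are only example values to play with.
    requests.put(ES + "/" + INDEX,
                 headers={"Content-Type": "application/json"},
                 data=json.dumps({"settings": {"number_of_shards": 2,
                                               "number_of_replicas": 0}}))

    # Index the sample, one document per request for simplicity
    # (the bulk API would be faster, but this keeps the sketch short).
    with open("sample.json") as f:
        for i, line in enumerate(f):
            requests.put("%s/%s/doc/%d" % (ES, INDEX, i),
                         headers={"Content-Type": "application/json"},
                         data=line)

    # Refresh so everything is flushed and counted, then read the on-disk store size.
    requests.post(ES + "/" + INDEX + "/_refresh")
    stats = requests.get(ES + "/" + INDEX + "/_stats/store").json()
    print(stats["indices"][INDEX]["total"]["store"]["size_in_bytes"])

Dividing the reported size_in_bytes by the size of sample.json gives the expansion ratio you can extrapolate from, keeping in mind that replicas multiply the total size on disk.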
While this won't answer all your questions directly (there is no exact
answer without knowing all the details and, really, without doing some
tests), have a look at the disk & memory size estimator for Lucene/
Solr - http://search-lucene.com/?q=size+estimator&fc_project=Lucene&fc_project=Solr
. Parts of it will be applicable to Elasticsearch, but of course
even this estimator is not perfect.
You will have to do some capacity tests with a smaller set of data. Some
points to think about (a sketch of a mapping that applies them follows below):
By default, _source is stored (the actual JSON you added). It usually
makes sense to turn on compression for it.
_all is enabled by default, meaning that on top of all the specific
fields being indexed, another field which aggregates all of them is also
indexed. It makes searching much simpler, but it does add overhead. You can
disable it completely, or pick and choose in the mappings which fields should be
included in _all.
You might not need to index all the JSON fields; if there are some that
you don't need to search on, you can map those with index set to no.
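To make those points concrete, here is a minimal sketch of a mapping that applies all three. The index, type, and field names are made up, and _source compression and the _all field only exist in older Elasticsearch versions, so treat it as an illustration rather than a copy-paste recipe:

    import json
    import requests

    mapping = {
        "mappings": {
            "event": {
                # Keep the raw JSON, but compressed on disk (older ES versions only).
                "_source": {"enabled": True, "compress": True},
                # Skip the aggregated _all field entirely.
                "_all": {"enabled": False},
                "properties": {
                    # Indexed and searchable.
                    "message": {"type": "string"},
                    # Kept in _source (so it comes back in search hits) but not indexed.
                    "raw_payload": {"type": "string", "index": "no"}
                }
            }
        }
    }

    requests.put("http://localhost:9200/test_sizing",
                 headers={"Content-Type": "application/json"},
                 data=json.dumps(mapping))

With index set to no, the field is still available in _source, it just can't be searched on, which is often the cheapest way to shrink the index for large payload fields.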