Shards and Documents

usahitya · May 17, 2016, 7:24am

Hi Community,

My Linux box has 16GB RAM and a 500GB hard disk.

My questions are:

What is the maximum number of shards I can allocate for my index?
How much data can a shard accommodate?

According to the documentation,
the default number of shards is 5, and a shard can accommodate 2,147,483,519 documents. (The document size and the maximum number of shards per index are not specified.)

thn · May 17, 2016, 11:42am

What you read is correct: a shard can hold roughly 2B documents, assuming the indexed documents contain no nested objects (each nested object is indexed as a separate hidden document and counts toward the limit).

Regarding the number of shards per index: yes, by default, unless you tell ES otherwise, every new index is created with 5 shards and 1 replica. To change these numbers, you can either change the ES configuration or use an index template.
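To illustrate the template route, here is a minimal sketch that builds a template body in Python. The pattern `logs-*`, the shard and replica counts, and the helper name `build_index_template` are made up for the example. The `template` key is the older template format; newer ES versions use `index_patterns` instead, so check the documentation for your version before using it.

```python
import json


def build_index_template(pattern, shards, replicas):
    """Return a template body that overrides the default shard/replica counts.

    Hypothetical helper for illustration only; it just builds a dict.
    """
    return {
        "template": pattern,  # index name pattern (older template format)
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        },
    }


# Example values are assumptions, not recommendations.
template_body = build_index_template("logs-*", 3, 1)
print(json.dumps(template_body, indent=2))
```

The printed JSON is the body you would send when registering the template. Note that the number of primary shards applies only to indices created after the template is in place.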

As for the maximum number of shards you can allocate for your index, you'll need to make your own estimate:

Let's say you are planning to index 5K documents. Knowing that each shard can hold ~2B documents, in theory you can use one index with 1 shard.
Let's say you are planning to index 12B documents. In theory you can go with 6 shards, but in practice you should use more shards so that each shard stays well below the 2B limit.
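The arithmetic behind those two examples can be sketched as follows. The function name `min_primary_shards` and the 1B-per-shard target in the last call are assumptions chosen for illustration; only the 2,147,483,519 limit comes from this thread.

```python
# Per-shard document limit quoted earlier in the thread.
MAX_DOCS_PER_SHARD = 2_147_483_519


def min_primary_shards(total_docs, docs_per_shard=MAX_DOCS_PER_SHARD):
    """Smallest number of primary shards that keeps every shard at or
    below docs_per_shard, assuming documents spread evenly across shards."""
    if docs_per_shard <= 0:
        raise ValueError("docs_per_shard must be positive")
    if total_docs <= 0:
        return 1  # an index always has at least one primary shard
    # Integer ceiling division avoids floating-point rounding.
    return (total_docs + docs_per_shard - 1) // docs_per_shard


print(min_primary_shards(5_000))                          # 1
print(min_primary_shards(12_000_000_000))                 # 6
# With a more conservative, hypothetical target of 1B documents per shard:
print(min_primary_shards(12_000_000_000, 1_000_000_000))  # 12
```

This only gives a lower bound from the document limit. Nested objects, uneven routing, and shard size on disk can all push the practical number higher.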

Lastly, with a 500GB hard disk, how many documents you can index depends on the size of the documents and how you want ES to handle your data. For every field in a document, you can tell ES to index and store the data, or to index it without storing it. Storing data increases the index size. You'll need to index your data to find out realistically how many documents your Linux box can hold.

usahitya · May 18, 2016, 8:39am

I have incoming data of 100 GB per day, with 5 concurrent users running simple search queries.

We can allocate 3 boxes as Elasticsearch nodes.
How much CPU and RAM must those boxes have to handle search and indexing efficiently?
A rough estimate, in figures, of the compute required for a sample environment would be of great help.

thn · May 18, 2016, 9:54am

I suggest you index this 100GB of data to see the actual index size in your environment.

In your first post you said the Linux box has a 500GB disk, and here you say you have 100GB of data coming in per day. If you take my suggestion above, you'll find out the index size, and that will tell you how much data your Linux box, or your 3-node cluster, can really hold.
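Once you have measured the on-disk index size for one day of data, a rough capacity estimate could look like the sketch below. Every number here is a placeholder: the 100 GB daily index size assumes the index is the same size as the raw data (which you have to measure), the 3-node case assumes 500 GB per node, and `usable_fraction=0.85` is an assumed safety margin so the disks are never filled completely.

```python
def days_of_retention(disk_gb_per_node, nodes, daily_index_gb,
                      replicas=1, usable_fraction=0.85):
    """Estimate how many days of data fit on the cluster.

    daily_index_gb is the measured on-disk size of one day's primary
    shards; each replica stores a full extra copy of that data.
    """
    if daily_index_gb <= 0:
        raise ValueError("daily_index_gb must be positive")
    usable_gb = disk_gb_per_node * nodes * usable_fraction
    daily_footprint_gb = daily_index_gb * (1 + replicas)
    return usable_gb / daily_footprint_gb


# Single box, no replicas (a replica cannot be allocated on a single node).
single_box_days = days_of_retention(500, 1, 100, replicas=0)
# Three boxes with one replica, assuming 500 GB of disk on each.
cluster_days = days_of_retention(500, 3, 100, replicas=1)

print(f"single box: about {single_box_days:.1f} days")
print(f"3-node cluster: about {cluster_days:.1f} days")
```

With these placeholder inputs both cases fill up in under a week, which is why measuring the real index size first matters.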

Since I don't know what your data looks like, I suggest sticking with the defaults and going from there. That at least gives you a baseline, and any tuning down the road can be validated against it to see whether it's better or worse.
