Hi,
I have one node with a 1 TB hard disk and 30 GB RAM. We have 3 main indices.
One index receives 10 million records per day (each document has 4 fields, all of type keyword).
So how many shards should I use for one index, and do I need to change the heap size?
Those are numbers which really depend on your usage (ingestion, but also search/queries/aggregations), your data and your hardware, and the only way to answer is by testing...
For the JVM heap:
With 30 GB of RAM on a one-node cluster, you can go up to 50%, so up to 15 GB... You could start with 4 or 8 GB and then see if that is enough (i.e. you have no issues related to lack of memory). Memory not used by the Elasticsearch JVM will be used for the filesystem cache, so no RAM is wasted.
Also note that increasing the heap size makes garbage collection slower (depending on the number of CPUs you have), and during garbage collection the node won't respond, so you want to factor this in.
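As a concrete illustration (a minimal sketch based on config/jvm.options; the 8 GB value is just the starting point mentioned above, not a tuned value for your workload):

```
# config/jvm.options -- example heap settings for a node with 30 GB RAM
# Keep the heap at or below 50% of RAM (so never above 15 GB here),
# and set the minimum and maximum to the same value.
-Xms8g
-Xmx8g
```

The RAM left over after the heap is what the filesystem cache uses, which is why a smaller heap is not wasted.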
For sizing, there is no quick answer and the best approach is to use the capacity planning document. Often a good primary shard size for performance is 30 to 50 GB per shard, but it could be lower or higher depending on your ingestion/search SLAs and usage: https://www.elastic.co/guide/en/elasticsearch/guide/current/capacity-planning.html
If one day of data is very small and you are short on resources (CPU and disk I/O, especially if your disk is a spinning disk rather than an SSD; the same comment applies to having a one-node cluster), you might go for weekly/monthly indices so you have fewer shards on your node, or add more nodes to your cluster.
Note that the number of primary shards cannot be changed without reindexing, but with daily indices you will use a template, so you can increase or decrease the value for new indices and easily revisit the number of shards later if one primary shard turns out to be too few...
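For example, a template along these lines (a sketch only; the template name and the logs-* pattern are made up for illustration, and the syntax assumes the legacy _template API of Elasticsearch 6.x+):

```
PUT _template/daily_logs
{
  "index_patterns": ["logs-*"],
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  }
}
```

Replicas are set to 0 here only because they cannot be allocated on a single-node cluster anyway. Changing number_of_shards in the template affects only indices created afterwards, which is what lets you review the shard count later without reindexing.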
I am new to configuring for large volumes of data and I have to go to a production server, so I need configuration information. (We have a one-node server with 1 TB SSD and 30 GB RAM. First index: 75 million documents per day (at ~100 bytes per doc), second index: 15 million per day, third index: 100 million per day.)
How many shards should I select when creating an index?
50% so up to 15 GB => does that mean the heap size min and max will be 15 GB [the default is 2 GB]?
Does any other configuration need to change?
How many shards should I select when creating an index?
You should use the capacity planning link, as this really depends on the amount of data you will have in the index, the performance of that machine (disk I/O, CPU, RAM), the volume and type of queries/aggregations you will run, and the performance you are looking for in ingestion and search. For a logging use case, the optimum size is often between 30 GB and 50 GB per primary shard.
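As a rough back-of-the-envelope sketch only (using the figures you quoted and ignoring indexing overhead, replicas and merges, so the real size still has to be measured on disk):

```
# ~100 million docs/day x ~100 bytes/doc is roughly 10 GB of raw data per day
# for the largest index -- well under the 30-50 GB per-shard guidance, so one
# primary shard per daily index is a plausible starting point. Verify the
# real on-disk size once a day of data has been loaded:
GET _cat/indices?v&h=index,pri,docs.count,store.size&s=index
```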
50% so up to 15 GB => does that mean the heap size min and max will be 15 GB [the default is 2 GB]?
Yes, min and max should be set to the same value. Again, 15 GB is a maximum value to use here: https://www.elastic.co/blog/a-heap-of-trouble
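Once -Xms and -Xmx are set to the same value in jvm.options, one quick way to confirm the node picked them up (just a verification sketch, not an extra tuning step):

```
GET _cat/nodes?v&h=name,heap.current,heap.percent,heap.max
```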
Thanks for the reply, julien.
We have a 1 TB SSD, four cores and 32 GB RAM.
One index gets 50 million records per day. In that case, how many shards should I use, and what other configuration (memory, etc.) should I take care of? Please help.
This is the exercise you have to go through. The number of documents is not directly relevant, as the size of the shard will depend on the mapping, term frequencies... So load one day of data and check the shard size (_cat/shards), try to go for one primary shard, and then roll over daily or less often to minimise the total number of shards on this single node with only 4 CPUs.
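A minimal sketch of that check (the logs-* pattern and the logs_write rollover alias are illustrative names, not from your setup, and the max_size rollover condition assumes Elasticsearch 6.1 or later):

```
# After loading one day of data, look at the per-shard size on disk
GET _cat/shards/logs-*?v&h=index,shard,prirep,docs,store

# With the rollover API, cut over to a new index once the index is large
# enough or a day has passed, whichever comes first
POST logs_write/_rollover
{
  "conditions": {
    "max_age": "1d",
    "max_size": "50gb"
  }
}
```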