I am using m4.large.elasticsearch with 2 nodes, each with a 512 GB EBS volume, for a total of 1 TB of disk space. I have set the fielddata cache limit to 40%.
We are continuously experiencing a cluster index blocking issue which prevents any further writes to new indices.
I can see that JVM memory pressure stays consistently above 95%.
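For reference, this is roughly how I have been checking heap pressure from Kibana Dev Tools (the column list is just an example, not a required set):

```
GET _cat/nodes?v&h=name,heap.percent,heap.max,ram.percent
```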
Hi,
I flushed the cache a few minutes ago. Still no improvement.
GET /_cat/shards?v returns a huge amount of output, but nothing in it looks alarming. We have set up hourly indices.
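Rather than pasting the full shard listing, a summary like this is usually enough to show the overall shard counts (a sketch; nothing cluster-specific is assumed):

```
GET _cluster/health?filter_path=status,active_primary_shards,active_shards,unassigned_shards
```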
Please format your code, logs or configuration files using the </> icon, as explained in this guide, and not the citation button. It will make your post more readable.
Or use markdown style like:
```
CODE
```
This is the icon to use if you are not using markdown format:
There's a live preview panel for exactly this reason.
Lots of people read these forums, and many of them will simply skip over a post that is difficult to read, because it's just too large an investment of their time to try and follow a wall of badly formatted text.
If your goal is to get an answer to your questions, it's in your interest to make it as easy to read and understand as possible.
Please update your post.
You probably have too many shards per node.
May I suggest you look at the following resources about sizing:
Thanks for your suggestions. Is it possible to chat with you online anywhere? I have a few other things to discuss. We have just started using Elasticsearch for centralised log aggregation, so there are a few other points I need to clarify.
No, it's not. If you are near a conference we support or speak at, you can always try to ask questions there.
We do have a support offering where you can discuss with an engineer at Elastic. Let me know if you want to know more about this and I'll connect you with sales.
For now, feel free to continue asking here on discuss.elastic.co, as this is the best place to get information and also share knowledge with the rest of the community.
OK, thanks David. Let me know if you have any conference scheduled in the near future in the UK; I would like to attend.
Regarding my queries: my cluster is still not allowing me to write any new indices, as it is under a cluster write block. The only cause I can see is that JVM memory pressure is too high, around 95%.
Things I am struggling to figure out:
My cluster has 2 nodes, each with 512 GB of disk space and 8 GB of RAM.
My default settings are 5 primary shards and 1 replica.
When I query a particular index with GET _cat/shards/index_name it gives me 10 records: shard 2 (primary and replica) holding a certain number of documents, shard 4 (primary and replica) holding another set, and likewise shards 3, 1 and 0 each with their own document counts.
Why is the allocation like this, as shown below?
```
Index_name 2 r STARTED 925 612.3kb x.x.x.x replica node name
Index_name 2 p STARTED 925 612.3kb x.x.x.x primary node name
Index_name 4 r STARTED 910 568.5kb x.x.x.x replica node name
Index_name 4 p STARTED 910 568.5kb x.x.x.x primary node name
Index_name 3 r STARTED 907 615.1kb x.x.x.x replica node name
Index_name 3 p STARTED 907 615.1kb x.x.x.x primary node name
Index_name 1 r STARTED 881 481.7kb x.x.x.x replica node name
Index_name 1 p STARTED 881 481.7kb x.x.x.x primary node name
Index_name 0 r STARTED 920 631.2kb x.x.x.x replica node name
Index_name 0 p STARTED 920 631.2kb x.x.x.x primary node name
```
I have read somewhere that fielddata puts a lot of pressure on the JVM heap, so I tried clearing the fielddata cache on all indices, but that did not help. Also, the amount of fielddata is not that large. I have set the fielddata cache size to 40%.
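For reference, this is roughly what I ran to inspect and clear the fielddata cache (a sketch; the 40% limit itself is the static indices.fielddata.cache.size setting, which is configured outside the API):

```
# Show how much heap fielddata is actually using, per node and field
GET _cat/fielddata?v

# Clear the fielddata cache on all indices
POST _cache/clear?fielddata=true
```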
I am stuck now, as I cannot decide which configuration I need to change so that writing new indices is enabled again, and I am also looking for a long-term solution for the production cluster.
Do I need to change the number of primary shards?
And how do you determine the required heap size from the index sizes above?
Is there a quick check to determine what is contributing to the JVM memory pressure that caused the write block?
Please let me know which settings to change to get past this problem for now.
I also wanted to know why the default is 5 primary shards and 1 replica if too many shards really cause memory issues. How do shards contribute to memory usage? Is there any theory behind this?
Also, will there be any performance problem if I configure 1 shard per node?
Regarding the blocked index writes: is there any temporary fix I can apply in my test cluster to enable writing indices again?
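A minimal sketch of what a temporary unblock could look like on a self-managed cluster, assuming the block in question is the index-level read_only_allow_delete block; on the AWS-managed service the write block may be applied by the service itself and may only clear once JVM memory pressure drops, so treat this purely as an assumption to verify:

```
# Check which blocks are currently set on the indices
GET _all/_settings?filter_path=*.settings.index.blocks

# Remove the read_only_allow_delete block once the underlying pressure is resolved
PUT _all/_settings
{
  "index.blocks.read_only_allow_delete": null
}
```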
Hi @dadoonet ,
I have set up the cluster with hourly indices and, as mentioned above, I have 2 nodes with 512 GB of space each.
When you say 1 shard per index, that means if I create 1 primary and 1 replica shard per index, my index pattern will create 48 shards a day.
For example, if I want to retain documents for 30 days, then after 30 days I would have 48 * 30 = 1440 shards before the indices are cleaned up.
If 1 GB of JVM heap allows only 20 shards, then one node can accommodate up to 80 shards, as I have 4 GB of heap per node. A total of 8 GB of heap can accommodate up to 160 shards, which is far less than the 1440 shards after 30 days. Have I understood this correctly, and if so, do I need to upgrade my RAM to accommodate that many shards for 30 days, or even more if the retention period is 6 months?
Do I need to increase the number of nodes or the RAM? Which would be better?
I need to propose something for production, as the size of each index is going to grow a lot once the number of users ramps up in a few days. Can you please suggest something?
Why would you set up and use hourly indices? Given the retention period you mention it does not seem to make sense. How much data are you expecting to index per day?
The hourly indices have already been set up in production. The retention period is long because we will have a lot of dashboards in Kibana for various trend analyses.
Currently each index is small, but in the next few weeks each hourly index will be around 2 GB, so over 24 hours that is 48 GB per day.
Therefore I will end up with 48 shards per day (i.e. 1440 shards in 30 days), but according to the rule of thumb of at most 20 shards per GB of heap, my cluster can accommodate only 160 shards (4 GB of heap on each of the 2 nodes).
That is far fewer than the estimated shard count for 30 days.
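For reference, actual per-index sizes and document counts can be checked with something like this (logs-* stands in for my real index pattern):

```
GET _cat/indices/logs-*?v&h=index,pri,rep,docs.count,store.size&s=index
```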
2 GB is quite a small shard, so I would recommend you switch to daily indices with 2 primary shards. If you find it hard to predict data volumes you can use rollover to create indices of a certain target size rather than have them cover a specific period. This can be managed through ILM, roughly as sketched below.
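A rough sketch of what that could look like: a legacy index template giving new daily indices 2 primary shards, plus an ILM policy that rolls over on size/age and deletes after 30 days. The names, patterns and thresholds are illustrative, the template syntax assumes Elasticsearch 6.x, and ILM itself is not available on the AWS-managed service (which offers its own Index State Management instead):

```
# Illustrative template: 2 primary shards, 1 replica for new daily indices
PUT _template/daily-logs
{
  "index_patterns": ["logs-*"],
  "settings": {
    "index.number_of_shards": 2,
    "index.number_of_replicas": 1
  }
}

# Illustrative ILM policy: roll over at ~25 GB or 1 day, delete after 30 days
PUT _ilm/policy/logs-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "25gb",
            "max_age": "1d"
          }
        }
      },
      "delete": {
        "min_age": "30d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
```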
Thanks a lot @Christian_Dahlqvist. Do these shards incur any extra cost, given that I am using Elasticsearch as a service from AWS?
Or is it just the cost of the nodes that I need to pay for?
Cloud by Elastic is one way to have access to all the features, all managed by us. Think about what is already there, like Security, Monitoring, Reporting, SQL, Canvas, APM, Logs UI, Infra UI, and what is coming next...
One thing I want to ask: my production cluster is still running fine with 5 primary shards and 1 replica. The memory pressure is around 70-75%.
What I mean is that I haven't experienced any memory problems yet, unlike my test cluster with the same configuration.
The total number of primary shards has gone up to 5331, which is huge compared to the available heap of 4 GB on each node.
Do you think I am going to face the same problem in production very soon, like the write block I hit in the test cluster?
If the recommendation is 20 shards per GB of heap, then with so many shards it should have blocked index writes by now, right?
I just wanted to understand this before I update the cluster settings in production.
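For reference, a quick way to see how many shards each node currently holds and how much disk they use (no cluster-specific assumptions):

```
GET _cat/allocation?v
```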