Elastic Cloud showing "Unhealthy" but all zones are "Healthy"

We are running Magento 2.4 with Elasticsearch hosted on Elastic Cloud (elastic.co).

We're on Elasticsearch 7.17 due to Magento 2.4's requirements.

Recently our Elastic Cloud deployment has been showing "Unhealthy" even though all zones show "Healthy" - screenshot attached.

Magento 2 logs are showing the occasional error, one every day or two:

[2023-03-10 09:26:05] main.ERROR: Bulk index operation failed 1 times in index magento2_default_tracking_log_session_20220507 for type _doc. Error (unavailable_shards_exception) : [magento2_default_tracking_log_session_20220507][0] primary shard is not active Timeout: [1m], request: [BulkShardRequest [[magento2_default_tracking_log_session_20220507][0]] containing [index {[magento2_default_tracking_log_session_20220507][_doc][null], source[{"session_id":"null","category_view":[9,32],"visitor_id":["null"],"product_view":[7,481,125,359,363,31,38,48,144,163,197,213,254,260,316],"end_date":"2023-03-10 09:24:44","start_date":"2022-05-07 08:27:25","store_id":1}]}]]. Failed doc ids sample : null. [] []
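For reference, the index named in the error can be inspected with the standard _cat and cluster health APIs (a sketch in Kibana Dev Tools syntax; the index name is copied from the log above):

```
# Overall cluster status and unassigned shard counts
GET _cluster/health

# Health of the index named in the error
GET _cat/indices/magento2_default_tracking_log_session_20220507?v

# Shard-level state for that index
GET _cat/shards/magento2_default_tracking_log_session_20220507?v
```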

Can anyone assist with the cause of this error and how it can be resolved? The Health tab in Elastic Cloud doesn't show any details.

Thank you

Hi @matto

You can open a support ticket...

Also, what does it say when you click on "View Issues"?

Hey Stephen,

Thanks for your reply. Nothing is shown in the "View Issues" tab; please see the screenshot below.

I have just opened a support ticket but if you have any advice please let me know.

Looks like you are missing some primary shards... There are various reasons for this...

Perhaps look at this

Thanks Stephen, I have read through that document.

To my understanding, the first step may be to resolve the "primary shard is not active Timeout: [1m]" error?

Are you able to point me in the right direction? Does this sound like an issue with our Elastic Cloud setup, or with our Magento setup?

The setup had been working for 12+ months without issue, and this started around 2 weeks ago. We're hosting Magento on an AWS EC2 instance with an AWS RDS database.

Hi @matto

You should run the commands on this page; the docs are pretty clear on how to analyze the output. Bring the results back here.

Especially the explain... That is how you're going to figure out why your primary shard is missing.

Run those commands and bring back the results...
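For reference, these are the key ones (Dev Tools syntax):

```
# List any red indices
GET _cat/indices?v&health=red

# All shards, including the reason any unassigned ones are unassigned
GET _cat/shards?v&h=index,shard,prirep,state,unassigned.reason

# Ask the cluster why it cannot allocate an unassigned shard
GET _cluster/allocation/explain
```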

I cannot say for sure, but I suspect this is not an Elastic Cloud issue. Your nodes are green and there's plenty of room on the nodes from what I can see.

Also you should open a support ticket.

Hi Stephen,

I have run those commands; please see the attached files.

https://www.dropbox.com/s/8loqe46ol336885/api_output.jpg?dl=0
https://www.dropbox.com/s/3fdxmmioxmij089/_cat_indices%3Fv%26health%3Dred.txt?dl=0
https://www.dropbox.com/s/vb2j4gc2n6d38xo/cat_shards%3Fv.txt?dl=0
https://www.dropbox.com/s/2vfpdn6snqq085a/_cluster_allocation_explain.txt?dl=0

Does this help to explain what the cause of the issue may be?

I opened a support ticket yesterday and am awaiting a response.

Thank you

You need support; you have many red indices and many unassigned shards...

I see "node left" so perhaps this is a Node Issue..

This is definitely not good...

"allocate_explanation": "cannot allocate because a previous copy of the primary shard existed but can no longer be found on the nodes in the cluster",

Do you have snapshots?
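On Elastic Cloud the built-in repository is usually named found-snapshots, so something like this should list what you have (a sketch; adjust the repository name if yours differs):

```
# List all snapshots in the default Elastic Cloud repository
GET _snapshot/found-snapshots/_all
```

If there is no usable snapshot, the absolute last resort is to recreate the missing primary empty, which throws away that shard's data (the node name is a placeholder; the index is the one from your error log):

```
# WARNING: accept_data_loss means exactly what it says
POST _cluster/reroute
{
  "commands": [
    {
      "allocate_empty_primary": {
        "index": "magento2_default_tracking_log_session_20220507",
        "shard": 0,
        "node": "<node-name>",
        "accept_data_loss": true
      }
    }
  ]
}
```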

Support will be your best bet. Did you open a Sev 1? You should.

I would also go into Snapshot Lifecycle Management and make sure you are keeping snapshots, as you may need to restore.
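A quick way to check from Dev Tools (a sketch; the policy id is a placeholder):

```
# List configured snapshot lifecycle policies
GET _slm/policy

# See whether recent scheduled snapshots succeeded or failed
GET _slm/stats

# Trigger a snapshot right now for a given policy
POST _slm/policy/<policy-id>/_execute
```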

Also, it looks like you have set many of your indices to 0 replicas, which means you are at risk of data loss. If your data is important you should always have at least one replica... That is the best practice.
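Bumping them up is a one-line settings change (a sketch; the magento2_* pattern is an assumption, adjust it to your index naming):

```
# Give every matching index one replica copy
PUT magento2_*/_settings
{
  "index.number_of_replicas": 1
}
```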


Hi Stephen,

Thank you for your support. I opened a ticket and Elastic support was very helpful; they directed me on how to close the red indices and then restore them from the latest snapshot.

The cluster looks to be healthy again.
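For anyone who finds this later, the steps were roughly as follows (a sketch of what support walked me through; the snapshot name is a placeholder, and found-snapshots is the default Elastic Cloud repository):

```
# Close the red index so it can be restored over
POST magento2_default_tracking_log_session_20220507/_close

# Restore just that index from the latest snapshot
POST _snapshot/found-snapshots/<snapshot-name>/_restore
{
  "indices": "magento2_default_tracking_log_session_20220507"
}
```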

