Elastic Cloud showing "Unhealthy" but all zones are "Healthy"

We are running Magento 2.4 with Elasticsearch hosted on Elastic Cloud (elastic.co).

We're on Elasticsearch 7.17 due to Magento 2.4's requirements.

Recently our Elastic Cloud deployment has been showing "Unhealthy" even though all zones show "Healthy" - screenshot attached.

Magento 2 logs are showing the occasional error, one every day or two:

[2023-03-10 09:26:05] main.ERROR: Bulk index operation failed 1 times in index magento2_default_tracking_log_session_20220507 for type _doc. Error (unavailable_shards_exception) : [magento2_default_tracking_log_session_20220507][0] primary shard is not active Timeout: [1m], request: [BulkShardRequest [[magento2_default_tracking_log_session_20220507][0]] containing [index {[magento2_default_tracking_log_session_20220507][_doc][null], source[{"session_id":"null","category_view":[9,32],"visitor_id":["null"],"product_view":[7,481,125,359,363,31,38,48,144,163,197,213,254,260,316],"end_date":"2023-03-10 09:24:44","start_date":"2022-05-07 08:27:25","store_id":1}]}]]. Failed doc ids sample : null. [] []
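For reference, the index named in the error can be inspected with the standard _cat and cluster health APIs (a sketch in Kibana Dev Tools syntax; the index name is copied from the log above):

```
# Overall cluster status and unassigned shard counts
GET _cluster/health

# Health of the index named in the error
GET _cat/indices/magento2_default_tracking_log_session_20220507?v

# Shard-level state for that index
GET _cat/shards/magento2_default_tracking_log_session_20220507?v
```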

Can anyone assist with the cause of this error and how it can be resolved? The Health tab in Elastic Cloud doesn't show any details.

Thank you

Hi @matto

You can open a support ticket...

Also, what does it say when you click on "View Issues"?

Hey Stephen,

Thanks for your reply. Nothing is shown in the "View Issues" tab; please see the screenshot below.

I have just opened a support ticket but if you have any advice please let me know.

Looks like you are missing some primary shards... There are various reasons for this...

Perhaps look at this

Thanks Stephen, I have read through that document.

To my understanding, the first step may be to resolve the "primary shard is not active Timeout: [1m]" error?

Are you able to point me in the right direction? Does this sound like an issue with our Elastic Cloud setup, or with our Magento setup?

The setup had been working for 12+ months without issue, and this started around 2 weeks ago. We're hosting Magento on an AWS EC2 instance with an AWS RDS database.

Hi @matto

You should run the commands on this page; the docs are pretty clear on how to analyze the output. Bring the results back here.

Especially the explain... That is how you're going to figure out why your primary shard is missing.

Run those commands and bring back the results...
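For reference, these are the key ones (Dev Tools syntax):

```
# List any red indices
GET _cat/indices?v&health=red

# All shards, including the reason any unassigned ones are unassigned
GET _cat/shards?v&h=index,shard,prirep,state,unassigned.reason

# Ask the cluster why it cannot allocate an unassigned shard
GET _cluster/allocation/explain
```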

I cannot say for sure, but I suspect this is not an Elastic Cloud issue. Your nodes are green and there's plenty of room on the nodes from what I can see.

Also you should open a support ticket.

Hi Stephen,

I have run those commands; please see the attached files.

https://www.dropbox.com/s/8loqe46ol336885/api_output.jpg?dl=0
https://www.dropbox.com/s/3fdxmmioxmij089/_cat_indices%3Fv%26health%3Dred.txt?dl=0
https://www.dropbox.com/s/vb2j4gc2n6d38xo/cat_shards%3Fv.txt?dl=0
https://www.dropbox.com/s/2vfpdn6snqq085a/_cluster_allocation_explain.txt?dl=0

Does this help to explain what the cause of the issue may be?

I opened a support ticket yesterday and am awaiting a response.

Thank you

You need support; you have many red indices and many unassigned shards...

I see "node left" so perhaps this is a Node Issue..

This is definitely not good...

"allocate_explanation": "cannot allocate because a previous copy of the primary shard existed but can no longer be found on the nodes in the cluster",

Do you have snapshots?
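On Elastic Cloud the built-in repository is usually named found-snapshots, so something like this should list what you have (a sketch; adjust the repository name if yours differs):

```
# List all snapshots in the default Elastic Cloud repository
GET _snapshot/found-snapshots/_all
```

If there is no usable snapshot, the absolute last resort is to recreate the missing primary empty, which throws away that shard's data (the node name is a placeholder; the index is the one from your error log):

```
# WARNING: accept_data_loss means exactly what it says
POST _cluster/reroute
{
  "commands": [
    {
      "allocate_empty_primary": {
        "index": "magento2_default_tracking_log_session_20220507",
        "shard": 0,
        "node": "<node-name>",
        "accept_data_loss": true
      }
    }
  ]
}
```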

Support will be your best bet. Did you open a Sev 1? You should.

I would also go into Snapshot Lifecycle Management and make sure you are keeping snapshots, as you may need to restore.
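A quick way to check from Dev Tools (a sketch; the policy id is a placeholder):

```
# List configured snapshot lifecycle policies
GET _slm/policy

# See whether recent scheduled snapshots succeeded or failed
GET _slm/stats

# Trigger a snapshot right now for a given policy
POST _slm/policy/<policy-id>/_execute
```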

Also, it looks like you have set many of your indices to 0 replicas, which means you are at risk of data loss. If your data is important you should always have at least one replica... That is the best practice.
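Bumping them up is a one-line settings change (a sketch; the magento2_* pattern is an assumption, adjust it to your index naming):

```
# Give every matching index one replica copy
PUT magento2_*/_settings
{
  "index.number_of_replicas": 1
}
```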


Hi Stephen,

Thank you for your support. I opened a ticket and Elastic support was very helpful; they directed me on how to close the red indices and then restore them from the latest snapshot.

The cluster looks to be healthy again.
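For anyone who finds this later, the steps were roughly as follows (a sketch of what support walked me through; the snapshot name is a placeholder, and found-snapshots is the default Elastic Cloud repository):

```
# Close the red index so it can be restored over
POST magento2_default_tracking_log_session_20220507/_close

# Restore just that index from the latest snapshot
POST _snapshot/found-snapshots/<snapshot-name>/_restore
{
  "indices": "magento2_default_tracking_log_session_20220507"
}
```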

