Cloud deployment constantly going unhealthy

Hi everyone, I have a deployment that went unhealthy today. After a full restart it works for a few minutes and then goes unhealthy again. As far as I can see, I haven't filled my storage capacity and I haven't increased the load recently; in fact, I haven't logged in or changed the config for weeks. It's been working away in the background for months and suddenly it's broken.

I have no idea what to do. I tried to upgrade my support level to get a faster response, but when I try I get the error "User xxxx is not a paying user. Add billing information to change subscription."

Does anyone have any suggestion as to what I can try to get me out of this hole?

Thanks!

Welcome to our community! :smiley:

Can you elaborate a little more on this and what you are seeing?

I can't really tell you much. When I log into my cloud account it just shows the deployment as unhealthy. There's very little information, or at least very little that I know to look for.

If I do a full restart it works again for a while but then goes back to unhealthy.

Ok. Can you access Kibana when it says it's unhealthy?

If I have Kibana loaded and try to refresh, it gives me a popup error in the bottom corner. If I reload the whole page, I just get a 503 error as JSON:

{"statusCode":503,"error":"Service Unavailable","message":"Service Unavailable"}

FYI, this cloud account is linked to and billed through Azure.

Have you opened a request with the Support team about this? You should be able to.

I have, but on my subscription it could be days until a response, which is why I tried to change it. That didn't work either.

Does anyone have any contacts at Elastic? We are still down and have no access to live data. Someone picked up the ticket yesterday morning and we've had no updates since. I've also had no indication from Elastic support about how to upgrade our support package. We are trying to pay money to get this issue fixed, but nobody seems to want to take it! We desperately need our data.

Hi @Supernovasports ,

Can you paste the unhealthy message shown in the console verbatim into a message here? Also, since it works some of the time, perhaps you can grab the output of GET _nodes/stats and GET _nodes and attach it here?
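Because Kibana is returning a 503, its Dev Tools console may not be usable, but the same two requests can be sent straight to the deployment's Elasticsearch endpoint. Below is a minimal sketch using only the Python standard library. The endpoint URL, user name, and password are placeholders (assumptions, not values from this thread); substitute the Elasticsearch endpoint shown for your deployment in the Elastic Cloud console and a user allowed to read node information. If Elasticsearch itself is down, these calls will fail as well.

```python
import base64
import json
import urllib.request


def build_request(endpoint: str, path: str, user: str, password: str) -> urllib.request.Request:
    """Build an authenticated GET request, equivalent to typing `GET <path>` in the console."""
    url = endpoint.rstrip("/") + "/" + path.lstrip("/")
    # HTTP basic auth: base64("user:password")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        method="GET",
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
    )


def fetch_json(req: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def save_node_info(endpoint: str, user: str, password: str) -> list:
    """Fetch GET _nodes/stats and GET _nodes and save each response to a JSON file.

    Returns the list of file names written (e.g. "_nodes_stats.json", "_nodes.json").
    """
    written = []
    for path in ("_nodes/stats", "_nodes"):
        data = fetch_json(build_request(endpoint, path, user, password))
        out_file = path.replace("/", "_") + ".json"
        with open(out_file, "w", encoding="utf-8") as fh:
            json.dump(data, fh, indent=2)
        written.append(out_file)
    return written
```

Usage, with hypothetical values: `save_node_info("https://my-deployment.es.example.com:9243", "elastic", "your-password")`, then attach the two JSON files it writes. The network call is kept inside a function rather than run at import time, so nothing contacts a cluster until you call it yourself.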

There is almost no information; all I can see is:

Unhealthy deployment

No Kibana instances are running

I don't know how to do that.

I am being told that the place to change your subscription level is

https://cloud.elastic.co/account/billing

I suppose that is what you already did, but if not, worth trying out.

Yeah, that's where I tried to change it a couple of days ago, and I get an error message.

I have logged a separate case for this and the response I got was

"Thank you for reaching out to Elastic support.
Please filling the billing information before changing the subscription level."

Which is of no help whatsoever.

I'm absolutely staggered by the response I have been getting. I realise we are not on Platinum support, but we are still paying a chunk of money for a hosted service which hasn't worked for 51 hours now (which I believe is through no fault of our own). I haven't had a reply on the ticket for over a day, even though I have asked to upgrade my account and pay for help!

I would understand if we were self-hosting, but we're paying for a cloud service which I naively thought was meant to be maintained for us!

It looks like this was due to an underlying Azure billing issue, which has now been resolved?

Yes, the problem has been traced to one of our Azure payments failing to process. It wasn't caused by any underlying Elastic technical problem, which is certainly good to know.

However, it's been a tricky problem to track down. There is no information available to the end user about what the issue could be. I'm very new to Elastic and cloud services in general, but I had a colleague (who's very experienced with Elastic, albeit self-hosted) look through, and they couldn't see anything either. It's quite strange for services to start up and run correctly for a while before shutting themselves down again. I personally went through every page of the deployment and account settings and saw nothing.

I would also expect the first layer of support to have access to the account information showing what the problem was, so they could have notified us soon after the ticket was picked up. I'm also still unable to upgrade my account to Platinum; I get the same error message as mentioned above.

That said, Azure was no more helpful. I am in the Azure portal multiple times every day without fail, and there was no alert or indication at all that a payment hadn't gone through. On top of that, none of our Azure services had experienced any interruption, so I was blissfully unaware that anything was wrong on that side. You have to drill right down into the billing section to see that anything's wrong, which is not something I generally do (I don't pay the bills personally!).

Anyway, all's well that ends well and I have added a bunch of stuff to my processes and regular service status checking so this sort of thing can't happen again.

Thanks.


I can understand that's frustrating, it's good to see you have things resolved.