[7.10.2] Pending alerts with multiple Kibana instances

Hello. This thread is the translation of this one, this one and this issue, which may be reopened once I have more clues.

So, I have created a brand new deployment on cloud.elastic.co and created an alert on it with an email action. It goes correctly from PENDING to ACTIVE or OK.

Now, I create a deployment with several Kibana instances connected to the same cluster. So on this cluster I have:

  • the cloud kibana that is delivered along with the elastic.co service (with the bizarre and long URL)
  • some other kibana instances that I host (nice domain names and their own .kibana indices)

Here is the configuration of my own Kibana instances:

server.basePath: "/bigdata"
server.rewriteBasePath: false

kibana.index: .kibana-XXX

elasticsearch.hosts: ["https://XXXXXXXXXXXXX12baa1dc4.eu-west-1.aws.found.io:9243"]
elasticsearch.username: "kibanaXXX-"
elasticsearch.password: "XXXXXXX"

logging.dest: /home/XXX/logs/kibana.log

xpack.reporting.encryptionKey: "znUXXXXQEoiveBYh"
xpack.reporting.kibanaServer.port: 443
xpack.reporting.kibanaServer.protocol: https
xpack.reporting.kibanaServer.hostname: XXX.flightwatching.com
xpack.reporting.index: ".reporting-XXX_"
xpack.encryptedSavedObjects.encryptionKey: "znUxbvXXXXXXXnUxbvCDuQEoiveBYh"
xpack.security.encryptionKey: "znUxbvXXXXXXXnUxbvCDuQEoiveBYh"

Well, in that case (multiple instances), it seems that alerting does not work. The alerts remain PENDING, although it randomly seemed to work once.

It reminds me of this issue with reporting, which is similar.

Could anybody check/validate/fix this?

best


@tsullivan do you think you could help on this one? Thanks!

hello @tsullivan and @dadoonet

Did you find something?

Hey Oliver,
There's a mix of different issues that might be causing this.

I'll be 100% honest and say that a deployment such as yours (a mix of cloud and self-hosted Kibana) is not the easiest configuration for us to support (as we try to automate things out of the way in Cloud, which means aligning them with self-hosted can be a bit of a faff) - but I do believe it is achievable.

Most likely it's one of these two things:

Keep your encryptionKeys in sync

Is the xpack.encryptedSavedObjects.encryptionKey the same across all Kibana instances? This includes Cloud and self-hosted: they must all be using the exact same key.

I believe the xpack.encryptedSavedObjects.encryptionKey is hidden in Cloud (for obvious security reasons), and you'll have to reach out to Cloud support to ask them to override the encryptionKey on your cloud instance to match that of your local instances.

Just a heads up - replacing the cloud encryptionKey with your local one will only work if you're using a fresh Cloud instance which hasn't used its default encryption key to create any Alerts yet. If you have already used it to create new alerts, you'll want to migrate them to the new encryptionKey (which cloud support can assist with as well), otherwise you'll lose them.
The good news is, if this is an experiment and you don't mind losing those Alerts (just the ones created with the Cloud encryptionKey), you can just delete them and start over once you have the encryptionKeys in sync.
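As a sketch of what "in sync" means in kibana.yml terms (the key value below is a placeholder, not a real secret), every instance, cloud and self-hosted alike, would carry the same value:

```yaml
# kibana.yml - identical on every Kibana instance talking to this cluster.
# The value below is a placeholder; use your own secret of 32+ characters.
xpack.encryptedSavedObjects.encryptionKey: "a-placeholder-32-plus-char-secret"
```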

Configure NTP

Are the hosts all in sync from a time perspective? To ensure the background tasks are picked up correctly, it's important that all ES and Kibana instances are time-synced (via NTP or the like). I'm not 100% sure how to achieve this with a dual-environment deployment, but cloud support should be able to help you with that too.

Let's look at the Logs

It would also be helpful if you could look in your Kibana server logs on cloud and see if there are any errors in there.
If the EncryptionKey is the culprit, you should see decryption errors from time to time (depending on the cadence of the alerts that are failing).

I hope that all helps :slight_smile:
Sorry about the faff :grimacing:

Hey Gidi, thanks for this extended answer. Here are my comments:

On the mix between cloud's kibanas and my own kibanas

I have to say that a year ago, I used to disable Kibana on the Elastic Cloud cluster. For the last year or two that has no longer been possible, which led to all my problems (it felt like a regression at the time).

And I understood that having several instances of Kibana is an architecture that you support (even the cluster monitoring feature allows you to monitor several Kibana instances for one cluster).

Keep your encryptionKeys in sync

I am going to try that, but if I understand correctly, this key is there to encrypt sensitive data on each of my Kibana instances.
Currently, each Kibana instance has its own .kibana_XXXX index to store its saved objects. Can you explain why the key should be the same value?

For example, I do not necessarily want the same email connector for 2 different Kibana instances.

Configure NTP

Yes, all the Kibana hosts use the NTP service (installed via apt).

Let's look at the Logs

How can I see the Kibana logs in the cloud? I can see the Elasticsearch logs, but not Kibana's.

Additional question

Would "stop routing requests" do anything positive for my problem?

Hey Oliver,

Regarding the mix of cloud and on-prem, we definitely support this configuration in general.
It's Alerting specifically where this becomes a little tricky, as it's a complex system with many moving parts, and it requires keeping some configurations in sync across instances.

It sounds like your deployment is actually even more complex than the default... as you're using legacy multi-tenancy (which we're due to remove support for in 8.0).
I hadn't realized that this is your configuration, which sadly makes things a tad more complicated. :frowning:
Sorry about that - this complexity is exactly why we've deprecated this form of multitenancy.

In any case - I still think we can make this work. :thinking:

Alerting Saved Objects are in fact stored in the .kibana index, as you stated, but the tasks that the Alerting framework uses are stored in the .kibana_task_manager index.
You're going to need to configure different task manager indices for each instance, in the same way as you have for each tenant.

You're right that the key won't need to be the same across Kibana instances if they don't read the same index, but if your Task Managers (each Kibana runs its own TM) are using the same index then they will share their tasks, which would explain why alerts (or rather, their underlying tasks) are failing.
What is most likely happening is that an alert created in one Kibana is then picked up by Task Manager in another Kibana.

The way the setup could work for you is if each Kibana instance has its own .kibana index and its own .kibana_task_manager index.
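In kibana.yml terms, and reusing the per-tenant -XXX naming placeholder from the config posted earlier in the thread, each self-hosted instance would then look something like this (a sketch, not a definitive layout):

```yaml
# kibana.yml for one self-hosted tenant; -XXX is a per-tenant placeholder.
kibana.index: ".kibana-XXX"
# Give each tenant its own task manager index as well, so Task Managers
# from different Kibana instances stop picking up each other's tasks.
xpack.task_manager.index: ".kibana_task_manager-XXX"
```

With this in place, an alert created in one tenant's Kibana should only ever be claimed by that tenant's Task Manager.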

That said... this would be far simpler if you used Spaces to achieve the multi tenancy, at which point this should work by simply keeping keys in sync.
Is there a reason why you're not using Spaces? :thinking:

It should be in the same place :thinking:
Can you post a screenshot of what you're seeing?

I'm not sure I'm following.... stop routing which requests?

Here are the logs from the console:

And here is what I call "stop routing requests":

So, it turns out I was wrong - the Kibana logs aren't available to you by default on Cloud (my bad, sorry).
If you reach out to support they can check your logs for errors, but for now I'd focus on setting up the custom Task Manager indices - I think that should get it working for you.

Oh, I see.
That will prevent your cloud Kibana instance from receiving HTTP requests, but it will still try to run tasks, so doing that won't help.

Wow, yes, the xpack.task_manager.index seems to be THE correct lead! I feel optimistic!

I did it quickly, and it seems to work! As the issue is a random one, I'll wait a bit to check that it is stable!


Always a good state of mind :wink:
Sorry for the faff!

That's great!
I've noticed that this config key is missing from our cloud allowlist and have noted it in an issue on our end, so hopefully we can get that sorted and you can apply it to your cloud instance too.

For now it should be fine as long as you have only one cloud instance, but once you add a second (assuming they use different .kibana indices), you'll have similar issues.

In the long term I really recommend moving away from this form of multi-tenancy and using Spaces instead - they're far easier to use and administer :slight_smile:

Good luck!