Best practices for migrating ELK artifacts and configurations across environments

Hi All,

I am looking for guidance on best practices for migrating ELK artifacts from lower environments (e.g., Dev/Test) to higher environments (e.g., UAT/Prod).

For application code, we typically use GitLab/Jenkins with a CI/CD pipeline to promote changes across environments. Is there a similar standardized approach for managing and promoting Elasticsearch/Kibana artifacts?

Specifically:

  • What is the recommended way to migrate index templates, component templates, ILM policies, ingest pipelines, and other cluster configurations? For dashboards, we see that Kibana offers import/export of saved objects — is that the intended approach?

  • While index data can be backed up and restored using snapshots, how are cluster-level configurations (like templates and ILM policies) typically backed up and managed?

I have seen various discussions but I would like to understand what is generally considered a best practice in real-world setups.

Thanks in advance!!

Hey @Tortoise,

I use snapshot/restore to migrate everything from Dev to Prod: the data, index mappings/settings, ILM, Fleet Server configuration, etc.

Snapshots preserve more than your data. They also include the configuration and internal data of Elastic Stack features, such as ILM policies, index templates and pipelines, Kibana saved objects, alerting rules, Fleet settings and integrations, Elastic Security data, and more, depending on your use case.

The cluster state (restored when include_global_state is set) includes persistent cluster settings, index templates, legacy index templates, ingest pipelines, and ILM policies. A feature state contains the indices and data streams used to store the configuration, history, and other data of a particular Elastic Stack feature.

Here is the GET /_features API call output from my test ES v9.1 cluster:

{
  "features": [
    {
      "name": "transform",
      "description": "Manages configuration and state for transforms"
    },
    {
      "name": "logstash_management",
      "description": "Enables Logstash Central Management pipeline storage"
    },
    {
      "name": "searchable_snapshots",
      "description": "Manages caches and configuration for searchable snapshots"
    },
    {
      "name": "security",
      "description": "Manages configuration for Security features, such as users and roles"
    },
    {
      "name": "tasks",
      "description": "Manages task results"
    },
    {
      "name": "inference_plugin",
      "description": "Inference plugin for managing inference services and inference"
    },
    {
      "name": "enrich",
      "description": "Manages data related to Enrich policies"
    },
    {
      "name": "fleet",
      "description": "Manages configuration for Fleet"
    },
    {
      "name": "watcher",
      "description": "Manages Watch definitions and state"
    },
    {
      "name": "geoip",
      "description": "Manages data related to GeoIP database downloader"
    },
    {
      "name": "machine_learning",
      "description": "Provides anomaly detection and forecasting functionality"
    },
    {
      "name": "ent_search",
      "description": "Manages configuration for Enterprise Search features"
    },
    {
      "name": "async_search",
      "description": "Manages results of async searches"
    },
    {
      "name": "synonyms",
      "description": "Manages synonyms"
    },
    {
      "name": "kibana",
      "description": "Manages Kibana configuration and reports"
    }
  ]
}

So, to answer your question: a simple restore that includes the global state and the feature states will be enough for migration.
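
For reference, a minimal sketch of the API calls involved (the repository and snapshot names here are placeholders, and the fs repository type assumes a shared filesystem mounted on every node):

```
# Register a snapshot repository (must also be registered on the target cluster)
PUT /_snapshot/my_backup
{
  "type": "fs",
  "settings": { "location": "/mount/backups/my_backup" }
}

# Take a snapshot including the global cluster state and feature states
PUT /_snapshot/my_backup/migration_snapshot?wait_for_completion=true
{
  "indices": "*",
  "include_global_state": true
}

# On the target cluster, restore including the global state
POST /_snapshot/my_backup/migration_snapshot/_restore
{
  "indices": "*",
  "include_global_state": true
}
```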

1 Like

There is nothing standardized, but I think you can use the same approach you use for code; you would, however, need to build some tooling.

For example, at my company we built some tooling using GitHub Actions and Python to deploy security rules as code, and the same thing for Logstash pipelines: we can deploy the pipelines anywhere we want just by changing environment variables.

I think almost everything can be done using API requests, so it can be automated.

I'm building something similar for ingest pipelines: I keep my custom ingest pipelines in a repository and use GitHub Actions + Python code to deploy them.
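
As a rough illustration of that kind of tooling, here is a sketch (not my actual code; the directory layout, environment variable names, and function names are all hypothetical) that reads pipeline definitions from a repo directory and PUTs each one to the cluster:

```python
"""Sketch: deploy ingest pipelines as code. Assumes each pipeline lives
in the repo as <pipeline_id>.json, and the target cluster URL and API
key come from environment variables (names are hypothetical)."""
import json
import os
import urllib.request


def build_requests(pipeline_dir, es_url):
    """Yield (url, body) pairs, one PUT per pipeline JSON file."""
    for name in sorted(os.listdir(pipeline_dir)):
        if not name.endswith(".json"):
            continue
        pipeline_id = name[: -len(".json")]
        with open(os.path.join(pipeline_dir, name)) as f:
            body = json.load(f)
        yield f"{es_url}/_ingest/pipeline/{pipeline_id}", body


def deploy(pipeline_dir):
    """Send each pipeline to the cluster named by the ES_URL env var."""
    es_url = os.environ["ES_URL"]  # e.g. https://prod-cluster:9200
    for url, body in build_requests(pipeline_dir, es_url):
        req = urllib.request.Request(
            url,
            data=json.dumps(body).encode(),
            headers={
                "Content-Type": "application/json",
                "Authorization": f"ApiKey {os.environ['ES_API_KEY']}",
            },
            method="PUT",
        )
        with urllib.request.urlopen(req) as resp:
            print(url, resp.status)
```

Pointing the same script at Dev, UAT, or Prod is then just a matter of changing the environment variables, which is what makes this approach pipeline-friendly.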

Personally, I would not use snapshot and restore from dev into prod; one mistake and you may overwrite your prod data with your dev data.

Also, how does this work for things like Fleet and Security? Restoring the Fleet and Security state from dev into prod will overwrite the prod settings, so you would need to use the same credentials in dev and prod, which is a security issue. And if you restore your Fleet index from dev into your prod cluster, you will change the Fleet settings and lose the agents enrolled in your prod cluster.

The Fleet and Security settings would be different on each cluster, so they should not be part of the snapshot for this to work without issues later, I think.
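
For what it's worth, the restore API does let you control this: a sketch (repository and snapshot names are placeholders) of restoring only the cluster-wide configuration while skipping all indices and all feature states could look like:

```
# "indices": "-*" restores no indices; "feature_states": ["none"] skips
# every feature state (including Fleet and security), so only the global
# cluster state (templates, ILM policies, ingest pipelines) comes across.
POST /_snapshot/my_backup/migration_snapshot/_restore
{
  "indices": "-*",
  "include_global_state": true,
  "feature_states": ["none"]
}
```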

1 Like

Thanks for the comment @leandrojmp.

With one mistake you can collapse everything, anytime, anywhere :slight_smile:

I assume you are using file-based role management (e.g., roles.yml, users.yml). With one mistake there, you can also overwrite the users and roles. Yes, if there is a more secure way, it is better to use it, but with all due respect, what you described doesn't feel more secure.

I totally agree with you. It'd be better to keep the Fleet and Security snapshot data separate and not restore them into each other. I just showed a way to do it.

@leandrojmp Just to stay on topic, the question relates to cluster config, ILM, index template, and Kibana artifact migrations. I believe Snapshot & Restore is still the most efficient way to do that.

I do not disagree that Snapshot and Restore may work; what I mentioned is that there are multiple checks that need to be done to avoid breaking production, especially related to the feature states for security and Fleet. Even a different lifecycle policy may impact some things.

Also, I may be wrong because this is not something that I normally do, but restoring the global state and some feature states would basically clone Dev into Prod.

If this fits the user's requirements, then Snapshot & Restore will work and will be pretty easy to do, but in my case, and I believe this is also the case for many others, it would add some problems, both technical and related to compliance/auditing.

I would still recommend building a CI/CD pipeline to automate deployments and keep things in sync.

2 Likes