Configuration as Code for Elasticsearch

sonnenhund · March 24, 2023, 10:10am

Hi!

We are attempting to stand up a full ELK stack and wish to have Elasticsearch fully configured, including ILM policies, Index Templates, and if necessary bootstrapping indices, such that when data begins flowing into Elasticsearch everything is already set up correctly.

We could, potentially, write Bash/Sh or Python scripts to do PUTs and POSTs when Elasticsearch starts, however that seems like a roundabout way to configure Elasticsearch, and it doesn't guarantee that our required config is in place before data starts flowing in.

Is there a sane and standard way in elasticsearch.yml or elsewhere to define ILM policies and Index Templates (plus any necessary bootstrapping) such that they're reliably in place on Elasticsearch startup?

Kind Regards,

Sonnenhund

Christian_Dahlqvist · March 24, 2023, 10:18am

No, those aspects cannot be configured through the config files. You need to use the APIs.

sonnenhund · March 24, 2023, 10:27am

No, those aspects cannot be configured through the config files. You need to use the APIs.

Thank you for the prompt reply.

Can you point me to the recommended and standard way to fully automate the process of standing up an ELK stack without manual intervention?

In particular, Elasticsearch seems sensitive to data flowing in before ILM policies and Index Templates are established, leaving us with indices that are not correctly configured that require additional manual intervention to get them set up correctly.

Kind Regards,

Sonnenhund

leandrojmp · March 24, 2023, 12:06pm

I don't think there is any recommended or standard way to do what you want, as this depends entirely on your infrastructure and how you deal with your indices.

What you could do is something like what you described, using some Bash or Python scripts to load your ILM policies and templates.

Of course, if your cluster is up and some system tries to index into it, you may end up with indices created before the template exists. To avoid this, you would need to make your cluster available only after you have applied the ILM policies and templates.

For example, in your automated process you could spin up your Elastic cluster on a random non-default port, execute the steps to apply the ILM policies and templates, then change the port to the default port and restart the cluster.

Or you could simply add a step that deletes any indices created in the meantime, after you have applied the ILM policies and templates.
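A minimal sketch of the "load policies and templates by script" step might look like the following. It only illustrates the ordering (ILM policies first, then the index templates that reference them). The node URL, the `logs-policy` and `logs-template` names, and their bodies are made up for illustration, and an unsecured node is assumed.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: unsecured local node

# Hypothetical policy and template bodies, for illustration only.
ILM_POLICIES = {
    "logs-policy": {
        "policy": {
            "phases": {
                "delete": {"min_age": "30d", "actions": {"delete": {}}},
            }
        }
    }
}
INDEX_TEMPLATES = {
    "logs-template": {
        "index_patterns": ["logs-*"],
        "template": {"settings": {"index.lifecycle.name": "logs-policy"}},
    }
}


def build_requests(base_url, policies, templates):
    """Return (url, body) pairs, policies first so templates can reference them."""
    reqs = [(f"{base_url}/_ilm/policy/{name}", body) for name, body in policies.items()]
    reqs += [(f"{base_url}/_index_template/{name}", body) for name, body in templates.items()]
    return reqs


def put_json(url, body):
    """PUT a JSON body and return the HTTP status code."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


def apply_all(base_url=ES_URL, send=put_json):
    """Apply every policy and template; stop at the first non-2xx response."""
    for url, body in build_requests(base_url, ILM_POLICIES, INDEX_TEMPLATES):
        status = send(url, body)
        if not 200 <= status < 300:
            raise RuntimeError(f"PUT {url} returned HTTP {status}")
```

Calling `apply_all()` would be the step to run while the cluster is still unreachable for indexing clients. The `send` parameter exists only so the logic can be exercised without a running cluster.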

sonnenhund · March 24, 2023, 2:52pm

Thank you @leandrojmp, appreciate the additional information.

I believe I'll put in a ticket with the Elastic team; I cannot think of a reason why these configurations cannot be applied directly to Elasticsearch as it starts and before it accepts data.

There may be a great reason though. If there is I'd like to hear it.

warkolm · March 24, 2023, 8:41pm

Because it all needs to happen via the APIs, which are not available until Elasticsearch is up and ready to accept data (and technically, mappings and settings sent via the APIs are data).

sonnenhund · March 25, 2023, 7:20am

Because it all needs to happen via the APIs, which are not available until Elasticsearch is up and ready to accept data (and technically, mappings and settings sent via the APIs are data).

Yes, it's understandable, but also a bit of a Catch 22.

In order to properly manage one's data (for most folks, anyway) one will need some combination of Index Templates, Data Streams, ILM Policies, and likely other configurations.

In our case, we've got systems already streaming data to the endpoint at which Elasticsearch lives, so there is a high likelihood that if we need to create, recreate, or replace Elasticsearch (for whatever reason), we're going to end up receiving data before the configurations we need are in place, which at least in my limited experience can have some odd effects.

It'd be nice not to have to worry about that, but perhaps I am thinking of this from a questionable point of view.

BenB196 · March 25, 2023, 12:27pm

There is a Terraform provider for Elasticstack. It is still somewhat a work in progress, but it does contain a lot of the core stuff.

xeraa · March 26, 2023, 4:41pm

@sonnenhund a file-based configuration is currently only available in ECK (the Kubernetes Operator), as a tech preview in the latest version, 2.6. It exists for all the reasons you mentioned: APIs can be quite cumbersome around timing, and also when nodes are (temporarily) unavailable.

See the documentation ("Elastic Stack configuration policies", Elastic Cloud on Kubernetes 2.6) for creating the Kubernetes spec. This is also an enterprise-level (paid) feature.

rugenl · March 26, 2023, 8:07pm

Data for a new index can also arrive at a long-running Elastic stack. Take the modern Beats: they include their version number in their index or data stream name. Each Beat, with proper configuration and permissions, will bootstrap the new index before sending data.

sonnenhund · March 29, 2023, 6:25am

Thank you everyone for your suggestions and ideas, I very much appreciate you taking the time to respond!

There's certainly a chicken-and-egg problem with these kinds of configurations, which can be compounded in an environment where you might have events streaming into ES before ES is even started.

There is a case for the more RDBMS-like configuration - standing up a database or other storage mechanism, configuring it, then allowing systems/users to access it - but that's only one valid approach. And ES isn't an RDBMS.

Not sure what direction we'll take, but you've all been very helpful.

DavidTurner · March 29, 2023, 8:42am

I'm struggling a little to understand how ES differs from other data stores in this regard. For instance if you were using Postgres you would need to make some API calls to configure tables, triggers, stored procs, users, etc in between standing up a fresh instance and exposing it to traffic. Can you help me understand what I'm missing here?

There are various ways you can block incoming traffic while you're configuring your freshly-started store. You can always block at the network level (e.g. a firewall or reverse-proxy/load-balancer thing) or by restricting permissions. In ES specifically you can also disable index auto-creation (action.auto_create_index: false) and/or ensure that indexing traffic always hits an alias using the ?require_alias option.
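As a rough sketch of the two ES-specific options, the helpers below build the request body for turning off auto-creation through the cluster settings API and an indexing URL that carries `require_alias`. The function names and the `logs-write` alias are hypothetical. Setting `action.auto_create_index` in elasticsearch.yml, as written above, is the other way to apply the first option.

```python
import json


def auto_create_settings(enabled=False):
    """Body for PUT /_cluster/settings that toggles index auto-creation."""
    return {"persistent": {"action.auto_create_index": str(enabled).lower()}}


def index_url(base_url, alias):
    """URL for indexing a document; the request is rejected unless `alias` is an alias."""
    return f"{base_url.rstrip('/')}/{alias}/_doc?require_alias=true"


if __name__ == "__main__":
    # Print what would be sent; no request is made here.
    print("PUT /_cluster/settings")
    print(json.dumps(auto_create_settings(False), indent=2))
    print("POST", index_url("http://localhost:9200", "logs-write"))  # hypothetical alias
```

With either guard in place, a client that writes before the templates and aliases exist gets an error instead of silently creating a misconfigured index.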

sonnenhund · March 29, 2023, 11:49am

Fair point. From my perspective there are a few differences:

A significant component here is my ignorance. I'm asking questions because it's unclear to me what would be considered normal or standard for ES and what baseline assumptions are being made of which I have no knowledge.
From the perspective of an RDBMS like PostgreSQL, one cannot begin to store data until a schema is created, permissions for the schema are added, etc. With ES one can (more or less) start it, throw data at it, and ES happily accepts it. Most interesting (to me at least) is that in this state, some configuration is effectively created by the incoming data, which can impact the behavior of subsequent configuration.
And while there's a lot of documentation, there's a learning curve to get to the point where the documentation is useful.

DavidTurner · March 29, 2023, 12:06pm

Ok, I get it now. This behaviour was introduced a very long time ago with the intention of making it easier to get started with ES, but it does indeed get in the way when you know what you're doing. It is at least configurable, and in your case it sounds like you want to switch it off to control these things more tightly.

jba · March 30, 2023, 9:17pm

We use a home-grown approach (from before my time) using Puppet with a collection of JSON files (some static, some templates) that get deployed to running Elasticsearch nodes whenever a file is modified.

Puppet sees the modification, which can be made to trigger the execution of, say, a shell script. In our case it is a somewhat generalized script that gets passed a REST API prefix (/_index_template, /_security/role, /_security/role_mapping, ...) and a JSON file. The script uses jq to validate the file, and then curls the API + basename of the file.

I will see if I can get permission to publish the script.
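For readers who want a starting point in the meantime, here is a minimal Python sketch of the same idea. It is not the script described above: it swaps jq and curl for the standard library, and it assumes the resource name is the file's basename without the `.json` extension. All function names are hypothetical.

```python
import json
import os
import urllib.request


def target_url(base_url, api_prefix, json_path):
    """Join base URL, API prefix and the file's basename (extension dropped)."""
    name = os.path.splitext(os.path.basename(json_path))[0]
    return f"{base_url.rstrip('/')}/{api_prefix.strip('/')}/{name}"


def load_valid_json(json_path):
    """Parse the file; json.JSONDecodeError is raised if it is not valid JSON."""
    with open(json_path, encoding="utf-8") as fh:
        return json.load(fh)


def put_json(url, body):
    """PUT a JSON body and return the HTTP status code."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


def deploy(base_url, api_prefix, json_path, send=put_json):
    """Validate the file first, then PUT it to <prefix>/<basename>."""
    body = load_valid_json(json_path)
    return send(target_url(base_url, api_prefix, json_path), body)
```

For example, `deploy("http://localhost:9200", "/_index_template", "logs.json")` would PUT the file's contents to `/_index_template/logs`, and an invalid file fails before any request is made.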

system · April 27, 2023, 9:18pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
