Is this going to be an efficient Elasticsearch output config in Logstash?

I am planning to use ILM in Logstash, and there is a note saying: You cannot use dynamic variable substitution when ilm_enabled is true and when using ilm_rollover_alias.

At the same time, there is also this note:
In order to minimize the number of open connections to Elasticsearch, maximize the bulk size and reduce the number of "small" bulk requests (which could easily fill up the queue), it is usually more efficient to have a single Elasticsearch output.

I am planning to run Logstash with an http input, and the clients will send data to it. In Logstash I will add a tag to each event to classify the source it came from.
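
A minimal sketch of that tagging step, assuming each project sends to its own http input on a separate port (the port numbers here are hypothetical, not from the original post). `tags` is a common option available on every input plugin, so the tag can be attached as the event enters the pipeline:

	input {
	    # Hypothetical port for project one clients
	    http {
	        port => 8081
	        tags => ["project_one"]
	    }
	    # Hypothetical port for project two clients
	    http {
	        port => 8082
	        tags => ["project_two"]
	    }
	    # Hypothetical port for everything else
	    http {
	        port => 8083
	        tags => ["project_three"]
	    }
	}

With a layout like this, the conditionals in the output section only need to test [tags].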

I want the best of both worlds, i.e. ILM as well as the efficiency of a single Elasticsearch output. I am thinking of a config like the one below.

	output {
	    if "project_one" in [tags] {
	        elasticsearch {
	            ilm_enabled => "true"
	            ilm_rollover_alias => "projectone"
	            ilm_pattern => "000001"
	            ilm_policy => "project_one"
	            hosts => "blah"
	            user => "tony"
	            password => "stark"
	        }
	    } else if "project_two" in [tags] {
	        elasticsearch {
	            ilm_enabled => "true"
	            ilm_rollover_alias => "projecttwo"
	            ilm_pattern => "000001"
	            ilm_policy => "project_two"
	            hosts => "blah"
	            user => "tony"
	            password => "stark"
	        }
	    } else {
	        elasticsearch {
	            ilm_enabled => "true"
	            ilm_rollover_alias => "projectthree"
	            ilm_pattern => "000001"
	            ilm_policy => "project_three"
	            hosts => "blah"
	            user => "tony"
	            password => "stark"
	        }
	    }
	}

However, I have a feeling that each

	elasticsearch
	{
	}

counts as an output.

At a later stage, lots of clients will be sending data to Logstash. Persistence will be configured in Logstash.

Yes, each elasticsearch {} is a separate instance of the elasticsearch output, with its own queues and connection pools. As you note, the output does not sprintf the ilm_rollover_alias option (or ilm_policy), so you cannot combine them.

The reason for this is that the code configures ILM at startup, so there is no event from which it can reference fields.

Clearly it would be possible to redo all the ILM configuration for every event, but that would add a huge overhead that very few folks want. I would guess that overhead would be much greater than what a handful of output instances adds.

The only use case where you might want to modify the output to sprintf these options would be where there are a very large number of rollover policies, which could only make sense if there was a monstrously large number of events. I cannot see it for terabytes of data, and not likely for exabytes.

And even if you have exabytes of data then there are probably much simpler architectures to achieve the same result.
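
For readers who still want a single output, one workaround that is sometimes described (it is not taken from this thread, so treat it as an assumption to verify) is to leave ILM setup out of Logstash entirely and let the `index` option, which does support sprintf, point at a write alias. A minimal sketch, assuming the rollover aliases, bootstrap indices, index templates and ILM policies have already been created in Elasticsearch, and that a hypothetical field [@metadata][target_alias] was set earlier in the pipeline:

	output {
	    elasticsearch {
	        hosts => "blah"
	        user => "tony"
	        password => "stark"
	        # Logstash does not create aliases or policies in this sketch
	        ilm_enabled => false
	        # Assumes the index templates are managed in Elasticsearch
	        manage_template => false
	        # Hypothetical field holding e.g. "projectone" or "projecttwo"
	        index => "%{[@metadata][target_alias]}"
	    }
	}

The trade-off is that nothing in Logstash checks that the alias exists; if an event resolves to a name with no bootstrapped alias, Elasticsearch would typically create a plain index under that name, outside any rollover policy.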

We are only at the gigabyte level of data, so I will stick with multiple outputs. I will do some load testing to see how it behaves. Thanks for the quick reply.
