Apply data streams, ILM, index templates through Logstash

Hi Team,

I am deploying an Elasticsearch cluster with the latest version, i.e. 7.14, through an automation tool. The current config (v7.4) creates daily indices and does not use data streams, ILM policies, index templates, etc.

I am trying to use the latest features like data streams and ILM policies, and to reference them in the Logstash pipeline config file so they are applied to new indices when they are created.

beats ---> logstash --> elasticsearch.

Currently my Logstash pipeline file looks like the below.

input {
  beats {
    port => 5044
  }
}

filter {

if [log_type] == "app_server" and [app_id] == "app"
  {
    mutate { gsub => ["message","\|"," "] }
    grok {
      patterns_dir => ["/etc/logstash/patterns"]
      match => { "message" => "%{MY_DATE_PATTERN:timestamp}%{SPACE}%{LOGLEVEL:level}%{SPACE}%{UUID:ConsentID}%{SPACE}%{WORD:TraceID}%{SPACE}%{WORD:TransactionID}%{SPACE}%{GREEDYDATA:messagetext}" }
    }
    mutate {
             replace => {
               "[type]" => "app_server"
             }
           }
  }

if [log_type] == "access_server" and [app_id] == "as"
  {
    grok { match => { "message" => "%{YEAR}-%{MONTHNUM}-%{MONTHDAY}[T ]%{HOUR}:%{MINUTE}(?::?%{SECOND})\| %{USERNAME:exchangeId}\| %{DATA:trackingId}\| %{NUMBER:RoundTrip:int}%{SPACE}ms\| %{NUMBER:ProxyRoundTrip:int}%{SPACE}ms\| %{NUMBER:UserInfoRoundTrip:int}%{SPACE}ms\| %{DATA:Resource}\| %{DATA:subject}\| %{DATA:authmech}\| %{DATA:scopes}\| %{IPV4:Client}\| %{WORD:method}\| %{DATA:Request_URI}\| %{INT:response_code}\| %{DATA:failedRuleType}\| %{DATA:failedRuleName}\| %{DATA:APP_Name}\| %{DATA:Resource_Name}\| %{DATA:Path_Prefix}" } }
    mutate {
             replace => {
               "[type]" => "access_server"
             }
           }
  }
}

output {
  if [log_type] == "app_server" {
    elasticsearch {
      hosts => ['http://es_ip:9200']
      index => "%{type}-%{+YYYY.MM.dd}"
      user => "elastic"
      password => "xxx"
    }
  }

  if [log_type] == "access_server" {
    elasticsearch {
      hosts => ['http://es_ip:9200']
      index => "%{type}-%{+YYYY.MM.dd}"
      user => "elastic"
      password => "xxx"
    }
  }

  elasticsearch {
    hosts => ['http://es_ip:9200']
    index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY.MM.dd}"
    user => "elastic"
    password => "xxx"
  }
}

So the indices I actually run queries against are currently being created daily as:

access_server-2021.08.25
access_server-2021.08.26
access_server-2021.08.27

Before the index template and ILM policy can actually be used, they must first be created.

Below is the index template I am planning to use:

PUT _index_template/access_template
{
  "version": 1,
  "priority": 500,
  "template": {
    "settings": {
      "index.number_of_shards": 1,
      "index.number_of_replicas": 0,
      "index.lifecycle.name": "testpolicy", 
      "index.lifecycle.rollover_alias": "access_server-alias" 
    },
    "mappings": {
      "dynamic": true,
      "numeric_detection": true,
      "date_detection": true,
      "dynamic_date_formats": [
        "strict_date_optional_time",
        "yyyy/MM/dd HH:mm:ss Z||yyyy/MM/dd Z"
      ],
      "_source": {
        "enabled": true,
        "includes": [],
        "excludes": []
      },
      "_routing": {
        "required": false
      },
      "dynamic_templates": []
    }
  },
  "index_patterns": [
    "test-data-stream*"
  ],
  "data_stream": {}
}

Below is the ILM policy,

This will perform a rollover when either of the conditions is met, then keep the index in the delete phase for 2 days, and then delete it.

PUT _ilm/policy/testpolicy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_age": "2d",
            "max_primary_shard_size": "900mb"
          },
          "set_priority": {
            "priority": 100
          }
        },
        "min_age": "0ms"
      },
      "delete": {
        "min_age": "2d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}

Bootstrap the initial time series index with a write index alias.

curl -X PUT "localhost:9200/access-000001?pretty" -H 'Content-Type: application/json' -d'
{
  "aliases": {
    "access-alias":{
      "is_write_index": true 
    }
  }
}'

Create the data stream

The link says: "You can also manually create the stream using the create data stream API. The stream's name must still match one of your template's index patterns."

curl -X PUT "localhost:9200/_data_stream/test-data-stream?pretty"
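As a sanity check (a sketch using the same host and stream name as above), the stream can then be inspected with the get data stream API:

```
curl -X GET "localhost:9200/_data_stream/test-data-stream?pretty"
```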

In the Logstash pipeline config file, I reference everything created above:

template => "access_template.json"
template_name => "access_template"
template_overwrite => "false"

i.e.

output {

  if [log_type] == "access_server" {
    elasticsearch {
      hosts => ['http://es_ip:9200']
      index => "%{type}-%{+YYYY.MM.dd}"
      user => "elastic"
      password => "xxx"
      template => "access_template.json"
      template_name => "access_template"
      template_overwrite => "false"
    }
  }
}

(I will remove index => "%{type}-%{+YYYY.MM.dd}" from the above, so that it will not create daily indices.)

Q. I am confused about how the data stream will name the index; somewhere above I may be making a mistake in the index_pattern or data_stream name, so a name mismatch might happen.

Q. Can I use it like the above in the Logstash config to apply the data stream and ILM policy to new indices?

Thanks,

According to the documentation you need to tell logstash that your output is a data stream.

This is the documentation for data stream with logstash.
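As a rough sketch (assuming logstash-output-elasticsearch 11.x; the host and credentials are placeholders), an output declared as a data stream would look like this, using the plugin's default data_stream_* values:

```
output {
  elasticsearch {
    hosts => ["http://es_ip:9200"]
    user => "elastic"
    password => "xxx"
    data_stream => "true"
    data_stream_type => "logs"
    data_stream_dataset => "generic"
    data_stream_namespace => "default"
  }
}
```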

Hi @leandrojmp,

Thanks for providing the link. I completely forgot to mention it.

After going through it, I figured out that the below options are required to use a data stream, ILM policy, and index template through the Logstash pipeline.

data_stream => "true"
data_stream_type => "logs"
data_stream_dataset => "generic"
data_stream_namespace => "default"
ilm_rollover_alias => "pingaccess-alias"
ilm_pattern => "000001"
ilm_policy => "testpolicy"
template => 
template_name => "pingaccess_template"

I will try adding the above options to the Logstash output block after the password field.

What should the path be for the template option above? We load the template into Elasticsearch through curl or a PUT request, so what OS path can we specify for it?

From the given link, I didn't understand the below:

"Events with `data_stream.*` fields are routed to the appropriate data streams. If the fields are missing, routing defaults to logs-generic-logstash."

None of my application logs have a field called data_stream.*, so will the logs go to the logs-generic-logstash data stream?
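For what it's worth, if I read the plugin docs correctly, the target data stream name is composed from the three data_stream_* options, so setting them explicitly avoids relying on the default routing. As an illustration with the values I intend to use:

```
# target name = <data_stream_type>-<data_stream_dataset>-<data_stream_namespace>
data_stream_type      => "logs"
data_stream_dataset   => "access"
data_stream_namespace => "default"
# => events would be written to the "logs-access-default" data stream
```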

Thanks,

While deploying the above, I am getting the below error when creating the index template.

.
.
"reason": "composable template [access_template] with index patterns [logs-access-default*], priority [500] would cause data streams [test-data-stream] to no longer match a data stream template",
"type": "illegal_argument_exception",
"status": 400,
"redirected": false,
"url": "http://10.0.1.176:9200/_index_template/access_template"
Steps performed:

  1. Created the ILM policy.

  2. Did not run point no. 3 above, i.e. "Bootstrap the initial time series index with a write index alias"; I think it is not required for a data stream.

  3. While creating the below index template, it gives me the error.

request call - http://es_ip:9200/_index_template/access_template

index template file -

{
  "version": 1,
  "priority": 500,
  "template": {
    "settings": {
      "index.number_of_shards": 1,
      "index.number_of_replicas": 0
    },
    "mappings": {
      "dynamic": true,
      "numeric_detection": true,
      "date_detection": true,
      "dynamic_date_formats": [
        "strict_date_optional_time",
        "yyyy/MM/dd HH:mm:ss Z||yyyy/MM/dd Z"
      ],
      "_source": {
        "enabled": true,
        "includes": [],
        "excludes": []
      },
      "_routing": {
        "required": false
      },
      "dynamic_templates": []
    }
  },
  "index_patterns": [
    "logs-access-default*"
  ],
  "data_stream": {}
}

I am planning to name the data stream logs-access-default, so I mentioned the below options in the Logstash pipeline config, but Logstash is not deployed yet as I am getting the error in Elasticsearch and the automation stopped here.

data_stream => "true"
data_stream_type => "logs"
data_stream_dataset => "access"
data_stream_namespace => "default"
ilm_rollover_alias => "access"
ilm_pattern => "000001"
ilm_policy => "testpolicy"
template => "/tmp/access_template"
template_name => "access_template"

By default it is looking for the data stream test-data-stream. How can I change it to the name I want?

How can this error be fixed?

Thanks,

Hi,

I have deployed the Elasticsearch cluster. I can see the ILM policy and index template are created, but the data stream is not created. I can see only system indices.

In logstash-plain.log, below errors are reported.

```
[2021-08-28T04:52:37,357][ ERROR ][logstash.agent ] Failed to execute action {:id=>:main, :action_type=>LogStash::ConvergeResult::FailedAction, :message=>"Could not execute action: PipelineAction::Create, action_result: false", :backtrace=>nil}

[2021-08-28T04:53:10,813][ ERROR ][logstash.outputs.elasticsearch][main] Invalid data stream configuration, following parameters are not supported: {"template"=>"/tmp/access_template", "ilm_pattern"=>"000001", "template_name"=>"access_template", "ilm_rollover_alias"=>"access", "ilm_policy"=>"testpolicy"}

[2021-08-28T04:53:10,820][ ERROR ][logstash.outputs.elasticsearch][main] Invalid data stream configuration, following parameters are not supported: {"template"=>"/tmp/access_template", "ilm_pattern"=>"000001", "template_name"=>"access_template", "ilm_rollover_alias"=>"access", "ilm_policy"=>"testpolicy"}

[2021-08-28T04:53:11,686][ ERROR ][logstash.javapipeline ][main] Pipeline error {:pipeline_id=>"main", :exception=>#<LogStash::Configuration Error : Invalid data stream configuration: ["template", "ilm_pattern", "template_name", "ilm_rollover_alias", "ilm_policy"]>, :backtrace=>["/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-output-elasticsearch-11.0.2-java/lib/logstash/outputs/elasticsearch/data_stream_support.rb:57:in check_data_stream_config!'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-output-elasticsearch-11.0.2-java/lib/logstash/outputs/elasticsearch/data_stream_support.rb:24:in data_stream_config?'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-output-elasticsearch-11.0.2-java/lib/logstash/outputs/elasticsearch.rb:292:in register'", "org/logstash/config/ir/compiler/OutputStrategyExt.java:131:in register'", "org/logstash/config/ir/compiler/AbstractOutputDelegatorExt.java:68:in register'", "/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:228:in block in register_plugins'", "org/jruby/RubyArray.java:1820:in each'", "/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:227:in register_plugins'", "/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:585:in maybe_setup_out_plugins'", "/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:240:in start_workers'", "/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:185:in run'", "/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:137:in block in start'"], "pipeline.sources"=>["/etc/logstash/conf.d/logstash.conf"], :thread=>"#<Thread:0x24368293 run>"}

[2021-08-28T04:53:11,692][ ERROR ][logstash.agent ] Failed to execute action {:id=>:main, :action_type=>LogStash::ConvergeResult::FailedAction, :message=>"Could not execute action: PipelineAction::Create, action_result: false", :backtrace=>nil}
```

It says the below parameters are not supported.

Invalid data stream configuration, following parameters are not supported: {"template"=>"/tmp/access_template", "ilm_pattern"=>"000001", "template_name"=>"access_template", "ilm_rollover_alias"=>"access", "ilm_policy"=>"testpolicy"}

but these parameters are mentioned in the below link.

Can someone point what is the problem?

Thanks,

Apparently you can't use those parameters together with the data_stream parameters, but I didn't find any mention of it in the docs.

But considering that the first steps when creating a data stream are to create a lifecycle policy and an index template, it makes sense to not have those configurations in the output.

If you already created the ILM policy and the template, there is no need to use those parameters. I would suggest that you remove them and leave just the data_stream ones.
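Something like this (a sketch reusing the values from your config; it assumes the ILM policy and index template already exist in Elasticsearch):

```
output {
  if [log_type] == "access_server" {
    elasticsearch {
      hosts => ["http://es_ip:9200"]
      user => "elastic"
      password => "xxx"
      data_stream => "true"
      data_stream_type => "logs"
      data_stream_dataset => "access"
      data_stream_namespace => "default"
    }
  }
}
```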

Answered with links to the code on SE. As Leandro says, the limitations on which parameters you can use appear to be undocumented, and the Elasticsearch documentation explicitly mentions using ILM with data streams, but it appears Logstash does not support that.

Hi @leandrojmp, thanks for the reply. I have removed the parameters giving the error and kept only the data stream options, and I can see the below data stream is created.

yellow open   filebeat-7.14.0-2021.08.28                    Bhr8xSwwR1eJvZ1YeGN5zw   1   1    4779329            0      1.5gb          1.5gb
yellow open   .ds-logs-pingaccess-default-2021.08.28-000001 9QZr_-LMQ22Bi_HJ1PTQyw   1   1     845294            0    195.1mb        195.1mb

But there are other issues:

  • our index template has not been applied to this data stream (we set replicas to 0, but here it is 1)
  • our policy testpolicy, under Index Lifecycle Policies in Kibana, is showing Linked indices as 0
  • the above data stream, in Index Management under Kibana, is showing the Lifecycle policy as logs and not our testpolicy
  • it looks like all the existing data on the server got indexed into filebeat-7.14.0-2021.08.28 (see its size and doc count above) and not into our data stream; I get query results after selecting both the filebeat index and the data stream index, but our intention is to index data into the data stream

There must be something wrong in our Logstash config that makes it index into both the filebeat index and the data stream, or some data stream parameters are not supported by Logstash.

We created the policy and index template from that link but skipped the below step, as it was giving an error.

Create the data stream

Thanks,

Hi @Badger, thanks for the update. Is there anything we can do to let the Elasticsearch team know about this?

Hi Team,

Can someone please reply?

Or should I drop the data stream and use a legacy index template, etc.?

My main intention was to apply the number-of-primary-shards setting at the start (as that cannot be changed later). Other things like ILM policies can be applied to existing indices, but I was trying to use data streams, ILM policies, etc. together during the ES cluster deployment.

Thanks,