Badly formatted index, after interpolation still contains placeholder

Hey guys, I'm facing the issue below. It was working fine while using indexes directly, but started failing when configured to use data streams:

ERROR: elasticsearch - Badly formatted index, after interpolation still contains placeholder: 
[logs-ssc-misc-%{[instance_name]}-%{[instance_IP]}-nonprod]
EVENT: {
   "agent""=>"{
      "hostname""=>""my-node",
      "name""=>""my-node",
      "id""=>""yyy",
      "type""=>""filebeat",
      "ephemeral_id""=>""c44-c44-c44-4c4-4c4",
      "version""=>""7.8.1"
   },
   "instance_name""=>""my_node_logs",
   "log""=>"{
      "file""=>"{
         "path""=>""/usr/share/filebeat/a.log"
      },
      "offset"=>0
   },
   "level""=>""DEBUG",
   "@metadata""=>"{
      "version""=>""7.8.1",
      "beat""=>""filebeat",
      "input""=>"{
         "beats""=>"{
            "host""=>"{
               "ip""=>""0.0.0.0"
            }
         }
      },
      "type""=>""_doc"
   },
   "logger""=>""{org.apache}",
   "instance_IP""=>""0.0.0.0",
   "message""=>yy"
}

Is the event you shared from before or after it runs through Logstash?

The error you got means that the fields instance_name and instance_IP do not exist when the event arrives at the output; this can happen if you are not parsing your original message.
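If you want to check that, something like the sketch below in your filter block would tag events that are missing either field (the tag name here is just an example, the field names come from your output config):

filter {
    # Tag events that are missing either of the fields used for routing
    if ![instance_name] or ![instance_IP] {
        mutate {
            add_tag => [ "missing_routing_fields" ]
        }
    }
}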

What is your source? Please share your Logstash configuration.

Also, if possible, add a stdout or file output and share the event that Logstash is sending to the outputs.
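For example, a temporary stdout output like this (standard Logstash, nothing specific to your setup) will print every event Logstash emits to its log:

output {
    # Temporary debug output: prints each event as a Ruby-style hash
    stdout {
        codec => rubydebug
    }
}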

input {
    beats {
        port => 5044
    }
}

filter {
    mutate {
        gsub => [
            # Replace hyphens with underscores in instance_name
            "instance_name", "-", "_"
        ]
    }
}

output {
    elasticsearch {
        hosts => [ "${ELASTICSEARCH_HOST1}", "${ELASTICSEARCH_HOST2}", "${ELASTICSEARCH_HOST3}" ]
        user => "${ELASTICSEARCH_USERNAME}"
        password => "${ELASTICSEARCH_PASSWORD}"
        cacert => "/usr/share/logstash/certs/ca.pem"
        data_stream => "true"
        data_stream_type => "logs"
        data_stream_dataset => "ssc-misc-%{[instance_name]}-%{[instance_IP]}"
        data_stream_namespace => "nonprod"
    }
}

The event I posted above is taken from the Logstash logs, i.e. the event Logstash received.

You need to share the event that Logstash is sending, to see if it needs any parsing or not.

Add a file output temporarily and share some lines:

file {
    path => "/tmp/temp-output-logstash.json"
}

This is the JSON content:

{
   "@version":"1",
   "level":"DEBUG",
   "message":[
      "a",
      "b"
   ],
   "@timestamp":"2023-10-17T14:41:08.497Z",
   "tags":[
      "beats_input_codec_plain_applied"
   ],
   "input":{
      "type":"log"
   },
   "tid":"-1234",
   "log":{
      "offset":5052393,
      "file":{
         "path":"/usr/share/filebeat/a.log"
      }
   },
   "instance_IP":"0.0.0.0",
   "timestamp":"2023-10-17 14:41:08,497",
   "logger":"{org.apache.synapse.transport.http.wire}",
   "agent":{
      "name":"my-node",
      "version":"7.8.1",
      "hostname":"my-node",
      "id":"1234",
      "ephemeral_id":"1234",
      "type":"filebeat"
   },
   "ecs":{
      "version":"1.5.0"
   },
   "event":{
      
   },
   "instance_name":"my_node_logs",
   "host":{
      "name":"my_node"
   },
   "type":"mynode"
}

This is the index template pattern, mapped to an ILM policy with data streams enabled:

logs-ssc-misc-*-*-nonprod*

Is this the content of the file output in Logstash?

Not sure what the issue is; the fields are present in the document.

Does this happen for every document?

Yes, all events are failing.

I have other outputs with static names; those work fine.

I just found a similar issue with a solution; check this post.

You will need to set some data_stream fields; basically, you will need to create these fields:

  • data_stream.type
  • data_stream.dataset
  • data_stream.namespace

You need this in your filter block:

mutate {
    add_field => {
        "[data_stream][type]" => "logs"
        "[data_stream][dataset]" => "ssc-misc-%{[instance_name]}-%{[instance_IP]}"
        "[data_stream][namespace]" => "nonprod"
    }
}

And remove those settings from the output.
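Sketching what the output would look like after that (same hosts and credentials as your config; with data_stream enabled and the plugin's default auto-routing, it should route on the event's data_stream.* fields):

output {
    elasticsearch {
        hosts => [ "${ELASTICSEARCH_HOST1}", "${ELASTICSEARCH_HOST2}", "${ELASTICSEARCH_HOST3}" ]
        user => "${ELASTICSEARCH_USERNAME}"
        password => "${ELASTICSEARCH_PASSWORD}"
        cacert => "/usr/share/logstash/certs/ca.pem"
        # type/dataset/namespace settings removed; the event's
        # data_stream.* fields drive the routing instead
        data_stream => "true"
    }
}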

@leandrojmp @John_paul

Caution: the dataset cannot have dashes (-) in it... dashes can only be used to separate the type, dataset, and namespace.

So, the above is invalid.

These three parts are combined by a “-” and result in data streams like logs-nginx.access-production. In all three parts, the “-” character is not allowed. This means all data streams are named in the following way: {type}-{dataset}-{namespace}.

In short, use dots (.) within the fields.
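Applied to the filter above, that would look something like this (just a sketch reusing the same field names; the index template pattern would need to change to match as well):

mutate {
    add_field => {
        "[data_stream][type]" => "logs"
        # Dots instead of dashes inside the dataset value
        "[data_stream][dataset]" => "ssc.misc.%{[instance_name]}.%{[instance_IP]}"
        "[data_stream][namespace]" => "nonprod"
    }
}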

Thanks @stephenb, didn't know about that.

But this limitation only exists if you want to use the data_stream settings in the Elasticsearch output, I think, since it will validate the value of the fields.

I'm using custom data stream names, so I cannot use those data_stream settings and do not have this limitation.

Worked like a charm. Thanks @leandrojmp. Lifesaver.

Yeah @stephenb, I read the documentation; the hyphens are not allowed because the data stream will use hyphens to build the index names internally. But it works without any issues even with hyphens: what I noticed is that only when creating dynamic names does the interpolation disallow hyphens.

I think it is more than just using those settings...

To be clear, you can use custom names, but more data stream functionality (free good stuff) is coming that will depend on the proper naming: automatic routing, automatic custom pipelines based on the names, routing of mapping exceptions, etc. These are some of the things that I understand are on the future roadmap (I am playing with some of them now...).

They will not be interpreted correctly in downstream or dependent operations.

Proceed at your own risk / caution.

Yeah, while I disagree with some things, I understand that approach.

By custom data stream names I mean using something where the type is not logs, metrics, traces or synthetics.

If you want to use something like appname-prod as a data stream name, you cannot use the data_stream settings in Logstash and need to index it as a normal index pointing to the data stream name.
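A minimal sketch of that approach (the host is a placeholder; action => "create" is needed because data streams only accept create operations):

output {
    elasticsearch {
        hosts => [ "https://localhost:9200" ]  # placeholder host
        index => "appname-prod"                # the data stream name
        action => "create"                     # data streams only accept create
    }
}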
