Logstash deletes source files and does not create an index - date parsing issue

I am using Elasticsearch and Logstash 8.11 running in Docker.

When Logstash starts, it deletes the log files from the source directory, but nothing is passed to Elasticsearch - no index is created.

I am not sure why Logstash is deleting the source files in the first place.

I would appreciate it if someone could help me diagnose this.

My logstash.conf is:

input {
  file {
    mode => "read"
    path => "/usr/share/logstash/ingest_data/**/**/*.json"
    codec => "json_lines"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    file_chunk_size => 1048576
  }
}

filter {
  json {
    source => "message"
    target => "parsed_data"  # Using 'parsed_data' as a namespace to avoid conflicts
  }
  date {
    match => ["[fields][@timestamp]", "UNIX_MS"]
    target => "@timestamp"
  }

  mutate {
    copy => { "[fields][source]" => "source" }
    copy => { "[fields][eventType]" => "eventType" }
    copy => { "[fields][category]" => "category" }
#    remove_field => ["fields"]  # Optional: remove the original 'fields' object if it's no longer needed
  }
}

output {
  elasticsearch {
    index => "logstash-%{+YYYY.MM.dd}"
    hosts => "${ELASTIC_HOSTS}"
    user => "${ELASTIC_USER}"
    password => "${ELASTIC_PASSWORD}"
    cacert => "certs/ca/ca.crt"
  }
}

I am trying to parse logs like this:

{"_index":".ds-logs-dsf-2023.07.31-000018","_id":"24-12115677111-1690555439","_score":6.2382417,"fields":{"eventId":[100],"@timestamp":["1690888439226"],"source":["manager"],"eventType":["information"],"category":["rtp.document.Service.main.checks.Validator"],"message":["Basic validation passed. ."]}}

That is because you are using the file input in read mode; when mode is set to read, it uses the setting file_completed_action, which by default will delete the file after processing.

Also, in read mode, the setting start_position is ignored, as described in the documentation.
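If you want to keep read mode but preserve the source files, a minimal sketch would be to change file_completed_action (the completed-log path below is just an example, adjust it to your setup):

input {
  file {
    mode => "read"
    path => "/usr/share/logstash/ingest_data/**/**/*.json"
    sincedb_path => "/dev/null"
    # keep the source files instead of deleting them after processing
    file_completed_action => "log"
    # required when the action is "log" or "log_and_delete"
    file_completed_log_path => "/usr/share/logstash/ingest_data/completed.log"
  }
}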

Did you check the Logstash logs? Is there anything in them? Please share the logs.

@leandrojmp - yes, you are right about read mode - I have changed it to tail now. It is not deleting anything anymore.

In the logs I have:

It can connect to Elasticsearch:

2023-11-15 18:14:24 [2023-11-15T18:14:24,863][WARN ][logstash.outputs.elasticsearch][main] Restored connection to ES instance {:url=>"https://elastic:xxxxxx@es01:9200/"}
2023-11-15 18:14:24 [2023-11-15T18:14:24,874][INFO ][logstash.outputs.elasticsearch][main] Elasticsearch version determined (8.11.0) {:es_version=>8}
2023-11-15 18:14:24 [2023-11-15T18:14:24,874][WARN ][logstash.outputs.elasticsearch][main] Detected a 6.x and above cluster: the `type` event field won't be used to determine the document _type {:es_version=>8}
2023-11-15 18:14:25 [2023-11-15T18:14:25,026][INFO ][logstash.codecs.jsonlines][main][98e3ef207a4ae9d98a067a1a59b68f9c428884fa2315628551b7969ffef13295] ECS compatibility is enabled but `target` option was not specified. This may cause fields to be set at the top-level of the event where they are likely to clash with the Elastic Common Schema. It is recommended to set the `target` option to avoid potential schema conflicts (if your data is ECS compliant or non-conflicting, feel free to ignore this message)
2023-11-15 18:14:25 [2023-11-15T18:14:25,713][INFO ][logstash.codecs.jsonlines][main][98e3ef207a4ae9d98a067a1a59b68f9c428884fa2315628551b7969ffef13295] ECS compatibility is enabled but `target` option was not specified. This may cause fields to be set at the top-level of the event where they are likely to clash with the Elastic Common Schema. It is recommended to set the `target` option to avoid potential schema conflicts (if your data is ECS compliant or non-conflicting, feel free to ignore this message)

I also have this and nothing more:

2023-11-15 18:14:40 [2023-11-15T18:14:40,394][INFO ][logstash.outputs.elasticsearch][main] Using a default mapping template {:es_version=>8, :ecs_compatibility=>:v8}
2023-11-15 18:16:02 [2023-11-15T18:16:02,119][INFO ][logstash.codecs.jsonlines][main][98e3ef207a4ae9d98a067a1a59b68f9c428884fa2315628551b7969ffef13295] ECS compatibility is enabled but `target` option was not specified. This may cause fields to be set at the top-level of the event where they are likely to clash with the Elastic Common Schema. It is recommended to set the `target` option to avoid potential schema conflicts (if your data is ECS compliant or non-conflicting, feel free to ignore this message)

And what is the result of the following requests in Kibana Dev Tools?

GET _cat/indices?v

and

GET logstash-*/_search

Also, a couple of things about your Logstash configuration.

If your source file consists of line-delimited JSON, i.e. each line is a JSON document, you should not use the json_lines codec but the json codec; this is in the documentation as well:

NOTE: Do not use this codec if your source input is line-oriented JSON, for example, redis or file inputs. Rather, use the json codec.

Another thing: if you are using the codec in the input, you do not need the json filter, as your message will already be parsed.

I would recommend not using a codec in the input and relying on the json filter instead. If you do that, you need to add the top-level [parsed_data] to your field references, since you are parsing the JSON into a target field: instead of [fields][anything] you need to use [parsed_data][fields][anything]. A sketch of what that could look like is shown below.
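For example, something along these lines (a sketch based on your current config; adjust paths and options as needed):

input {
  file {
    mode => "tail"
    path => "/usr/share/logstash/ingest_data/**/**/*.json"
    sincedb_path => "/dev/null"
    # no codec here; the json filter below does the parsing
  }
}

filter {
  json {
    source => "message"
    target => "parsed_data"
  }
  mutate {
    # the parsed fields now live under [parsed_data]
    copy => { "[parsed_data][fields][source]" => "source" }
    copy => { "[parsed_data][fields][eventType]" => "eventType" }
    copy => { "[parsed_data][fields][category]" => "category" }
  }
}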

@leandrojmp - You are a star - all is working now.
I have changed logstash.conf as per your recommendation and it is working perfectly.
I have one more question for you, if you don't mind.

I wanted to parse the timestamp, which in my case is "@timestamp":["1690888439226"] - this is why I have included a date filter for it:

date {
    match => ["[fields][@timestamp]", "UNIX_MS"]
    target => "@timestamp"
  }

However, it ends up stored as

parsed_data.fields.@timestamp
1690978392640

But it doesn't decode the timestamp, so it is kind of hard to search through. Could you point me in the direction of what I can do?

input {
  file {
    mode => "tail"
    path => "/usr/share/logstash/ingest_data/**/**/*.json"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    file_chunk_size => 1048576
  }
}

filter {
  json {
    source => "message"
    target => "parsed_data"  # Using 'parsed_data' as a namespace to avoid conflicts
  }
  date {
    match => ["[fields][@timestamp]", "UNIX_MS"]
    target => "@timestamp"
  }

  mutate {
    copy => { "[parsed_data][fields][source]" => "source" }
    copy => { "[parsed_data][fields][eventType]" => "eventType" }
    copy => { "[parsed_data][fields][category]" => "category" }
#    remove_field => ["fields"]  # Optional: remove the original 'fields' object if it's no longer needed
  }
}

Use match => ["[parsed_data][fields][@timestamp]", "UNIX_MS"]

@Badger I have tried it. I have also added a target to a different field:

But the new field event_date is not created after that. If I don't specify a target, or set target => "@timestamp", it doesn't parse the date. Not sure what is wrong with it.

filter {
  json {
    source => "message"
    target => "parsed_data"  # Using 'parsed_data' as a namespace to avoid conflicts
  }
  date {
    match => ["[parsed_data][fields][@timestamp]", "UNIX_MS"]
    target => "[parsed_data][fields][event_date]"
    
  }

  mutate {
    copy => { "[parsed_data][fields][source]" => "source" }
    copy => { "[parsed_data][fields][eventType]" => "eventType" }
    copy => { "[parsed_data][fields][category]" => "category" }
#    remove_field => ["fields"]  # Optional: remove the original 'fields' object if it's no longer needed
  }
}

I am getting a "_dateparsefailure" tag added to the document.

Since the @timestamp value in your source document is wrapped in brackets, it seems that the field is an array.

Try to use [parsed_data][fields][@timestamp][0] in the date filter.
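Something like this (a sketch; keep or change the target depending on where you want the parsed date):

date {
  match => ["[parsed_data][fields][@timestamp][0]", "UNIX_MS"]
  target => "@timestamp"
}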

After I parse the JSON input, I have the date field stored as:

"parsed_data.fields.@timestamp.keyword": [
      "1690883634738"
    ],

The original log is like this:

{"_index":".logs-rtp-2023.07.31-000018","_id":"24-12116678855-1690888439","_score":6.2382417,"fields":{"eventId":[100],"@timestamp":["1690888439226"],"source":["dmanager"],"eventType":["information"],"category":["Service.Common.Validator"],"message":["Basic validation passed. "]}}

The filter I tried was:

date {
  match => ["[@timestamp][0]", "UNIX_MS"]
  target => "human_readable_timestamp"
  tag_on_failure => ["_timestamp_parse_failed"]
}

And this one does not show an error, but the date is not parsed and the human_readable_timestamp field is not generated, unfortunately.

Once I changed my filter to:

date {
    match => ["[parsed_data][fields][@timestamp][0]", "UNIX_MS"]
    target => "@timestamp"
    tag_on_failure => ["_timestamp_parse_failed"]
  }

All is working as planned - so basically my issue was that the initial field wasn't specified correctly.
