Help with date-time conversion from Filebeat

I'm sending log data from Filebeat (running on Kubernetes) to Graylog/Elasticsearch. I need to ensure that the date-time field inside the JSON message block that is part of the log entry has this format: 2023-12-11T23:23:22.000Z

Right now the middle "T" is missing.

So I'm using this config:


  filebeat.yml: |-
    filebeat.inputs:
    - type: container
      paths:
        - /var/log/containers/*.log
    processors:
      - add_kubernetes_metadata:
          host: ${NODE_NAME}
          matchers:
          - logs_path:
              logs_path: "/var/log/containers/"
      - decode_json_fields:
          fields: ["message"]
          target: "_message"
          overwrite_keys: true
          add_error_key: true
      - convert:
          fields:
            - {from: "_message_timestamp", type: "string", layout: "2006-01-02T15:04:05.999"}
          ignore_missing: true

The "message" JSON block is properly parsed and its fields extracted into new fields with the "_message" prefix, but the conversion does not take place and we still get the error from Elasticsearch.

In fact, just for testing, I replaced that somewhat complex conversion with one that simply changes the type of "kubernetes_pod_ip" from "ip" to "string" and renames it to "kube_pod_ip", and it still didn't do anything at all; it's as if my convert block is being ignored.

      - convert:
          fields:            
            - {from: "kubernetes_pod_ip", to: "kube_pod_ip", type: "ip"}

Please suggest a way out of this. We like Filebeat a lot, if only it would allow us to make such simple conversions...

Hi @bobus

I think there are a couple of issues here...

Pretty sure that means your fields would look like

_message.timestamp

I would take a close look at the format of the output JSON and make sure you are referencing the fields correctly

You can output the JSON to the console and take a look... or in elasticsearch
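
For example (a minimal sketch; Filebeat only allows one output at a time, so this would temporarily replace your existing output section while you debug):

output.console:
  pretty: true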

Are you sure that is getting parsed correctly?

BUT there is no layout parameter for convert as far as I can tell... not sure where you got that... so I do not think convert is what you are looking for.

It looks like you are trying to fix a date field....

I think for this specific use case you are looking for the timestamp processor.

Hello and thanks for responding,

My resulting fields do look like this:

(screenshot of the resulting fields)

Yes, I'm trying to fix a date field. I want the incoming field to comply with what Elasticsearch expects:

ElasticsearchException[Elasticsearch exception [type=mapper_parsing_exception, reason=failed to parse field [_message_timestamp] of type [date] in document with id '6f8b0872-9db8-11ee-9950-0242ac1a0004'. Preview of field's value: '2023-12-18 15:16:35.781']]; nested: ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=failed to parse date field [2023-12-18 15:16:35.781] with format [strict_date_optional_time||epoch_millis]]]; nested: ElasticsearchException[Elasticsearch exception [type=date_time_parse_exception, reason=Failed to parse with all enclosed parsers]];

strict_date_optional_time has a "T" between date and time. My data doesn't.

If "convert" is not the right path, please tell me what is.

Oh, and how can I output the JSON that's sent out to the console?

Apologies I did not look closer, you are shipping from containers...

Did you try what I suggested from here

processors:
  - timestamp:
      field: _message_timestamp
      target_field: _message_timestamp
      layouts:
        - '2006-01-02 15:04:05.999'
      test:
        - '2023-12-18 15:16:35.781'

Thanks a lot, what you suggested resolved that problem with _message_timestamp, yet now I have a similar problem with _message_ts:

ElasticsearchException[Elasticsearch exception [type=mapper_parsing_exception, reason=failed to parse field [_message_ts] of type [date] in document with id '671e3b21-9e54-11ee-9950-0242ac1a0004'. Preview of field's value: '1.702979582618132E9']]; nested: ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=failed to parse date field [1.702979582618132E9] with format [strict_date_optional_time||epoch_millis]]]; nested: ElasticsearchException[Elasticsearch exception [type=date_time_parse_exception, reason=Failed to parse with all enclosed parsers]];

So I applied the exact same fix as you suggested, using a 2nd timestamp processor:

  - timestamp:
      field: _message_ts
      target_field: _message_ts
      layouts:
        - UNIX

...since "UNIX" is a standard format along with UNIX_MS, but the error persists.

It seems to me that if I added a third candidate format, epoch_second, I'd be able to parse this field. But where are these allowed formats defined?

Another important question: does this mean that the log entry does not get into the database? Or does it get in, but not fully parsed?

When I search that particular stream for all entries that contain the _message_ts field with "exists: _message_ts", I get zero results, so it seems all failed log entries are rejected, which is a no-no for us.

Yes, when there is a mapping parsing exception... the document is dropped.

Are you using the default template, or did you create your own?

What data stream is being written to? I cannot tell from the error logs.

That is not a valid long, which is why it is not being read... it looks like something upstream is converting it to scientific notation.

Did you manually set _message_ts to type date?

First of all thanks for continuing to help!

The data stream is called "integration_stream".

What "default template" are you referring to?

I don't know how the scientific notation came into play here.

In the index mapping, the _message_ts field is marked as "date". I did not manually set it. Interestingly, in yesterday's index from the same index set, the mapping is "float".

Here is my current processor setup. As you can tell I'm getting a bit desperate with the _message_ts field.

filebeat.inputs:
- type: container
  paths:
    - /var/log/containers/*.log
processors:
  - add_kubernetes_metadata:
      host: ${NODE_NAME}
      matchers:
      - logs_path:
          logs_path: "/var/log/containers/"
  - decode_json_fields:
      fields: ["message"]
      target: "_message"
      overwrite_keys: true
      add_error_key: true      
  - timestamp:
      field: _message_ts
      target_field: _message_ts
      layouts:
        - 'UNIX'
        - UNIX
        - '1123456789.123456'
        - 1123456789.123456
      test:
        - 1702912588.904218
  - timestamp:
      field: _message_timestamp
      target_field: _message_timestamp
      layouts:
        - '2006-01-02 15:04:05.999'
      test:
        - '2023-12-18 15:16:35.781'

So, you are not defining a mapping. This is a source of many of your issues...

A mapping is a "schema"

Index Templates and Mappings

If you do not provide a mapping... Elasticsearch will "guess" at the data type from the first value it sees... so if it sees something that looks like a float it picks float, if it looks like a date it picks date, and so on.

For any real or production use case we always recommend defining a template / mapping.

So if you do not change the index name, the data from Filebeat will be written into a filebeat data stream, which comes with a robust template.
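
For illustration only (the template name, index pattern, and formats below are guesses based on what you have shared, so adjust them to your actual index naming), a composable index template that pins those two fields to date could look something like:

PUT _index_template/integration_stream_template
{
  "index_patterns": ["integration_stream*"],
  "template": {
    "mappings": {
      "properties": {
        "_message_timestamp": { "type": "date", "format": "strict_date_optional_time||yyyy-MM-dd HH:mm:ss.SSS" },
        "_message_ts": { "type": "date", "format": "epoch_second||epoch_millis" }
      }
    }
  }
}

With something like that in place, every new daily index that matches the pattern starts with the same mapping, instead of whatever dynamic mapping guesses from the first document it sees.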

You did not share the output section of your config so I am just learning this now.

That will definitely not work... you need to read the docs closely; the processor does not take arbitrary layouts, etc...

That field is NOT a timestamp in any standard format and will take significant work to convert, if that is actually the value.

I understand now what you mean by template; yes, I'm aware of those. But as I wrote, the _message_ts field is already mapped as "date", so what is the problem? Isn't "date" correct?

But how can I add a template with a mapping, and how do I know in advance what it needs to look like?

I didn't share the output section because I don't have one, except perhaps this:

output.logstash:
  hosts: ["10.65.82.185:5045"]

I added those "will not work" layouts after seeing that "UNIX" didn't work. I don't know how that 1.702979582618132E9 came up, so I'm trying to cope with it. Perhaps the data is coming in as a float but is shown in scientific notation by ES.

That is a very important output section... you're sending to Logstash.

So I feel like I am looking at your issue through a "porthole" and can't see the whole horizon... I am just getting information piecemeal.

So next: what does your Logstash pipeline output section look like? What index or data stream are you writing to?

In short ...

I think you need to back up a bit and understand the overall concepts.

You need to create a template / mapping as I mentioned before... otherwise, you do not have control over the data types / mapping for your index

You need to fix that data coming in; you will need to parse/convert it somewhere, probably in Logstash, in order to ingest that field as a date (one Filebeat-side option is sketched below).

You need to understand what index / data stream you are writing to and the pros / cons of using your own naming and mapping vs. the OOTB / default.
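
On the parse/convert point: if the raw value really is epoch seconds as a float (and is not already mangled into scientific notation before Filebeat sees it), one untested Filebeat-side sketch uses the script processor to turn it into an ISO-8601 string before it ships:

  - script:
      lang: javascript
      source: |
        function process(event) {
          var ts = event.Get("_message_ts");
          if (ts != null) {
            // epoch seconds (possibly fractional) -> ISO-8601 string Elasticsearch can parse
            event.Put("_message_ts", new Date(ts * 1000).toISOString());
          }
        }

Doing the conversion in Logstash (or wherever your pipeline actually runs) is just as valid; the point is it has to happen somewhere before that field reaches a date-mapped index.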

I am not sure if you are following some other "How To" etc... but it seems a lot of the information is missing...

I'm sending logs to a Graylog endpoint; that's the whole idea: to enable logging for Kubernetes via Graylog, which internally uses Elasticsearch. This output section achieves that, so "logstash" is rather misleading there. Graylog receives the data and passes it on to Elasticsearch.

I'll be (of course) happy to give you any info you think is important, just ask me.

How can I set up a mapping/template for an index that doesn't yet exist? Indices are being created fresh automatically every midnight. What is the method? And how do I know in advance what the correct mapping needs to be?

Yesterday I wrote that the _message_timestamp parsing problem got fixed with your advice. This morning the exact same problem re-appeared, even though the exact same filebeat.conf is in effect. This is so frustrating. That field is mapped as "date", so what #(*&^)$ is wrong.... :frowning:

Ahh, another piece of the puzzle...

That is exactly what templates are for... a template provides a mapping for when a new index is created... I gave you the link to the template docs above...

I think you think that the filebeat config is controlling all this, but you have Filebeat in the mix... Graylog's "magic" Logstash endpoint... Elasticsearch... I'm not sure what the index name is, how the mapping is being created, etc.

I do not know how graylog actually works.

Do you actually have access to the elasticsearch instance?
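
If you do, a quick way to see what is actually applied is to ask it directly (the index pattern here is a placeholder):

GET integration_stream*/_mapping
GET _index_template

The first shows the concrete mapping of the existing indices (where you saw "date" one day and "float" the next); the second lists the templates that will shape the next index that gets created.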

Sooo perhaps you might want to connect with your graylog folks...

And yikes! From the Graylog documentation... Elasticsearch 7.10.2 is ancient... and if you are using OpenSearch you should probably check in with them.

Warning: Graylog 5.2 is the last version to include support for Elasticsearch (7.10.2 only)! We recommend you use OpenSearch 2.x as your data node to support Graylog 5.2.

I think I have helped where I can... Have you considered just using the Elastic SIEM?

OpenSearch/OpenDistro are AWS run products and differ from the original Elasticsearch and Kibana products that Elastic builds and maintains. You may need to contact them directly for further assistance.

(This is an automated response from your friendly Elastic bot. Please report this post if you have any suggestions or concerns :elasticheart: )

My setup has several pieces because this is how these things work by design: as stacks of several bricks. I didn't choose it this way.

Thanks for all your time, I fully appreciate it.

And Season's Greetings.

Hi @bobus

You know you can simply drop that field with a drop processor in filebeat temporarily to get the data flowing...

You're so close!

So, I added this processor as the last one on the list:

  - drop_fields:
      fields: ["_message_timestamp"]

...and I still get lots of these:

[4]: index [int__0], id [a959ee61-a121-11ee-9f83-0242c0a8b007], message [OpenSearchException[OpenSearch exception [type=mapper_parsing_exception, reason=failed to parse field [_message_timestamp] of type [date] in document with id 'a959ee61-a121-11ee-9f83-0242c0a8b007'. Preview of field's value: '2023-12-22 23:27:15.781']]; nested: OpenSearchException[OpenSearch exception [type=illegal_argument_exception, reason=failed to parse date field [2023-12-22 23:27:15.781] with format [strict_date_optional_time||epoch_millis]]]; nested: OpenSearchException[OpenSearch exception [type=date_time_parse_exception, reason=date_time_parse_exception: Failed to parse with all enclosed parsers]];]

The beast just won't die.

I don't get it. Are you running on OpenSearch? Initially you mentioned Elasticsearch.

Yes, we switched just two days ago. One, because there's no point in insisting on a very old version of ES when it's going away for Graylog very soon anyway, and two, because perhaps the problem would be easier to tackle with OpenSearch.

And the funny thing is, the first day, the problem was not even there and it was a huge relief. The second day (that was yesterday) it came back. It may have something to do with the dynamic mapping that depends on the 1st entry that comes just after midnight or something along those lines.

In the index of the 1st day, _message_timestamp was mapped as "keyword". In the index of the 2nd day, it was "date".

The funny thing is, yesterday I set up Filebeat to output to a local file rather than send to Graylog, checked the file, and could not find a "timestamp" within the "message" JSON block; I only found a "@timestamp", at least in the few records I checked.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.