I'm sending log data from Filebeat (running on Kubernetes) to Graylog/Elasticsearch. I need to ensure that the date-time field inside the JSON message block that is part of the log entry has this format: 2023-12-11T23:23:22.000Z
The "message" JSON block is properly parsed and its fields extracted into new fields with the "_message" prefix, but the conversion does not take place and we still get the error from Elasticsearch.
In fact, just for testing, I replaced that kind-of-complex conversion with one that simply changes the type of "kubernetes_pod_ip" from "ip" to "string" and renames it to "kube_pod_ip", and it still didn't do anything at all; it's as if my conversion block is ignored.
- convert:
    fields:
      - {from: "kubernetes_pod_ip", to: "kube_pod_ip", type: "string"}
Please suggest a way out of this; we like Filebeat a lot, if only it would let us make such simple conversions...
Pretty sure that means your fields would look like
_message.timestamp
I would take a close look at the format of the output JSON and make sure you are referencing the fields correctly
You can output the JSON to the console and take a look... or in Elasticsearch.
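For example, you can switch to the console output temporarily, something like this in filebeat.yml (Filebeat only runs one output at a time, so comment out your real output while testing):

output.console:
  pretty: true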
Are you sure that is getting parsed correctly?
BUT there is no layout parameter with convert as far as I can tell... so I am not sure where you got that... so I do not think convert is what you are looking for.
It looks like you are trying to fix a date field....
I think for this specific use case you are looking for the timestamp processor instead.
Yes, I'm trying to fix a date field. I want the incoming field to comply with what Elasticsearch expects:
ElasticsearchException[Elasticsearch exception [type=mapper_parsing_exception, reason=failed to parse field [_message_timestamp] of type [date] in document with id '6f8b0872-9db8-11ee-9950-0242ac1a0004'. Preview of field's value: '2023-12-18 15:16:35.781']]; nested: ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=failed to parse date field [2023-12-18 15:16:35.781] with format [strict_date_optional_time||epoch_millis]]]; nested: ElasticsearchException[Elasticsearch exception [type=date_time_parse_exception, reason=Failed to parse with all enclosed parsers]];
strict_date_optional_time has a "T" between date and time. My data doesn't.
If "convert" is not the right path, please tell me what is.
Thanks a lot, what you suggested resolved that problem with _message_timestamp, yet now I have a similar problem with _message_ts:
ElasticsearchException[Elasticsearch exception [type=mapper_parsing_exception, reason=failed to parse field [_message_ts] of type [date] in document with id '671e3b21-9e54-11ee-9950-0242ac1a0004'. Preview of field's value: '1.702979582618132E9']]; nested: ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=failed to parse date field [1.702979582618132E9] with format [strict_date_optional_time||epoch_millis]]]; nested: ElasticsearchException[Elasticsearch exception [type=date_time_parse_exception, reason=Failed to parse with all enclosed parsers]];
So I applied the exact same fix as you suggested, using a 2nd timestamp processor:
...since "UNIX" is a standard format along with UNIX_MS, but the error persists.
It seems to me that, if I added a 3rd candidate format, using epoch_second, I'd be able to parse this field. But where are these alternative allowed formats defined?
Another important question: does this mean that the log entry does not get into the database? Or does it get in, but not fully parsed?
When I search that particular stream for all entries that contain the _message_ts field with "exists: _message_ts", I get zero results, so it seems all failed log entries are rejected, which is a no-no for us.
I don't know how the scientific notation came into play here.
In the index mapping, the _message_ts field is marked as "date". I did not manually set it. Interestingly, in yesterday's index from the same index set the mapping is "float".
Here is my current processor setup. As you can tell I'm getting a bit desperate with the _message_ts field.
If you do not provide a mapping... Elasticsearch will "guess" at the data type from the first value it sees... so if it sees something that looks like a float it picks float; if it looks like a date it picks date.
For any real or production use case we always recommend defining a template / mapping.
So if you do not change the index name, the data from Filebeat will be written into a filebeat data stream, which comes with a robust template.
You did not share the output section of your config, so I am still missing part of the picture.
That will definitely not work... you need to read the docs closely; the processor does not take arbitrary layouts, etc...
That field is NOT a timestamp of any type and will take significant work to convert if that is actually the value.
I understand now what you mean by template, yes I'm aware of those. But as I wrote the _message_ts field is already mapped as "date", so what is the problem? Isn't "date" correct?
But how can I add a template with a mapping? How do I know in advance what it needs to look like?
I didn't share the output section because I don't have one, except perhaps this:
output.logstash:
  hosts: ["10.65.82.185:5045"]
I added those "will not work" layouts after seeing that "UNIX" didn't work. I don't know how that 1.702979582618132E9 came up so I'm trying to cope with it. Perhaps the data is coming in as a float but is shown in a scientific notation by ES.
That is a very important output section... you're sending to Logstash.
So I feel like I am looking at your issue through a "porthole" and can't see the whole horizon / issue... I am just getting information piecemeal...
So the next question: what does your Logstash pipeline output section look like? What index or data stream are you writing to?
In short ...
I think you need to back up a bit and understand the overall concepts.
You need to create a template / mapping as I mentioned before... otherwise, you do not have control over the data types / mapping for your index
You need to fix that data coming in; you will need to parse/convert it somewhere, probably in Logstash (or in the shipper; see the sketch below), in order to ingest that field as a date.
You need to understand what index / data stream you are writing to and the pros / cons of using your own naming and mapping vs using the OOTB / default.
I am not sure if you are following some other "How To" etc... but it seems like a lot of the information is missing...
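Wherever you do the conversion (Logstash or the shipper), it is only straightforward if that value really is epoch seconds. As an illustration of the shipper-side option, here is a rough sketch using Filebeat's script processor; it assumes _message_ts holds epoch seconds as a float or a scientific-notation string, which is not confirmed:

- script:
    lang: javascript
    source: |
      // Assumption: _message_ts is epoch seconds (float or scientific-notation string).
      function process(event) {
        var ts = event.Get("_message_ts");
        if (ts == null) {
          return;
        }
        var secs = parseFloat(ts);
        if (!isNaN(secs)) {
          // Rewrite as an ISO-8601 string that strict_date_optional_time accepts.
          event.Put("_message_ts", new Date(secs * 1000).toISOString());
        }
      }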
I'm sending logs to a Graylog endpoint, that's the whole idea: to enable logging for Kubernetes via Graylog, which internally uses Elasticsearch. This output section achieves that; "logstash" is therefore rather misleading there. Graylog receives the data and passes it on to Elasticsearch.
I'll be (of course) happy to give you any info you think is important, just ask me.
How can I set up a mapping/template for an index that doesn't yet exist? Indices are being created fresh automatically every midnight. What is the method? And how do I know in advance what the correct mapping needs to be?
Yesterday I wrote that the _message_timestamp parsing problem got fixed with your advice. This morning the exact same problem re-appeared, even though the exact same filebeat.conf is in effect. This is so frustrating. That field is mapped as "date", so what #(*&^)$ is wrong....
That is exactly what templates are for... a template provides a mapping for when a new index is created... I gave you the docs to the template above...
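As a shape to start from (the template name and index pattern below are placeholders, and Graylog normally manages the templates for its index sets, so check how it wants custom mappings hooked in), a composable index template that pins the field to a date with an explicit format could look roughly like this:

PUT _index_template/custom-message-fields
{
  "index_patterns": ["<your-graylog-index-prefix>*"],
  "priority": 200,
  "template": {
    "mappings": {
      "properties": {
        "_message_timestamp": {
          "type": "date",
          "format": "yyyy-MM-dd HH:mm:ss.SSS||strict_date_optional_time||epoch_millis"
        }
      }
    }
  }
}

With an explicit format like that, a value without the "T" between date and time parses instead of being rejected.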
I think you think that the filebeat.conf is controlling all this, but you have Filebeat in the mix... Graylog's magic "logstash" input... Elasticsearch... I'm not sure what the index name is, how the mapping is being created, etc.
I do not know how graylog actually works.
Do you actually have access to the elasticsearch instance?
Sooo perhaps you might want to connect with your graylog folks...
And yikes! From the Graylog documentation: Elasticsearch 7.10.2 is ancient... and if you are using OpenSearch you should probably check in with them.
Warning: Graylog 5.2 is the last version to include support for Elasticsearch (7.10.2 only)! We recommend you use OpenSearch 2.x as your data node to support Graylog 5.2.
OpenSearch/OpenDistro are AWS run products and differ from the original Elasticsearch and Kibana products that Elastic builds and maintains. You may need to contact them directly for further assistance.
(This is an automated response from your friendly Elastic bot. Please report this post if you have any suggestions or concerns )
So, I added this processor as the last one on the list:
- drop_fields:
    fields: ["_message_timestamp"]
...and I still get lots of these:
[4]: index [int__0], id [a959ee61-a121-11ee-9f83-0242c0a8b007], message [OpenSearchException[OpenSearch exception [type=mapper_parsing_exception, reason=failed to parse field [_message_timestamp] of type [date] in document with id 'a959ee61-a121-11ee-9f83-0242c0a8b007'. Preview of field's value: '2023-12-22 23:27:15.781']]; nested: OpenSearchException[OpenSearch exception [type=illegal_argument_exception, reason=failed to parse date field [2023-12-22 23:27:15.781] with format [strict_date_optional_time||epoch_millis]]]; nested: OpenSearchException[OpenSearch exception [type=date_time_parse_exception, reason=date_time_parse_exception: Failed to parse with all enclosed parsers]];]
Yes, we switched just two days ago. One, because there's no point in insisting on a very old version of ES when it's going away very soon for Graylog anyway; and two, because perhaps the problem would be easier to tackle with OS.
And the funny thing is, the first day, the problem was not even there and it was a huge relief. The second day (that was yesterday) it came back. It may have something to do with the dynamic mapping that depends on the 1st entry that comes just after midnight or something along those lines.
In the index of the 1st day, _message_timestamp was mapped as "keyword". In the index of the 2nd day, it was "date".
The funny thing is, yesterday I set up filebeat to output to a local file rather than send to Graylog, and checked the file, and could not find a "timestamp" within the "message" JSON block, I only found a "@timestamp". At least in the few records I checked.