Incident:
I'm currently working on a log pipeline that forwards logs from the API of our anti-virus solution to Graylog, and from there into Elasticsearch. However, I encountered a very strange mapper-parsing exception (meaning Elasticsearch couldn't parse the input into the expected datatype). I'll attach two examples of the same problem right here:
ElasticsearchException[Elasticsearch exception [type=mapper_parsing_exception, reason=failed to parse field [file_attribute_ids] of type [long] in document with id 'e34ccd28-2a22-11ee-90d7-0242ac100112'. Preview of field's value: '[5, 12]']]; nested: ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=For input string: "[5, 12]"]];
ElasticsearchException[Elasticsearch exception [type=mapper_parsing_exception, reason=failed to parse field [connection_src_port] of type [long] in document with id 'e1582915-2a22-11ee-90d7-0242ac100112'. Preview of field's value: '63318, 63320']]; nested: ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=For input string: "63318, 63320"]];
So somehow Elasticsearch has a problem parsing these lists as longs. As far as I understood, Elasticsearch supports multiple values in a field out of the box, so a list of longs shouldn't be a problem at all ...
Environment:
OS Information: Linux 5.15.0-73-generic
Package Version: Graylog 5.1.1+ef1b993 on graylog (Eclipse Adoptium 17.0.7)
What steps have you already taken to try and solve the problem?
Tried to parse every value in both fields as explicit integers. No success.
Transformed the datatype of these fields to String. This has worked so far without any more exceptions.
How can somebody help?
Please clarify whether this is a bug in Elasticsearch / Graylog, or whether I'm just using these products in an inappropriate way. Thank you very much for your time!
Opster_support
(Elasticsearch community support @ Opster)
The issue you're encountering is not a bug in Elasticsearch or Graylog, but rather a mismatch between the data type defined in your Elasticsearch mapping and the actual data you're trying to index.
In Elasticsearch, a field can indeed hold multiple values, but every value in the array must match the field's data type. In your case, the fields file_attribute_ids and connection_src_port are mapped as long, but, as the error previews show, the values arrive as single strings containing a comma-separated list (e.g., "[5, 12]" and "63318, 63320") rather than as actual JSON arrays of longs, and that is what causes the mapper parsing exception.
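For concreteness, the error previews above show the values arriving as single strings. Against a long-mapped field, a real JSON array of numbers indexes fine, while the same values packed into one string do not (the index name logs is made up for illustration):

```
# Accepted: a real JSON array of longs against a long-mapped field
PUT logs/_doc/1
{ "connection_src_port": [63318, 63320] }

# Rejected: the same values as a single string cannot be parsed as one long
PUT logs/_doc/2
{ "connection_src_port": "63318, 63320" }
```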
To fix this issue, you need to ensure that the data you're indexing matches the data type defined in your Elasticsearch mapping. If you expect these fields to contain arrays of longs, you should adjust your log pipeline to transform these fields into arrays of longs before indexing the data into Elasticsearch.
Here's a general approach to fix this issue:
Adjust your log pipeline: Modify your log pipeline to transform the file_attribute_ids and connection_src_port fields into arrays of longs before forwarding the logs to Elasticsearch. This could involve reading each field as a string, stripping any surrounding brackets, splitting on commas, and converting each element to a long.
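The splitting-and-converting step could be sketched roughly like this (plain Python for illustration; the function name and the two input formats are assumptions based on the error previews above):

```python
def to_long_array(raw):
    """Normalize a value like '[5, 12]' or '63318, 63320' into a list of ints.

    Mirrors the pipeline step described above: strip optional surrounding
    brackets, split on commas, and convert each element to an integer.
    """
    s = str(raw).strip().lstrip("[").rstrip("]")
    if not s:
        return []
    return [int(part.strip()) for part in s.split(",")]
```

The equivalent logic would live wherever your pipeline transforms messages before they reach Graylog or Elasticsearch.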
Update your Elasticsearch mapping: Note that a field mapped as long already accepts arrays of longs; Elasticsearch has no separate array type, so the mapping itself doesn't need to change once the incoming values are real arrays. If you instead want to keep the raw comma-separated strings (as in your workaround), you could map these fields as keyword in a new index, at the cost of numeric range queries and aggregations. Changing an existing field's type requires reindexing your existing data.
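As a sketch, a mapping that stores the raw strings as keyword, matching the String workaround mentioned above, could look like this in a new index (the index name av-logs-v2 is made up):

```
PUT av-logs-v2
{
  "mappings": {
    "properties": {
      "file_attribute_ids":  { "type": "keyword" },
      "connection_src_port": { "type": "keyword" }
    }
  }
}
```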
Use a script to transform the data: If you can't adjust your log pipeline or update your Elasticsearch mapping, you could use an ingest pipeline with a script processor in Elasticsearch to transform the data as it's being indexed. The script would parse each field's string value, split it on commas, and convert each element to a long.
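A hedged sketch of such an ingest pipeline in Painless (Kibana Dev Tools console syntax; the pipeline name normalize_ports is made up, the script only handles connection_src_port, and it assumes the value arrives as a comma-separated string, possibly wrapped in brackets):

```
PUT _ingest/pipeline/normalize_ports
{
  "description": "Split comma-separated port strings into arrays of longs (sketch)",
  "processors": [
    {
      "script": {
        "source": """
          if (ctx.connection_src_port instanceof String) {
            // Strip optional brackets, split on commas, convert each token to a long
            String s = ctx.connection_src_port.replace("[", "").replace("]", "");
            def longs = new ArrayList();
            for (String p : s.splitOnToken(",")) {
              longs.add(Long.parseLong(p.trim()));
            }
            ctx.connection_src_port = longs;
          }
        """
      }
    }
  ]
}
```

You would then reference this pipeline via the index's default_pipeline setting or per-request, and extend the script analogously for file_attribute_ids.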
Please note that the specific steps to implement these solutions would depend on the details of your log pipeline and your Elasticsearch setup. If you need more detailed instructions, please provide more information about your log pipeline and your Elasticsearch mapping.
Thank you very much for this rapid and helpful reply! Approach 1 instantly fixed my issue, and you provided it in no time. I'd give 5 out of 5 stars for this experience.