Mapper Parsing Exception for field with date data that I want to index as a string


(David McClain) #1

In some of my logs, I get a message body that is just a large JSON object. In some of those JSON objects, a field comes in with data in the format YYYY/MM/DD.
E.g.

"param_TRANSACTION_STRINGTEST"=>"2017/01/27"

The json filter seems to be parsing it just fine.
This is not the field I am using for the date of the log.
As such, I don't really care whether or not it's indexed as a date type. However, Elasticsearch seems determined to index it as a date, and since 'YYYY/MM/DD' is apparently an invalid date format, I get the Mapper Parsing Exception:

"error"=>{"type"=>"mapper_parsing_exception", "reason"=>"failed to parse [param_TRANSACTION_DATE]", "caused_by"=>{"type"=>"illegal_argument_exception", "reason"=>"Invalid format: \"2017/01/27\" is malformed at \"/01/27\""}}}},

Currently the only way I've been able to solve this is by going and adding an explicit date conversion block on this field - forcing it to become a date, and telling it what format it is.

However, this is impractical because the fields that I'm receiving in the JSON object are not a static set. I could start getting a new field at any time, with a new name and another date-like string in it.

I can't figure out what is causing Elasticsearch to be so dead set on reading this in as a date, though, so I don't know how to change the default behavior.

Just to clarify - there are a few other threads out there I've found where the default response is 'use a date filter to convert it to a date', and yes I understand that is a solution, but please keep in mind that's not my issue here - this isn't my primary date field. I want this to be a string. In fact, I'd prefer any date fields that come through in the JSON object to be strings and then later, if I want to use them as date objects, that is when I would want to go back and make a one-off case of converting it to a date.


(David McClain) #2

Just some update info:

I've been doing some basic testing on inserting data, and it looks like Elasticsearch is saying 'that looks like a date, so I'm going to treat it like a date'. Which is all fine and good. So...
Where is that happening? Is that in the code? Is it in a config?

I've been playing with the idea/philosophy of "It's a date, so why not treat it like one", which is probably the same idea behind why Elasticsearch is doing what it's doing. So another follow-up question would be:
Is there a way that I can dynamically handle these occurrences? I'm trying to stay as far away as I can from micro-managing these fields as they come in. What surprises me is that, instead of defaulting to indexing the field as a string, Elasticsearch rejects the field and causes an indexing error.
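
For context, the "that looks like a date" behavior described above is Elasticsearch's dynamic date detection, which is governed by the `date_detection` and `dynamic_date_formats` mapping settings rather than by anything in Logstash. The following is a minimal Python sketch, not a tested fix for this cluster, that builds an index template body (ES 2.x syntax) with detection switched off so that new date-like strings are handled like any other string. The function name and the `logstash-*` index pattern are assumptions for illustration; the script only prints JSON and does not contact a cluster.

```python
import json


def build_no_date_detection_template(index_pattern="logstash-*"):
    """Return an index-template body (ES 2.x syntax) with date detection off.

    The default index pattern is a hypothetical placeholder, not taken
    from the thread; adjust it to the indices you actually write to.
    """
    return {
        "template": index_pattern,
        "mappings": {
            # "_default_" applies to every mapping type in matching indices.
            "_default_": {
                # With detection off, new date-like strings are no longer
                # mapped as dates; they go through normal string handling.
                "date_detection": False,
            }
        },
    }


if __name__ == "__main__":
    # Print the JSON body that would be sent with PUT _template/<name>.
    print(json.dumps(build_no_date_detection_template(), indent=2))
```

The printed body could be sent with `PUT _template/<name>`. Templates only apply to indices created afterwards, and a field that is already mapped as `date` in an existing index keeps that mapping.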


(Igor Motov) #3

Which version of Elasticsearch is it? What does the mapping for this field look like?


(David McClain) #4

ES Version is 2.3.4
I did a GET request for _mapping/logs and originally there didn't seem to be any mapping for the specific field 'param_TRANSACTION_DATE' in that listing.

I also couldn't really find anything in the dynamic section.

Since then I did a few tests using a mutate on the string to remove the forward slashes, as well as doing a date conversion in Logstash using the date{} block, and both of those allowed it to be successfully indexed.

My next test was to create a brand new field in Logstash and test that way.

    add_field => { "testField" => "Test/Data" }
    add_field => { "testField2" => "2016/01/02" }

I tried each of these individually, and testField worked fine without any complaint from ES.
When I tried the second, testField2, it errored again. This is a brand new field name and does not exist in the mappings even now.
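
The difference between the two test fields is consistent with date detection being driven by the string value rather than the field name: Elasticsearch checks new string values against its `dynamic_date_formats` patterns. A rough Python approximation of that decision is sketched below. It is not Elasticsearch's code; `strptime` with `%Y/%m/%d` is a stand-in assumption for the Joda-style patterns Elasticsearch actually uses, and the function names are hypothetical.

```python
from datetime import datetime

# Stand-in for a slash-separated date pattern. Elasticsearch itself uses
# Joda-style patterns from `dynamic_date_formats`, not Python's strptime.
ASSUMED_PATTERNS = ("%Y/%m/%d",)


def looks_like_date(value, patterns=ASSUMED_PATTERNS):
    """Return True if the string parses with any of the assumed patterns."""
    for pattern in patterns:
        try:
            datetime.strptime(value, pattern)
            return True
        except ValueError:
            # Not this pattern; try the next one.
            continue
    return False


def guess_dynamic_type(value):
    """Approximate the type chosen for a new string field (illustrative only)."""
    return "date" if looks_like_date(value) else "string"


if __name__ == "__main__":
    # The two values used in the Logstash add_field test above.
    for sample in ("Test/Data", "2016/01/02"):
        print(sample, guess_dynamic_type(sample))
```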

Here is the dynamic section of my mapping:

"dynamic_templates": [
               {
                  "message_field": {
                     "mapping": {
                        "fielddata": {
                           "format": "disabled"
                        },
                        "index": "analyzed",
                        "omit_norms": true,
                        "type": "string"
                     },
                     "match": "message",
                     "match_mapping_type": "string"
                  }
               },
               {
                  "string_fields": {
                     "mapping": {
                        "fielddata": {
                           "format": "disabled"
                        },
                        "index": "analyzed",
                        "omit_norms": true,
                        "type": "string",
                        "fields": {
                           "raw": {
                              "ignore_above": 256,
                              "index": "not_analyzed",
                              "type": "string",
                              "doc_values": true
                           }
                        }
                     },
                     "match": "*",
                     "match_mapping_type": "string"
                  }
               },
               {
                  "float_fields": {
                     "mapping": {
                        "type": "float",
                        "doc_values": true
                     },
                     "match": "*",
                     "match_mapping_type": "float"
                  }
               },
               {
                  "double_fields": {
                     "mapping": {
                        "type": "double",
                        "doc_values": true
                     },
                     "match": "*",
                     "match_mapping_type": "double"
                  }
               },
               {
                  "byte_fields": {
                     "mapping": {
                        "type": "byte",
                        "doc_values": true
                     },
                     "match": "*",
                     "match_mapping_type": "byte"
                  }
               },
               {
                  "short_fields": {
                     "mapping": {
                        "type": "short",
                        "doc_values": true
                     },
                     "match": "*",
                     "match_mapping_type": "short"
                  }
               },
               {
                  "integer_fields": {
                     "mapping": {
                        "type": "integer",
                        "doc_values": true
                     },
                     "match": "*",
                     "match_mapping_type": "integer"
                  }
               },
               {
                  "long_fields": {
                     "mapping": {
                        "type": "long",
                        "doc_values": true
                     },
                     "match": "*",
                     "match_mapping_type": "long"
                  }
               },
               {
                  "date_fields": {
                     "mapping": {
                        "type": "date",
                        "doc_values": true
                     },
                     "match": "*",
                     "match_mapping_type": "date"
                  }
               },
               {
                  "geo_point_fields": {
                     "mapping": {
                        "type": "geo_point",
                        "doc_values": true
                     },
                     "match": "*",
                     "match_mapping_type": "geo_point"
                  }
               }
]

(Igor Motov) #5

I think this might be the same issue as in https://github.com/elastic/elasticsearch/pull/22174

I tested it with 5.3 and master and it looks like it's fixed.

As a workaround, you can add an explicit date format to the dynamic mapping for the date field:

                "date_fields": {
                    "mapping": {
                        "type": "date",
                        "format": "yyyy/MM/dd",
                        "doc_values": true
                    },
                    "match": "*",
                    "match_mapping_type": "date"
                }
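
If date-like values in more than one layout are expected, the `format` option of a `date` mapping accepts several patterns separated by `||`. The Python sketch below builds the same `date_fields` dynamic template with a joined format string; the function name and the particular list of formats are assumptions chosen for illustration, and the script only prints JSON.

```python
import json


def build_date_fields_template(formats):
    """Build the `date_fields` dynamic template with several date formats."""
    return {
        "date_fields": {
            "mapping": {
                "type": "date",
                # Multiple formats are joined with "||" in a single string.
                "format": "||".join(formats),
                "doc_values": True,
            },
            "match": "*",
            "match_mapping_type": "date",
        }
    }


if __name__ == "__main__":
    # Illustrative list (an assumption, not from the thread): the slash
    # layout seen in the logs plus two built-in Elasticsearch format names.
    formats = ["yyyy/MM/dd", "strict_date_optional_time", "epoch_millis"]
    print(json.dumps(build_date_fields_template(formats), indent=2))
```

A detected value that matches none of the listed formats would still be rejected at index time, so the list should cover every layout that date detection can pick up.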

(David McClain) #6

Ok great. Thanks.
I'll try adding that.
Quick follow-up: is there any in-depth documentation on how the mapping configs work? I'd like to be able to find the answers to some of my questions myself, but I've had a hard time learning which changes to the mapping template cause which effects. I'd also like to know a little more about what setting the explicit format field will do. What kind of negative effects could come of it?
Is there a chance it could dynamically identify dates of other formats, and then fail to parse again because it has a different hard-coded date format? Does the format field there take multiple values? And if so, in what format?

