MapperParsingException after migrating Java client from ES 2.3.3 to 5.5.0

Hi!

We have been using ES 2.3.3 on pre-prod, with a Java process (using the BulkProcessor) on the same client version to feed data to ES. However, we thought it would be best to migrate to 5.5.0 before going live, so we upgraded the libraries and installed a new ES instance with the new release.

I had to modify our connector (which fetches data from Oracle and converts it to Java objects) to use PreBuiltTransportClient, changing
client = TransportClient.builder().settings(settings).build();
to
client = new PreBuiltTransportClient(settings);

Other than that, the only change to the connector code was the exception handling when calling Node.close(). Everything else is exactly the same, apart from the new ES instance, which is an out-of-the-box 5.5.0.

Once I started indexing documents to ES I saw errors on my log regarding date formatting:

message [MapperParsingException[failed to parse [ts_stop]]; nested: IllegalArgumentException[Invalid format: "2017-07-10 15:52:16.791" is malformed at " 15:52:16.791"]

The templates used for both instances are exactly the same: ts_stop (and other date fields) are defined as date type, with no format field. For debugging purposes I printed the documents right before adding them to the BulkProcessor, and with both client versions the dates are printed in the same yyyy-MM-dd HH:mm:ss.SSS format. However, when looking at the JSON documents on the 2.3.3 ES instance I can see that the dates are stored as yyyy-MM-ddTHH:mm:ss.SSSZ, which makes me think that either:

1 - BulkProcessor is formatting those dates before indexing them on ES 2.3.3 and on version 5.5.0 has changed its default behaviour and doesn't do that any more.

2 - ES is correctly processing those dates and reformatting them before indexing them on ES 2.3.3 (unlikely, as trying to PUT them with curl fails with the same error) and ES 5.5.0 has changed its default behaviour and cannot process that date format.

If I add my custom date format using the format field it works, but I would like to understand why this is happening rather than settle for the "it's magic" explanation: whether it is a change in the ES/client libraries from 2.x to 5.x, a bug in ES 5.x, or whether it was never meant to work at all and was a "fortunate" bug of the old version. Could somebody shed some light on this, please?

Thank you very much for your help in advance,
Michael.

It's not related to the bulk processor IMO.

If you try this from the Kibana Console and index the date 2017-07-10 15:52:16.791 into a date field, I'm pretty sure you'll get the same error.

You probably need to change the mapping you have been using and force the date format for your ts_stop field.

In the 5.x series it uses the default described on the Date datatype page of the Elasticsearch Reference [5.5]:

strict_date_optional_time

Which corresponds to Joda-Time's ISODateTimeFormat.

In this format T is needed to separate the date and the time part.
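To make the failure concrete, here is a minimal, self-contained Java sketch, using java.time's ISO parser as a stand-in for the Joda-Time formatter Elasticsearch uses internally (the class name is my own), showing that an ISO-8601 parser rejects a space where the T separator is expected:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

public class IsoSeparatorCheck {
    public static void main(String[] args) {
        DateTimeFormatter iso = DateTimeFormatter.ISO_LOCAL_DATE_TIME;

        // With the 'T' separator the ISO parser is happy:
        System.out.println(LocalDateTime.parse("2017-07-10T15:52:16.791", iso));

        // With a space instead of 'T' it fails, much like the
        // 'is malformed at " 15:52:16.791"' error above:
        try {
            LocalDateTime.parse("2017-07-10 15:52:16.791", iso);
        } catch (DateTimeParseException e) {
            System.out.println("parse failed: " + e.getMessage());
        }
    }
}
```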

You can do this:

PUT my_index
{
  "mappings": {
    "doc": {
      "properties": {
        "ts_stop": {
          "type":   "date",
          "format": "yyyy-MM-dd HH:mm:ss.SSS"
        }
      }
    }
  }
}

HTH

Hi David,

Thanks for your reply!

Yes, using the format field like you suggested works, and that is the approach we are going with. And, like you said, if I try to post a document with that date format manually (I tried it with curl as a proof of concept), it fails with the same error.

However, with the 2.3.3 client the BulkProcessor index requests are working (whereas curl doesn't)! That's why I'm scratching my head: it should either work or fail in both cases, or the behaviour must have changed from 2.x to 5.x.

Cheers,
Michael.

I don't know how you are generating the source JSON document from your beans, but maybe there is a difference there?

If not, can you check what the mapping is in your 2.3 cluster and paste it here?

I don't know how you are generating the source JSON document from your beans, but maybe there is a difference there?
No, this part of the code hasn't changed. The way it is done is by adding the java objects (Strings, ArrayLists, Dates etc.) in a Map. This Map is the final document and that's what is added to the processor:

for (Map<String, Object> document : documents) {
    // [...] other stuff like creating the index name, logging etc.
    IndexRequest request = new IndexRequest(localIndexName, type,
            (String) document.remove(DBConstants.ID_KEY));
    request.source(document);
    bulkProcessor.add(request);
}

If not, can you check what the mapping is in your 2.3 cluster and paste it here?
I will paste the relevant mappings only, because it's enormous:

"ts_start" : {
  "type" : "date",
  "format" : "strict_date_optional_time||epoch_millis"
},
"ts_stop" : {
  "type" : "date",
  "format" : "strict_date_optional_time||epoch_millis"
},
"check_in_date" : {
  "type" : "date",
  "ignore_malformed" : true,
  "format" : "strict_date_optional_time||epoch_millis"
},
"check_out_date" : {
  "type" : "date",
  "ignore_malformed" : true,
  "format" : "strict_date_optional_time||epoch_millis"
},

Which, as far as I understand, is the default format (yyyy-MM-ddTHH:mm:ss.SSSZ)

And this is my template:

"check_out_date" : {
  "ignore_malformed" : 1,
  "type" : "date"
},
"check_in_date" : {
  "ignore_malformed" : 1,
  "type" : "date"
},
"ts_start" : {
  "type" : "date"
},
"ts_stop" : {
  "type" : "date"
},

Which is consistent: when no format is specified, it should use the default.
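As a sanity check on what that default accepts, here is a small Java sketch (again using java.time in place of Joda-Time; the class name is my own) showing that the same instant can be supplied either as a strict_date_optional_time ISO string or as epoch milliseconds:

```java
import java.time.Instant;
import java.time.OffsetDateTime;

public class DefaultFormatSketch {
    public static void main(String[] args) {
        // strict_date_optional_time: an ISO-8601 string with the 'T' separator.
        Instant fromIso = OffsetDateTime.parse("2017-07-10T15:52:16.791Z").toInstant();

        // epoch_millis: the same instant as milliseconds since the epoch.
        Instant fromMillis = Instant.ofEpochMilli(1499701936791L);

        System.out.println(fromIso.equals(fromMillis)); // true
    }
}
```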

I just tested the following on elasticsearch 2.4.3:

DELETE test
PUT test
{
  "mappings": {
    "doc": {
      "properties": {
        "ts_start": {
          "type": "date",
          "format": "strict_date_optional_time||epoch_millis"
        }
      }
    }
  }
}
PUT test/doc/1
{
  "ts_start": "2017-07-10 15:52:16.791"
}

It gives:

{
   "error": {
      "root_cause": [
         {
            "type": "mapper_parsing_exception",
            "reason": "failed to parse [ts_start]"
         }
      ],
      "type": "mapper_parsing_exception",
      "reason": "failed to parse [ts_start]",
      "caused_by": {
         "type": "illegal_argument_exception",
         "reason": "Invalid format: \"2017-07-10 15:52:16.791\" is malformed at \" 15:52:16.791\""
      }
   },
   "status": 400
}

I did not test on 2.3.3 though. As you said, maybe something has been fixed in the meantime?
Want to try the exact same script I pasted on 2.3.3 and report back?

I get the same as you.

$ curl -XPUT "http://x.x.x.x:x/test" -d '{ "mappings": { "doc": { "properties": { "ts_start": { "type": "date", "format": "strict_date_optional_time||epoch_millis" } } } }}'

$ curl -XGET 'http://x.x.x.x:x/test?&pretty'

    {
      "test" : {
        "aliases" : { },
        "mappings" : {
          "doc" : {
            "properties" : {
              "ts_start" : {
                "type" : "date",
                "format" : "strict_date_optional_time||epoch_millis"
              }
            }
          }
        },
        "settings" : {
          "index" : {
            "creation_date" : "1499857893058",
            "uuid" : "UNSsDmRSSuaHlvwd6R2N5w",
            "number_of_replicas" : "1",
            "number_of_shards" : "5",
            "version" : {
              "created" : "2040499"
            }
          }
        },
        "warmers" : { }
      }
    }

And it fails when trying to put a document (it also fails with the same error if I try putting documents into my other indexes manually):
$ curl -XPUT "http://x.x.x.x:x/test/doc/1" -d'{"ts_start": "2017-07-10 15:52:16.791"}'

    {
        "error": {
            "root_cause": [{
                "type": "mapper_parsing_exception",
                "reason": "failed to parse [ts_start]"
            }],
            "type": "mapper_parsing_exception",
            "reason": "failed to parse [ts_start]",
            "caused_by": {
                "type": "illegal_argument_exception",
                "reason": "Invalid format: \"2017-07-10 15:52:16.791\" is malformed at \" 15:52:16.791\""
            }
        },
        "status": 400
    }

I don't know; it feels like the document is only transformed once the BulkProcessor handles the insertion into ES. When I printed the object (before calling bulkProcessor.add(document)) the dates have the non-default format (without T and Z)... so I can't think of any other place where that could have changed.

It may be worth stopping here with this item if you haven't got any other suggestions. I just want to make sure that there is no fundamental flaw in our setup or something we're missing that could potentially bite us when we go live.

I don't think bulk processor plays any role here.
I believe something else is involved.

If you can reproduce it with a similar script I pasted, I'll be happy to look at it.

Ok, I've found the culprit. I changed the 2.3.3 code to print a document at several stages, and it turns out that although I was printing before calling the add method, I was printing the doc before calling
request.source(document);
(request is an instance of IndexRequest).

If I print the object before calling source, the date looks like this:
ts_stop=2017-07-11 15:06:00.571

If I print it after source and before add, the date looks like this:
"ts_stop":"2017-07-11T15:06:00.571Z"

On version 5.5.0, unfortunately, toString() does not print the detail and instead I get this kind of output "{[index_name][type][doc_id], source[n/a, actual length: [3.9kb], max length: 2kb]}"

So you were correct, it is not the BulkProcessor; apparently either IndexRequest or one of the libraries it uses must have changed for this to happen.
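For anyone landing here later: the behaviour observed above is consistent with the 2.x request serializer rendering java.util.Date values as ISO-8601 UTC strings when building the JSON source. A hedged sketch of equivalent formatting (the class name and the exact pattern are illustrative approximations of the observed output, not taken from the Elasticsearch source):

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class DateSourceSketch {
    public static void main(String[] args) {
        // Approximates what the 2.x content builder evidently did with a
        // java.util.Date in the source map: format it as ISO-8601 in UTC,
        // with the 'T' separator and a 'Z' suffix.
        SimpleDateFormat iso = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'", Locale.ROOT);
        iso.setTimeZone(TimeZone.getTimeZone("UTC"));

        Date ts = new Date(1499785560571L); // 2017-07-11 15:06:00.571 UTC
        System.out.println(iso.format(ts)); // 2017-07-11T15:06:00.571Z
    }
}
```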

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.