Python bulk insert: customizing the date parser

Hi all,

I'm new to Elasticsearch and I'm trying to bulk-insert data into it using Python.
After reading in the data with pandas, I created an index:

from elasticsearch import Elasticsearch

es = Elasticsearch()
es.indices.create(
    index="cars",
    mappings={
        "properties": {
            "manufacture_year": {
                "type": "date",
            },
            "engine_displacement": {
                "type": "date",
            },
            "date_created": {
                "type": "date",
                "format": ["yyyy-MM-dd HH:mm:ss.SSSSSSZ"]
            },
            "date_last_seen": {
                "type": "date",
                "format": ["yyyy-MM-dd HH:mm:ss.SSSSSZ"]
            }
        }
    }
)

But the bulk insert failed with errors on the date_created and date_last_seen fields.
date_created sample: 2015-11-14 18:10:06.838319+00
date_last_seen sample: 2016-01-27 20:40:15.46361+00

'type': 'illegal_argument_exception', 'reason': 'failed to parse date field [2015-11-14 18:10:06.838319+00] with format [[yyyy-MM-dd HH:mm:ss.SSSSSSZ]]', 'caused_by': {'type': 'date_time_parse_exception', 'reason': "Text '2015-11-14 18:10:06.838319+00' could not be parsed, unparsed text found at index 0

What does "unparsed text found at index 0" mean? I can't see anything wrong at index 0 of the value.

Best Wishes
Steven Zeng

If I clean the data further and set the mapping to:

"date_created": {
                        "type": "date",
                        "format": ["yyyy-MM-dd HH:mm:ss"]
                        # "format": ["yyyy-MM-dd HH:mm:ss.SSSSSSZ"]
}

and cut off the part after "." when yielding the JSON data:

        j["date_created"] = str(j["date_created"]).split(".")[0]

everything works fine. But I do want to keep the fractional part. What should I do?
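
One way to keep the fractional seconds might be to normalize each value to ISO 8601 on the Python side before yielding the document, so the field can stay on Elasticsearch's default date format instead of a custom pattern. The sketch below is only an illustration: to_iso8601 is a hypothetical helper (not part of pandas or the Elasticsearch client), and it assumes every value looks like the two samples above, that is, a space separator, 1 to 6 fractional digits, and an offset written as +HH, +HHMM or +HH:MM.

```python
import re
from datetime import datetime, timedelta, timezone

# Matches e.g. "2015-11-14 18:10:06.838319+00" (fraction and offset minutes optional).
_TIMESTAMP = re.compile(
    r"^(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2})"
    r"(?:\.(\d{1,6}))?([+-])(\d{2})(?::?(\d{2}))?$"
)


def to_iso8601(raw):
    """Convert a 'YYYY-MM-DD HH:MM:SS[.ffffff]+HH[MM]' string to ISO 8601.

    Hypothetical helper for illustration; raises ValueError on other shapes.
    """
    match = _TIMESTAMP.match(raw.strip())
    if match is None:
        raise ValueError(f"unexpected timestamp format: {raw!r}")
    date, clock, fraction, sign, hours, minutes = match.groups()

    # Right-pad the fraction so "46361" means 463610 microseconds.
    microseconds = int(fraction.ljust(6, "0")) if fraction else 0

    offset = timedelta(hours=int(hours), minutes=int(minutes or 0))
    if sign == "-":
        offset = -offset

    parsed = datetime.strptime(f"{date} {clock}", "%Y-%m-%d %H:%M:%S")
    parsed = parsed.replace(microsecond=microseconds, tzinfo=timezone(offset))
    return parsed.isoformat()
```

In the generator this would replace the split(".") line with something like j["date_created"] = to_iso8601(str(j["date_created"])). As far as I know, the date type keeps millisecond resolution, so the extra digits should be accepted but not stored at full precision; date_nanos would be the type to look at if that matters.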

If I create an index directly in Elasticsearch:

PUT /test
{
  "mappings": {
    "properties": {
      "date_created": {
        "type": "date",
        "format": ["yyyy-MM-dd HH:mm:ss", "yyyy-MM-dd HH:mm:ss.SSSSSSZ"]
      }
    }
  }
}

both POST requests work:

POST /test/_doc
{
  "created_date":"2015-11-14 18:10:06.838319+00"
}

and

POST /test/_doc
{
  "created_date":"2015-11-14 18:10:06"
}

It seems to me the only solution for now is to write the request code myself. Hopefully the bug in the Python client will be fixed soon.
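
Before writing the request code by hand, it may be worth ruling out the mapping itself. Two things stand out, neither confirmed in this thread. First, the two POST requests above write created_date while the mapping defines date_created, so they may not exercise the custom format at all. Second, the documented form of "format" is a single string with alternatives joined by "||", not a list; the doubled brackets in the error message ([[yyyy-MM-dd HH:mm:ss.SSSSSSZ]]) suggest the list ended up as the literal pattern "[...]", and square brackets mark an optional section in java.time patterns, which could explain why parsing stops at index 0. The sketch below builds a mapping under those assumptions. date_field is a hypothetical helper, and the "x" offset letter (which in java.time patterns matches an hours-only offset such as +00) is my guess at what the samples need.

```python
def date_field(*patterns):
    """Build a date field mapping with alternatives joined by '||'.

    Hypothetical helper for illustration only.
    """
    return {"type": "date", "format": "||".join(patterns)}


# Assumption: the 'x' offset letter accepts "+00"; 'Z' expects "+0000".
mappings = {
    "properties": {
        "date_created": date_field(
            "yyyy-MM-dd HH:mm:ss.SSSSSSx",  # e.g. 2015-11-14 18:10:06.838319+00
            "yyyy-MM-dd HH:mm:ssx",         # e.g. 2015-11-14 18:10:06+00
        ),
    }
}

# With a client instance, this dict would be passed as
# es.indices.create(index="cars", mappings=mappings)
```

Values with fewer fractional digits, such as 2016-01-27 20:40:15.46361+00, would still need their own pattern or padding on the Python side.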

This looks weird to me. I'm not a Python expert.

Looking at the logs, it sounds like you are doing the same thing in Python and in Kibana Dev Console.

Could you create a full reproduction of this, with:

  • The Kibana Dev Console script that you shared. Just add the DELETE test in front of it
  • A full Python script which reproduces the exact same steps, and not using bulk.

And then report it at the elastic/elasticsearch-py repository on GitHub (the official Elasticsearch client library for Python)?
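
A non-bulk Python reproduction along those lines could look roughly like the sketch below. It mirrors the Dev Console steps shared above as-is (including the list-valued format and the created_date field name in the documents), so that both scripts can be compared one to one. run_reproduction is a hypothetical name, the keyword arguments assume a recent 8.x elasticsearch-py client, and the URL in the usage comment is an assumption about a local test cluster.

```python
def run_reproduction(es, index="test"):
    """Replay the Dev Console steps with an Elasticsearch client instance."""
    # DELETE test (ignore the error if the index does not exist yet)
    es.indices.delete(index=index, ignore_unavailable=True)

    # PUT /test with the same mapping as the Dev Console script
    es.indices.create(
        index=index,
        mappings={
            "properties": {
                "date_created": {
                    "type": "date",
                    "format": [
                        "yyyy-MM-dd HH:mm:ss",
                        "yyyy-MM-dd HH:mm:ss.SSSSSSZ",
                    ],
                }
            }
        },
    )

    # POST /test/_doc, once per sample document (field name as in the thread)
    samples = ["2015-11-14 18:10:06.838319+00", "2015-11-14 18:10:06"]
    return [
        es.index(index=index, document={"created_date": value})
        for value in samples
    ]


# Usage (assumes a local cluster and the elasticsearch package installed):
# from elasticsearch import Elasticsearch
# print(run_reproduction(Elasticsearch("http://localhost:9200")))
```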

Unless @sethmlarson knows what is happening? :wink:

Thanks, I've already reported it.
