AWS Elasticsearch: mapper_parsing_exception for type date from Kinesis Firehose using index_template

Hi all,

I hope someone can help me. I am really new to Elasticsearch so any guidance would be really appreciated.

I have been battling with AWS Elasticsearch and the Kinesis Firehose Agent (which reads in an application log file), and I think I am almost there, but I have hit a blocker.

> {"type":"mapper_parsing_exception","reason":"failed to parse field [timestamp] of type [date] in document with id \u002749606120302139348740181849332896261353140434064317087746.0\u0027. Preview of field\u0027s value: \u00272020-04-16 17:48:25,839\u0027","caused_by":{"type":"illegal_argument_exception","reason":"failed to parse date field [2020-04-16 17:48:25,839] with format [strict_date_optional_time||epoch_millis]","caused_by":{"type":"date_time_parse_exception","reason":"Failed to parse with all enclosed parsers"}}}

The data in question is:

2020-04-28 19:48:25,244|OAuth| 1234567890| 127.0.0.1 | | pa_customer| OAuth20| localhost| AS| success| ProductHoldingAFM2FA| | 2596

The error message hints at \u0027, which I think is an apostrophe, but the log has no apostrophe in it? (As far as I can tell, \u0027 is just the Unicode-escaped quote character the error message uses to wrap the document id and field value; it is not part of the data itself.)

All the data from Kinesis comes in with data types defaulting to text, so I changed them in Elasticsearch with the following template.

```
PUT /_template/testfed-t01
{
  "index_patterns": [
    "test*"
  ],
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  },
  "mappings": {
    "_source": { "enabled": true },
    "properties": {
      "action": { "type": "text" },
      "authenticationtype": { "type": "text" },
      "device": { "type": "keyword" },
      "duration": { "type": "integer" },
      "hosttype": { "type": "text" },
      "ipaddress": { "type": "keyword" },
      "message": { "type": "text" },
      "providertype": { "type": "text" },
      "result": { "type": "text" },
      "timestamp": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss,SSS"
      },
      "typemfa": { "type": "text" },
      "unknown1": { "type": "text" },
      "unknown2": { "type": "text" }
    }
  }
}
```

My data is structured as:

> 2020-04-28 19:48:25,839|OAuth| 1234567890| 127.0.0.1 | | pa_customer| OAuth20| localhost| AS| success| | | 521
> 2020-04-28 19:38:25,839|OAuth| 1234567890| 127.0.0.1 | | pa_customer| OAuth20| localhost| AS| success| | | 521

My kinesisagent.json looks like:

```
{
  "checkpointFile": "/opt/aws-kinesis-agent/run/checkpoints",
  "cloudwatch.endpoint": "https://monitoring.eu-west-2.amazonaws.com",
  "firehose.endpoint": "firehose.eu-west-2.amazonaws.com",
  "awsAccessKeyId": "AKIFREDGJAMDNDREMSJ",
  "awsSecretAccessKey": "2xxsx4TDEs34WQ0UaMpFHwu4h+FAKEF8VxedtPMADZ",
  "flows": [
    {
      "filePattern": "/data/applicatee-10.0.0/applicatee/log/testaudit01.log",
      "initialPosition": "START_OF_FILE",
      "deliveryStream": "TEST-APPlicatee-AuditLog-Stream",
      "dataProcessingOptions": [
        {
          "optionName": "CSVTOJSON",
          "customFieldNames": [
            "timestamp", "action", "unknown1", "ipaddress", "unknown2",
            "device", "authenticationtype", "hosttype", "providertype",
            "result", "typemfa", "message", "duration"
          ],
          "delimiter": "\\|"
        }
      ]
    }
  ]
}
```

Any help would be greatly appreciated.

Thanks again in advance for your time and help.

yoyomonkey

I would suggest removing your awsSecretAccessKey from the post.

Given the error message, it seems the timestamp format is not used.
Possible reasons:

  • You added the index template after the index got created (you cannot change the mappings of existing fields once the index gets created; see the sketch after this list)
  • You are writing to an index which doesn't start with test
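
Since index templates are only applied when an index is created, here is a minimal sketch of the fix, assuming the data can be regenerated from the log file (the index name below is illustrative):

```
# Remove the wrongly-mapped index so the template can take effect
# the next time Firehose creates a matching index.
DELETE /test_appl_auditlog-2020-04-28

# Make sure the template (abbreviated here to the timestamp field;
# keep the other fields from your full template) is in place first.
PUT /_template/testfed-t01
{
  "index_patterns": ["test*"],
  "mappings": {
    "properties": {
      "timestamp": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss,SSS"
      }
    }
  }
}
```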

Sorry @yoyomonkey, I can help on Elasticsearch issues.

Testing on 7.x, I managed to index the data:

PUT /_template/testfed-t01
{
  "index_patterns": [
    "test*"
  ],
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  },
  "mappings": {
    "_source": {
      "enabled": true
    },
    "properties": {
      "action": {
        "type": "text"
      },
      "authenticationtype": {
        "type": "text"
      },
      "device": {
        "type": "keyword"
      },
      "duration": {
        "type": "integer"
      },
      "hosttype": {
        "type": "text"
      },
      "ipaddress": {
        "type": "keyword"
      },
      "message": {
        "type": "text"
      },
      "providertype": {
        "type": "text"
      },
      "result": {
        "type": "text"
      },
      "timestamp": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss,SSS"
      },
      "typemfa": {
        "type": "text"
      },
      "unknown1": {
        "type": "text"
      },
      "unknown2": {
        "type": "text"
      }
    }
  }
}

POST test123/_doc/1
{
  "timestamp": "2020-04-16 17:48:25,839"
}

GET test123/_search
{
  "docvalue_fields": [ { "field": "timestamp", "format": "strict_date_optional_time" }]
}
# Result
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "test123",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "timestamp" : "2020-04-16 17:48:25,839"
        },
        "fields" : {
          "timestamp" : [
            "2020-04-16T17:48:25.839Z"
          ]
        }
      }
    ]
  }
}

The AWS Elasticsearch console shows an error which clearly states the timestamp field expects a type date with a format strict_date_optional_time||epoch_millis.
But your index template seems correct.

Maybe AWS forces you to use a specific time format?

Are there other index templates which might take priority in addition to your index template?

Index Templates are merged if multiple match the index name and they respect the order parameter (documentation).
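
As a minimal sketch of that merging behaviour (the template name here is hypothetical): a second template that also matches test* with a higher order and a plain date mapping would override your timestamp format when the two are merged at index creation, producing exactly the strict_date_optional_time||epoch_millis default in your error.

```
# Hypothetical competing template: order 1 is applied after order 0,
# so its plain "date" mapping (default format) would win the merge.
PUT /_template/hypothetical-other-template
{
  "index_patterns": ["test*"],
  "order": 1,
  "mappings": {
    "properties": {
      "timestamp": { "type": "date" }
    }
  }
}
```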

Can you try:

  • GET test*/_mapping to get the actual mappings of the indices which got created
  • GET /_template/ to get the list of all the templates you have on your cluster

Cheers Luca for the quick reply and help sir,

I did completely change the secret key before posting it.

To comment on your points:

I deleted the indexes first, so there was nothing in Elasticsearch before creating the template.
I created the index template, which was acknowledged, then I generated the data from the log file, which Kinesis read and streamed to Elasticsearch as test_application-yymmdd.

If the index had already been created when I created the template, I think the error message would be different from the mapping issue that is being reported?

In the AWS Elasticsearch console it shows my index template worked, as the data stream has the right data types attached; it is just the error that points to a mapping issue on the timestamp field?

> {"type":"mapper_parsing_exception","reason":"failed to parse field [timestamp] of type [date] in document with id \u002749606120302139348740181849332896261353140434064317087746.0\u0027. Preview of field\u0027s value: \u00272020-04-16 17:48:25,839\u0027","caused_by":{"type":"illegal_argument_exception","reason":"failed to parse date field [2020-04-16 17:48:25,839] with format [strict_date_optional_time||epoch_millis]","caused_by":{"type":"date_time_parse_exception","reason":"Failed to parse with all enclosed parsers"}}}

What is confusing me is:

"reason":"failed to parse date field [2020-04-16 17:48:25,839] with format [strict_date_optional_time||epoch_millis]

I added a format in my index template, which should work:
"format": "yyyy-MM-dd HH:mm:ss,SSS"
as that matches the timestamp in my log file.

Hi Luca, thank you so much. I tried those commands, and for

`GET /_template/`

{
  "test-t01" : {
    "order" : 0,
    "index_patterns" : [
      "test*"
    ],
    "settings" : {
      "index" : {
        "number_of_shards" : "1",
        "number_of_replicas" : "1"
      }
    },
    "mappings" : {
      "_source" : {
        "enabled" : true
      },
      "properties" : {
        "ipaddress" : {
          "type" : "keyword"
        },
        "hosttype" : {
          "type" : "text"
        },
        "unknown2" : {
          "type" : "text"
        },
        "unknown1" : {
          "type" : "text"
        },
        "typemfa" : {
          "type" : "text"
        },
        "authenticationtype" : {
          "type" : "text"
        },
        "message" : {
          "type" : "text"
        },
        "duration" : {
          "type" : "integer"
        },
        "result" : {
          "type" : "text"
        },
        "action" : {
          "type" : "text"
        },
        "device" : {
          "type" : "keyword"
        },
        "providertype" : {
          "type" : "text"
        },
        "timestamp" : {
          "format" : "yyyy-MM-dd HH:mm:ss,SSS",
          "type" : "date"
        }
      }
    },
    "aliases" : { }
  }
}

And for

GET test*/_mapping

{
  "test_appl_auditlog-2020-04-28" : {
    "mappings" : {
      "properties" : {
        "action" : {
          "type" : "text"
        },
        "authenticationtype" : {
          "type" : "text"
        },
        "device" : {
          "type" : "keyword"
        },
        "duration" : {
          "type" : "integer"
        },
        "hosttype" : {
          "type" : "text"
        },
        "ipaddress" : {
          "type" : "keyword"
        },
        "message" : {
          "type" : "text"
        },
        "providertype" : {
          "type" : "text"
        },
        "result" : {
          "type" : "text"
        },
        "timestamp" : {
          "type" : "date"
        },
        "typemfa" : {
          "type" : "text"
        },
        "unknown1" : {
          "type" : "text"
        },
        "unknown2" : {
          "type" : "text"
        }
      }
    }
  }
}

In the mapping it does not show the format for the timestamp field; is this correct?

No, it should show something like:

{
  "test123" : {
    "mappings" : {
      "properties" : {
        "action" : {
          "type" : "text"
        },
        "authenticationtype" : {
          "type" : "text"
        },
        "device" : {
          "type" : "keyword"
        },
        "duration" : {
          "type" : "integer"
        },
        "hosttype" : {
          "type" : "text"
        },
        "ipaddress" : {
          "type" : "keyword"
        },
        "message" : {
          "type" : "text"
        },
        "providertype" : {
          "type" : "text"
        },
        "result" : {
          "type" : "text"
        },
        "timestamp" : {
          "type" : "date",
          "format" : "yyyy-MM-dd HH:mm:ss,SSS"
        },
        "typemfa" : {
          "type" : "text"
        },
        "unknown1" : {
          "type" : "text"
        },
        "unknown2" : {
          "type" : "text"
        }
      }
    }
  }
}

Please format your code, logs, or configuration files using the </> icon as explained in this guide, and not the citation button. It will make your post more readable.

Or use markdown style like:

```
CODE
```

The </> icon in the editor toolbar is the one to use if you are not using markdown format.

There's a live preview panel for exactly this reason.

Lots of people read these forums, and many of them will simply skip over a post that is difficult to read, because it's just too large an investment of their time to try and follow a wall of badly formatted text.
If your goal is to get an answer to your questions, it's in your interest to make it as easy to read and understand as possible.

Not related to your problem but did you look at https://www.elastic.co/cloud and https://aws.amazon.com/marketplace/pp/B01N6YCISK ?

Cloud by Elastic is one way to have access to all features, all managed by us. Think about what is already there, like Security, Monitoring, Reporting, SQL, Canvas, APM, Logs UI, Infra UI, SIEM, Maps UI, AppSearch, and what is coming next :slight_smile: ...


Cheers Dadoonet,

My mistake and my apologies; I did try to get the formatting right, but it went all wrong for some of the bits in the posting.

cheers

Hi Dadoonet,

Thanks for the links. I did take a look at the cloud service in the past, but I personally wanted to learn more about the technology and learn it myself. I am in the Manchester, UK area, which has an active Elastic meetup group, so I wanted to hopefully pick up a new skill and meet new people.
cheers


Hi Luca,

Cheers for clearing that up ... I could not understand why the index template did not update. Maybe because the data was already read into the index in Elasticsearch and could not be changed, so it still had the older format.

I think I will delete the index again, correct the template, confirm I get the right formatting, and put the index back in.
(I am sure there is probably a far better/easier way to do it ... which I hope to work out soon :slight_smile: ; one possible alternative is sketched below.) Being a noob at this, I think I can use the practice, as it's not live.
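
One such alternative, as a minimal sketch (the target index name here is hypothetical, and this only helps for documents that actually made it into the old index): create a new index with the corrected mapping, then copy the documents across with the Reindex API instead of regenerating them from the log file.

```
# Hypothetical target index with the corrected timestamp mapping.
PUT /test_appl_auditlog-fixed
{
  "mappings": {
    "properties": {
      "timestamp": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss,SSS"
      }
    }
  }
}

# Copy the existing documents into the corrected index.
# Documents that were rejected by the old mapping never reached the
# source index, so those still need to be re-sent from the agent.
POST /_reindex
{
  "source": { "index": "test_appl_auditlog-2020-04-28" },
  "dest": { "index": "test_appl_auditlog-fixed" }
}
```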

Cheers Luca, I will give it a go and let you know how I get on.

How does it differ from the AWS service in terms of learning curve?

Great question Dadoonet, I will try and give an honest answer :)

So for me personally, very hard but thoroughly enjoyable (you must be prepared to invest a lot of time).

My only exposure to ELK was a few years ago when I built one with a colleague on Azure, using the Marketplace template as a guide. (I am a total ELK noob, but the cloud infrastructure was fairly straightforward.) I was working with a good friend who attended the ELK meetups, and I think he had some help from one of the ELK guys when he got stuck. (In that company we did first look at Splunk, but when we saw the costs the business just would not sign off on it, so we decided to put in the time and give ELK a go.)

We got there in the end, but my friend did a lot of the indexing etc. (we were using Logstash as well); with the documentation, guides, and help from the ELK master he knew, we were lucky and have something which works. I did pick up a few bits, but I didn't fully understand why I was doing what I was doing, even though it was fairly straightforward.

Fast forward to today... I am in a new company and doing this project by myself, and this project was a lot harder... not impossible, but I had to accept that I would have to read a lot of guides, watch YouTube videos, learn and try to understand the different terminologies in ELK (which I am still struggling with), get involved in the community, and try to ask the right questions. I was prepared for the challenge, so I did spend the past couple of weeks trying to get a better understanding.

Some of my challenges:

We didn't want to use Logstash or any of the plugins like Beats etc., because we wanted an ingestor that can scale on demand in the cloud easily... so we wanted to try Kinesis Firehose.

The AWS Kinesis Firehose Agent was in places poorly documented, and it took some time of trial and error to get the agent to work from an on-premise machine.
When I finally got the data into Elasticsearch... the mappings that appear in the Elasticsearch _doc by default are all text and cannot be changed except by creating an index template. The step-by-step documentation shows how you get the data in and add it to Elasticsearch from Kinesis, with little coverage of the Firehose agent; using the default Apache logs as a test, all fields were coming in as text, and the tutorials did not show how to change the type unless you were prepared to use a Lambda function. (I did not want that extra processing just for changing a type.)

A lot of the guides show the index template from before 7.0, so when things did not work and the messaging wasn't coherent enough for a newb like me to understand, it took a while and a lot of reading and trial and error to figure out... One of my biggest issues was trying to create an index template for an index that already existed; and with version 7.0 and the _type being removed (having a single type per index?), there were very few documentation tutorials, so this was a challenge with my limited understanding.

I did the tutorials, followed videos, and uploaded sample data (some things were dated, as a lot of the videos and guides were a couple of years old), and I played in the dev console a lot, but the leap to getting my Kibana stream working did involve a lot of patience and time.

I am now very close to getting my data into the right template in Elasticsearch, and I have really enjoyed this first step in the journey; it's something I want to continue with.

As a recommendation: if anyone is looking for a quick setup and no pain, or is working where time is limited, I would strongly recommend the Elastic Cloud option; they simply take the headache away and your data is ready to go! If I didn't have an understanding employer, I would definitely go down the Elastic Cloud route.

If you are prepared to put in a lot of time and have a genuine want/interest to learn Elastic, put in a lot of reading, watching videos, breaking stuff, making a lot of mistakes, and getting back up and trying again, give it a go. I really am enjoying it. I know my environment is nothing advanced like what Elastic Cloud will offer, but I hope that with the skills and knowledge I gain from going to meetups, visiting this site, and reading, I will be able to improve my solution as my business needs change over time.

Hi Luca, that is fantastic. Now that I know the type had to be there, I deleted the index, then updated the template, and everything is now working... the index is now available to be selected in Kibana and I can add the timestamp as a field.

Thank you so much for the help Luca .

My next challenge is looking at whether I need an @timestamp (generated by ELK) or whether my single timestamp will be enough, how I generate that in this AWS Elasticsearch, and also whether I can create by default a better-looking ID than the _id I have for records, like 49606120302139348740181868583902339519856448043483136002.0. So I am going to jump back on Google and have some fun finding the answer.
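
For the @timestamp part, one common approach (a minimal sketch, assuming your AWS Elasticsearch version supports ingest pipelines; the pipeline name is hypothetical) is a date processor in an ingest pipeline, which parses the existing timestamp field and writes the result into @timestamp:

```
PUT /_ingest/pipeline/add-at-timestamp
{
  "description": "Parse the log timestamp into @timestamp",
  "processors": [
    {
      "date": {
        "field": "timestamp",
        "formats": ["yyyy-MM-dd HH:mm:ss,SSS"],
        "target_field": "@timestamp"
      }
    }
  ]
}
```

The pipeline can then be applied to incoming documents by setting index.default_pipeline in the index settings of your template.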

In fact, before I do that, I think I am going to spend the rest of the day making some visualisations I have been waiting weeks to get to, and play with Kibana for a bit.

Cheers again .

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.