How to parse a plain string like YYYYMMDDHHmm and convert it into a timestamp?

Hi, I'm currently using Logstash to ingest a string in the format YYYYMMDDHHmm.
It's a plain "time" string that seems hard to map with the grok filter.
I've searched for a solution for a long time, but none of the existing cases matched my problem.
So I'm starting a new topic for help. Thank you!

You can use the date filter, e.g.:

filter {
    date {
        # note that the Joda pattern letters are case-sensitive:
        # yyyy = year, MM = month, dd = day of month, HH = hour, mm = minute
        match => ["old_field", "yyyyMMddHHmm"]
        target => "new_field"
    }
}

Hi paz,
First of all, thank you for answering.

I have tried the date filter before like this:

date {
    match => ["starttime", "yyyyMMddHHmm"]
    target => "@timestamp"
}

And looking at the JSON documents in Kibana, I saw this:

"_source": {
"@timestamp": "2017-07-17T03:56:05.690Z",
"@version": "1",
"starttime": 201707171048,
},
"fields": {
"@timestamp": [
1500263765690
]
}

So the @timestamp in _source doesn't match the hour and minute of my original string.
Also, I tried to overwrite @timestamp in fields, but it didn't work.

How should I solve this?

Thank you!

Also, I noticed that this showed up:

"tags": [
"_dateparsefailure"
]

in the _source field.
So did I do something wrong with the filter?

Actually, the reason I want to put my data's time into @timestamp is that I query the database many times every day, and each query can only fetch a fixed amount of data through the Logstash config. So I hope Logstash can tell which data has already been imported by looking at @timestamp, to avoid duplicate imports.

Yeah, a _dateparsefailure tag means the filter failed to properly parse the date on the original field, so the @timestamp field got the default value (the time when the log was received).

I tried your posted example (config and sample data) and it works as intended. Are you sure your starttime field is a string? If not, you'll have to convert it to one before the date filter with

    mutate {
        convert => {
            "starttime" => "string"
        }
    }

Also, keep in mind that you may need to set the timezone parameter, because @timestamp is always stored in UTC, so you will see an offset based on your timezone.
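
For example, a minimal sketch putting the string conversion, the pattern, and the timezone together (the starttime field name comes from your post; the Asia/Hong_Kong zone is just an example, adjust it to wherever your timestamps are recorded):

filter {
    mutate {
        # the date filter parses strings, so make sure the field is not numeric
        convert => { "starttime" => "string" }
    }
    date {
        # yyyyMMddHHmm matches values like 201707171048
        match => ["starttime", "yyyyMMddHHmm"]
        # example zone; replace with your own
        timezone => "Asia/Hong_Kong"
        target => "@timestamp"
    }
}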


Hi paz,
Thank you for your suggestion. Maybe my data format is strange; I didn't succeed with your approach, but I found a similar way that works:

filter {
	mutate{
		convert => { "starttime" => "integer" }
		add_field => { "starttime1" =>  "%{starttime}00" } 
		convert => { "starttime1" => "integer" }
	}
	date{
		match => ["starttime1","yyyyMMddHHmmss"]
		timezone => "Asia/Hong_Kong"
		target => "@timestamp"
	}	
}

It seems that when the value comes from an integer, the date match needs precision down to the second.
Anyway, you helped me solve the problem of mapping to @timestamp. Thank you!

BTW, I found that even though I set my data's time as @timestamp, ELK still doesn't detect duplicate data with the same @timestamp.
I have to query a fixed amount of data at a time, say 1000 records per query. Within the next 1000 records there may be some duplicates (maybe 200 ~ 500, but it varies) that were already imported before, and I can't control that.
So may I ask whether there is any way to drop data that has already been imported? Thank you!

Just to be clear, you want to avoid duplicates when reading from ES with Logstash or when writing to ES?

Actually, either is OK for me, because I just don't want duplicate data to show up, whether that means it is stored but hidden or never imported at all.
But while visualizing the data in Kibana, I didn't find any way to hide duplicates, so currently I'm trying to avoid duplicate writes.

Thank you!

In order to avoid duplicate insertions, the easiest way would be to provide your own id for the documents instead of letting ES generate a random one on insert (as I suspect is the case now). The id is what Elasticsearch checks to decide whether a document already exists.

So you can use a value that is unique per document (timestamp? timestamp + some other field? It depends on the actual data) and provide it yourself in the Logstash elasticsearch output.
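
As a rough sketch of what that could look like (the hosts, the index name, and the choice of starttime as the unique key are assumptions for illustration; use whatever combination is actually unique in your data):

output {
    elasticsearch {
        hosts => ["localhost:9200"]          # assumed host
        index => "mydata"                    # assumed index name
        # use your own unique key as the document id instead of a random one
        document_id => "%{starttime}"
    }
}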

Once you have that, you can play around with the action setting of the output plugin (see the sketch below the list):

  • create will discard the new document (keeping the one already indexed) and return an error, potentially flooding your Logstash logfile with error codes.
  • index (the default) will silently overwrite the old document with the new one (bumping its version in the process), but since I presume the data are identical, that may not be an issue.

That way you'll be certain each document exists only once, since each id can exist only once.
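
For instance, a rough sketch of the "drop duplicates" variant (the hosts, index name, and document_id are the same assumptions as above, not values from your setup):

output {
    elasticsearch {
        hosts => ["localhost:9200"]          # assumed host
        index => "mydata"                    # assumed index name
        document_id => "%{starttime}"
        # "create" rejects documents whose id already exists;
        # the default "index" would overwrite them instead
        action => "create"
    }
}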


I think I got your point and feel that's the answer I want!
Thank you very much!

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.