How to split a pure string like YYYYMMDDHHmm and convert it into a timestamp?


(Taurus) #1

Hi, I'm currently using Logstash to ingest a string in the format YYYYMMDDHHmm.
It's a kind of pure "time" string, which seems hard to map with a grok filter.
I have searched for a solution for a long time, but none of the earlier cases matched my problem.
So I'm starting a new topic for help. Thank you!


(Paris Mermigkas) #2

You can use the date filter, e.g.:

filter {
    date {
        match => ["old_field","YYYYMMDDHHmm"]
        target => "new_field"
    }
}

(Taurus) #3

Hi paz,
First of all, thank you for answering.

I have tried the date filter before like this:

date {
    match  => ["starttime", "yyyyMMddHHmm"]
    target => "@timestamp"
}

And looking into the JSON documents in Kibana, I saw this:

"_source": {
"@timestamp": "2017-07-17T03:56:05.690Z",
"@version": "1",
"starttime": 201707171048,
},
"fields": {
"@timestamp": [
1500263765690
]
}

So the @timestamp in _source didn't seem to match my original string in the hour and minute.
Also, I tried to overwrite @timestamp in fields, but it didn't work.

How should I solve this?

Thank you!


(Taurus) #4

Also, I noticed that this showed up:

"tags": [
    "_dateparsefailure"
]

in the _source field.
So did I do something wrong in the filter?

Actually, the reason I want to put my data's time into @timestamp is that I query the database many times every day, but each time I can only query a fixed amount of data with the Logstash config. So I hope Logstash can tell which data has already been imported by looking at @timestamp, to avoid duplicate importing.


(Paris Mermigkas) #5

Yeah, a _dateparsefailure tag means the filter failed to properly parse the date on the original field, so the @timestamp field got the default value (the time when the log was received).

I tried your posted example (config and sample data) and it works as intended. Are you sure your starttime field is a string? If not, you'll have to convert it to one before the date filter with:

    mutate {
        convert => {
            "starttime" => "string"
        }
    }

Also, keep in mind that you may need to set a timezone parameter, because by default it will be converted to UTC, so you will get an offset based on your timezone.
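
For example, a minimal sketch of the same date filter with an explicit timezone (the Asia/Hong_Kong value is only an illustration; use whatever zone your data is recorded in):

    filter {
        date {
            match    => ["starttime", "yyyyMMddHHmm"]
            timezone => "Asia/Hong_Kong"   # example value: the zone the source string is written in
            target   => "@timestamp"
        }
    }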


(Taurus) #6

Hi paz,
Thank you for your method. Maybe my data format is strange; I didn't succeed with your way, but I found a similar way to get it working:

filter {
    mutate {
        convert   => { "starttime" => "integer" }
        add_field => { "starttime1" => "%{starttime}00" }   # append "00" as the seconds
        convert   => { "starttime1" => "integer" }
    }
    date {
        match    => ["starttime1", "yyyyMMddHHmmss"]
        timezone => "Asia/Hong_Kong"
        target   => "@timestamp"
    }
}

It seems that when mapping from an integer, @timestamp requires accuracy down to the second.
Anyway, you helped me solve the mapping problem for @timestamp. Thank you!

BTW, I found that even though I have set my data's time in @timestamp, ELK still won't recognize duplicated data with the same @timestamp.
Since I have to query a fixed amount of data each time, let's say 1000 records per query, the next 1000 records may contain some duplicates (maybe 200 ~ 500, but not a fixed number each time) that were already imported before, and I can't control that.
So may I ask whether there is any method to drop data that has already been imported? Thank you!


(Paris Mermigkas) #7

Just to be clear, you want to avoid duplicates when reading from ES with Logstash or when writing to ES?


(Taurus) #8

Actually, either is OK for me, because I just don't want duplicate data to show up, whether it is stored but not displayed, or never imported at all.
But while using Kibana to visualize the data, I didn't find any way to avoid showing duplicates, so currently I'm trying to avoid duplicate writing.

Thank you!


(Paris Mermigkas) #9

In order to avoid duplicate data insertion, the easiest way would be to provide your own id for the documents instead of letting ES generate a random one upon insert (as I suspect is the case now). The id is what Elasticsearch takes into consideration when checking whether a document already exists.

So you can use a unique value per document (timestamp? timestamp + some_other_field? It depends on the actual data) and provide it yourself in the Logstash elasticsearch output.

When you have that, you can experiment with the action setting of the output plugin.

  • create will reject a document whose id has already been indexed, but it will also return an error, potentially flooding your Logstash logfile with error codes.
  • index will silently overwrite the old document with the new one (bumping its version in the process), but since I presume the data are identical, that may not be an issue.

That way you'll be certain each document exists only once, since each id can exist only once.
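
As a rough sketch (the host, index name, and the choice of starttime as the id are assumptions, not from your setup, so adjust them to your actual data), the output could look something like this:

    output {
        elasticsearch {
            hosts       => ["localhost:9200"]   # assumed host
            index       => "mydata"             # assumed index name
            document_id => "%{starttime}"       # your own unique id per document
            action      => "create"             # or "index" to overwrite silently
        }
    }

If a single field isn't unique enough, you can combine fields the same way, e.g. document_id => "%{starttime}_%{some_other_field}".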


(Taurus) #10

I think I got your point, and I feel that's the answer I want!
Thank you very much!


(system) #11

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.