How to split a pure string like YYYYMMDDHHmm and convert it into a timestamp?


(Taurus) #1

Hi, I'm currently using Logstash to ingest a string in the format YYYYMMDDHHmm.
It's a kind of pure "time" string, which seems hard to map with a grok filter.
I have searched for a solution for a long time, but none of the earlier cases matched my problem.
So I'm starting a new topic for help. Thank you!


(Paris Mermigkas) #2

You can use the date filter, e.g.:

filter {
    date {
        match => ["old_field","YYYYMMDDHHmm"]
        target => "new_field"
    }
}

(Taurus) #3

Hi paz,
First of all, thank you for answering.

I have tried the date filter before like this:

date {
    match  => ["starttime", "yyyyMMddHHmm"]
    target => "@timestamp"
}

And looking into the JSON documents in Kibana, I saw this:

"_source": {
"@timestamp": "2017-07-17T03:56:05.690Z",
"@version": "1",
"starttime": 201707171048,
},
"fields": {
"@timestamp": [
1500263765690
]
}

So the @timestamp in _source didn't seem to match my original string in the hour and minute.
Also, I tried to overwrite @timestamp in fields, but it didn't work.

How should I solve this?

Thank you!


(Taurus) #4

Also, I noticed that this showed up:

"tags": [
    "_dateparsefailure"
]

in the _source field.
So did I do something wrong in the filter?

Actually, the reason I want to put my data's time into @timestamp is that I query the database many times every day, but each time I can only query a fixed amount of data with the Logstash config. So I hope Logstash can tell which data has already been imported by looking at @timestamp, to avoid duplicate importing.


(Paris Mermigkas) #5

Yeah, a _dateparsefailure tag means the filter failed to properly parse the date on the original field, so the @timestamp field got the default value (the time when the log was received).

I tried your posted example (config and sample data) and it works as intended. Are you sure your starttime field is a string? If not, you'll have to convert it to one before the date filter with:

    mutate {
        convert => {
            "starttime" => "string"
        }
    }

Also, keep in mind that you may need to set a timezone parameter, because by default it will be converted to UTC, so you will get an offset based on your timezone.
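
For example, a minimal sketch of the same date filter with an explicit timezone (the Asia/Hong_Kong value is only an illustration; use whatever zone your data is recorded in):

    filter {
        date {
            match    => ["starttime", "yyyyMMddHHmm"]
            timezone => "Asia/Hong_Kong"   # example value: the zone the source string is written in
            target   => "@timestamp"
        }
    }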


(Taurus) #6

Hi paz,
Thank you for your method. Maybe my data format is strange; I didn't succeed with your way, but I found a similar way to get it working:

filter {
    mutate {
        convert   => { "starttime" => "integer" }
        add_field => { "starttime1" => "%{starttime}00" }   # append "00" as the seconds
        convert   => { "starttime1" => "integer" }
    }
    date {
        match    => ["starttime1", "yyyyMMddHHmmss"]
        timezone => "Asia/Hong_Kong"
        target   => "@timestamp"
    }
}

It seems that when mapping from an integer, @timestamp requires accuracy down to the second.
Anyway, you helped me solve the mapping problem for @timestamp. Thank you!

BTW, I found that even though I have set my data's time in @timestamp, ELK still won't recognize duplicated data with the same @timestamp.
Since I have to query a fixed amount of data each time, let's say 1000 records per query, the next 1000 records may contain some duplicates (maybe 200 ~ 500, but not a fixed number each time) that were already imported before, and I can't control that.
So may I ask whether there is any method to drop data that has already been imported? Thank you!


(Paris Mermigkas) #7

Just to be clear, you want to avoid duplicates when reading from ES with Logstash or when writing to ES?


(Taurus) #8

Actually, either is OK for me, because I just don't want duplicate data to show up, whether it is stored but not displayed, or never imported at all.
But while using Kibana to visualize the data, I didn't find any way to avoid showing duplicates, so currently I'm trying to avoid duplicate writing.

Thank you!


(Paris Mermigkas) #9

In order to avoid duplicate data insertion, the easiest way would be to provide your own id for the documents instead of letting ES generate a random one upon insert (as I suspect is the case now). The id is what Elasticsearch takes into consideration when checking whether a document already exists.

So you can use a unique value per document (timestamp? timestamp + some_other_field? It depends on the actual data) and provide it yourself in the Logstash elasticsearch output.

When you have that, you can experiment with the action setting of the output plugin.

  • create will reject a document whose id has already been indexed, but it will also return an error, potentially flooding your Logstash logfile with error codes.
  • index will silently overwrite the old document with the new one (bumping its version in the process), but since I presume the data are identical, that may not be an issue.

That way you'll be certain each document exists only once, since each id can exist only once.
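
As a rough sketch (the host, index name, and the choice of starttime as the id are assumptions, not from your setup, so adjust them to your actual data), the output could look something like this:

    output {
        elasticsearch {
            hosts       => ["localhost:9200"]   # assumed host
            index       => "mydata"             # assumed index name
            document_id => "%{starttime}"       # your own unique id per document
            action      => "create"             # or "index" to overwrite silently
        }
    }

If a single field isn't unique enough, you can combine fields the same way, e.g. document_id => "%{starttime}_%{some_other_field}".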


(Taurus) #10

I think I got your point, and I feel that's the answer I want!
Thank you very much!


(system) #11

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.