Document order in Elasticsearch after processing with Filebeat/Logstash


#1

Hi,

I have the following setup to gather data: Filebeat -> Logstash -> Elasticsearch

As input I have files in the following plain-text format:

2017-07-25 10:36:07,988 21 User 1 Ligne 0
2017-07-25 10:36:07,988 21 User 1 Ligne 1
2017-07-25 10:36:07,988 21 none 1 Ligne 2

Once the file is harvested by Filebeat, it goes through Logstash, where I map the @timestamp field. But when I query Elasticsearch, I get a different order than the one in the source log file (log 2, log 0, log 1 instead of log 0, log 1, log 2).

The timestamp field is not precise enough: I will always get more than one document with the exact same timestamp.

How can I solve this issue? I would like to get the documents from Elasticsearch in the same order as they appear in the log file.

Regards


(Magnus Bäck) #2

Perhaps you can use the file offset as a secondary sort key? I believe Filebeat records it in the offset field.
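
To illustrate the idea, here is a minimal Python sketch of sorting on @timestamp first and offset second. The three documents and their offset values are made up for illustration (they are not taken from a real Filebeat run), and `es_sort` shows what the equivalent sort clause in a search request body might look like, assuming offset is indexed as a numeric field.

```python
from datetime import datetime

# Hypothetical documents mirroring the three sample log lines.
# The offset values are invented for this example.
docs = [
    {"@timestamp": "2017-07-25T10:36:07.988Z", "offset": 84, "message": "Ligne 2"},
    {"@timestamp": "2017-07-25T10:36:07.988Z", "offset": 0, "message": "Ligne 0"},
    {"@timestamp": "2017-07-25T10:36:07.988Z", "offset": 42, "message": "Ligne 1"},
]


def sort_key(doc):
    """Primary key: parsed timestamp. Secondary key: file offset."""
    ts = datetime.strptime(doc["@timestamp"], "%Y-%m-%dT%H:%M:%S.%fZ")
    return (ts, doc["offset"])


ordered = sorted(docs, key=sort_key)
messages = [d["message"] for d in ordered]

# Equivalent two-level sort clause for an Elasticsearch search request body
# (assumes "offset" is mapped as a number).
es_sort = [
    {"@timestamp": {"order": "asc"}},
    {"offset": {"order": "asc"}},
]

print(messages)
```

Because the timestamps tie, the offset alone decides the order here, which is the case described in the first post.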


#3

You are right: if I use both @timestamp and the offset, the result looks good. But I have one more question. Do you know how I can merge @timestamp and offset into a new field and use it as a single sort criterion? (I suppose that's better when it comes to reindexing and performance.)

Update:
I tried this:

add_field => { "timestamp_sort" => "%{@timestamp}+%{offset}" }

But as a result I got this: :confused:

"timestamp_sort": "2017-08-06T08:36:07.988Z+53"


(Magnus Bäck) #4

You'd have to use a ruby filter to do that, but I don't think just adding the timestamp expressed in milliseconds (if that's what you meant) with the file offset is a good idea since entries right after a rotation could very well sort prior to entries just before the rotation. You should figure out something else that doesn't have that problem.
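
The rotation problem can be sketched in a few lines of Python. The two entries below are hypothetical: one is the last line of a large file just before rotation, the other is the first line of the fresh file written shortly after. The timestamps are treated as naive UTC values purely for the arithmetic.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def epoch_ms(text):
    """Convert a 'YYYY-MM-DD HH:MM:SS,mmm' string to integer epoch milliseconds."""
    dt = datetime.strptime(text, "%Y-%m-%d %H:%M:%S,%f")
    return (dt - EPOCH) // timedelta(milliseconds=1)


# Hypothetical entries around a log rotation.
last_before = {"ts": "2017-07-25 10:36:07,988", "offset": 50000,
               "message": "last line of old file"}
first_after = {"ts": "2017-07-25 10:36:08,100", "offset": 0,
               "message": "first line of new file"}
entries = [last_before, first_after]

# Adding milliseconds and offset: the large offset of the old file pushes
# the earlier entry past the later one.
by_sum = sorted(entries, key=lambda e: epoch_ms(e["ts"]) + e["offset"])

# Keeping the two values as separate sort keys preserves the real order.
by_tuple = sorted(entries, key=lambda e: (epoch_ms(e["ts"]), e["offset"]))

print([e["message"] for e in by_sum])
print([e["message"] for e in by_tuple])
```

With the summed key, the entry written 112 ms later sorts first because its offset restarted at 0, while the two-key sort keeps the entries in the order they were written.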


#5

Hi Magnus,

Thank you for your feedback. So far, I was able to figure out the following:

  • Declare timestamp_sort as a date in the index mapping in Elasticsearch:

"timestamp_sort" : {
  "type" : "date",
  "format": "yyyy-MM-dd HH:mm:ss,SSSSSSSSSSSSSSSS"
}

  • In Logstash, simply concatenate the timestamp and the offset (concatenating, not adding):

add_field => { "timestamp_sort" => "%{@timestamp}%{offset}" }

When pulling data from ELK, the sorting is based on the new field (asc) + file name (desc).

This seems to be working; at least I haven't seen any issues so far :smiley:
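
One thing that may be worth checking with plain concatenation is how offsets with different digit counts compare. The Python sketch below only demonstrates ordinary string comparison; whether a date-typed field in Elasticsearch behaves the same way is not something established in this thread. The 12-digit padding width is an arbitrary assumption for illustration.

```python
# Same timestamp, two hypothetical offsets with different digit counts.
ts = "2017-07-25 10:36:07,988"
offsets = [53, 106]

# Plain concatenation: "...98853" vs "...988106". Compared character by
# character, '1' < '5', so offset 106 sorts before offset 53.
plain = sorted(f"{ts}{o}" for o in offsets)

# Zero-padding the offset to a fixed width (12 digits, an assumed width)
# makes string order agree with numeric order.
padded = sorted(f"{ts}{o:012d}" for o in offsets)

print(plain)
print(padded)
```

If the combined field is ever compared as text, a fixed-width offset avoids this kind of reordering.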

Regards

