Document order in Elasticsearch after processing with Filebeat/Logstash

Hi,

I have the following setup to gather data: Filebeat -> Logstash -> Elasticsearch

As input I have files with the following plain text format:

2017-07-25 10:36:07,988 21 User 1 Ligne 0
2017-07-25 10:36:07,988 21 User 1 Ligne 1
2017-07-25 10:36:07,988 21 none 1 Ligne 2

Once the file is harvested by Filebeat, it goes through Logstash, where I map the @timestamp field. But when I query Elasticsearch, I get a different order than the one in the source log file (log 2, log 0, log 1 instead of log 0, log 1, log 2).

The timestamp field is not precise enough: I will always get more than one document for the same exact timestamp.

How can I solve this issue? I would like to be able to get documents from Elasticsearch in the same order as they are in the log file.

Regards

Perhaps you can use the file offset as a secondary sort key? I believe Filebeat records it in the offset field.
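
As a rough illustration of what a secondary sort key does, here is a minimal Python sketch that orders already-fetched documents by @timestamp first and offset second. The sample documents and their offset values are made up for illustration; in Elasticsearch itself the equivalent would be a sort on both fields in the query.

```python
# Hypothetical documents as they might come back from Elasticsearch, in arbitrary order.
# The offset values are invented for illustration.
docs = [
    {"@timestamp": "2017-07-25T10:36:07.988Z", "offset": 84, "message": "Ligne 2"},
    {"@timestamp": "2017-07-25T10:36:07.988Z", "offset": 0, "message": "Ligne 0"},
    {"@timestamp": "2017-07-25T10:36:07.988Z", "offset": 42, "message": "Ligne 1"},
]


def sort_by_timestamp_then_offset(documents):
    """Sort by timestamp first; break ties with the byte offset in the source file."""
    # ISO 8601 strings in the same format compare correctly as plain strings.
    return sorted(documents, key=lambda d: (d["@timestamp"], d["offset"]))


ordered = sort_by_timestamp_then_offset(docs)
for doc in ordered:
    print(doc["offset"], doc["message"])
```

Because all three timestamps are identical, the offset alone decides the order here.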

You are right: if I sort on both @timestamp and offset, the result seems to be good. But I have one more question. Do you know how I can merge @timestamp and offset into a new field and use it as a single sorting criterion? (I suppose it's better when it comes to reindexing and performance.)

Update:
I tried this:

add_field => { "timestamp_sort" => "%{@timestamp}+%{offset}" }

But as a result I got this: :confused:

"timestamp_sort": "2017-08-06T08:36:07.988Z+53"

You'd have to use a ruby filter to do that, but I don't think just adding the timestamp expressed in milliseconds (if that's what you meant) with the file offset is a good idea since entries right after a rotation could very well sort prior to entries just before the rotation. You should figure out something else that doesn't have that problem.
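
To make the rotation problem concrete, here is a small Python sketch with invented numbers (the timestamps and the 5 MB offset are assumptions for illustration only). It adds the timestamp in milliseconds to the file offset and compares an entry written just before a rotation with one written just after.

```python
def naive_sort_key(epoch_millis, offset):
    """Timestamp in milliseconds plus file offset (the approach being questioned)."""
    return epoch_millis + offset


# Last line written to the old file just before rotation: large offset.
before_rotation = naive_sort_key(1_500_000_000_000, 5_000_000)
# First line written to the new file 10 ms later: the offset restarts at 0.
after_rotation = naive_sort_key(1_500_000_000_010, 0)

# The later entry gets the smaller key, so it would sort first.
print(after_rotation < before_rotation)
```

The later entry ends up with the smaller key because the offset resets to zero while the timestamp only advanced by a few milliseconds.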

Hi Magnus,

Thank you for your feedback. So far, I was able to figure out the following:

  • declare timestamp_sort as a date in the mapping of the index in Elasticsearch

"timestamp_sort": {
  "type": "date",
  "format": "yyyy-MM-dd HH:mm:ss,SSSSSSSSSSSSSSSS"
}

  • in Logstash, simply concatenate the timestamp and the offset (concatenating, not adding)

add_field => { "timestamp_sort" => "%{@timestamp}%{offset}" }

When pulling data from ELK, the sorting will be based on the new field (asc) + file name (desc).

This seems to be working, at least I didn't see any issue so far :smiley:
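
One thing that may be worth checking with plain concatenation is what happens when offsets have different numbers of digits. The following Python sketch only compares strings and says nothing about how Elasticsearch parses the date format; the helper composite_sort_key and the width of 12 are hypothetical choices for illustration.

```python
def composite_sort_key(timestamp, offset, width=12):
    """Concatenate timestamp and offset, zero-padding the offset to a fixed width."""
    return f"{timestamp}{offset:0{width}d}"


ts = "2017-07-25 10:36:07,988"

# Plain concatenation: as text, offset 106 sorts before offset 53.
plain = sorted([ts + str(106), ts + str(53)])

# Zero-padded concatenation keeps the numeric order of the offsets.
padded = sorted([composite_sort_key(ts, 106), composite_sort_key(ts, 53)])

print(plain)
print(padded)
```

If the combined field is compared digit by digit, padding the offset to a fixed width is one way to keep offset 53 ahead of offset 106.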

Regards
