I am debugging a Logstash pipeline that takes input from Kafka and outputs to Elasticsearch. As part of the debugging I'm also writing to a file using the file output plugin.
The Kafka topic has a single partition.
This simplified config:
input {
  kafka {
    bootstrap_servers => "<my broker list>"
    client_id => "myclient1"
    group_id => "mygroup1"
    topics => ["mytopic"]
    auto_offset_reset => "earliest"
    consumer_threads => 1
    max_poll_records => "1"
    decorate_events => "basic"
  }
}
filter {
  json {
    source => "message"
    remove_field => ["message"]
  }
  mutate {
    add_field => { "[mytimestamp]" => "%{+yyyy-MM-dd'T'HH:mm:ss.SSS'Z'}" }
  }
  mutate {
    remove_field => [
      "@version",
      "@timestamp",
      "event"
    ]
  }
}
output {
  if [guid2] {
    file {
      path => "/tmp/myfile.txt"
      flush_interval => 0
      codec => line { format => "%{[mytimestamp]};%{[@metadata][kafka][timestamp]};%{[@metadata][kafka][offset]};%{[@metadata][kafka][partition]};YES;%{[guid1]};%{[guid2]}" }
    }
  } else {
    file {
      path => "/tmp/myfile.txt"
      flush_interval => 0
      codec => line { format => "%{[mytimestamp]};%{[@metadata][kafka][timestamp]};%{[@metadata][kafka][offset]};%{[@metadata][kafka][partition]};NO;%{[guid1]};%{[guid2]}" }
    }
  }
}
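For completeness: I haven't changed any pipeline settings, so I'm assuming the defaults apply. My understanding (please correct me if I have this wrong) is that the settings relevant to ordering in logstash.yml would look roughly like this:

# Assumed settings relevant to event ordering (I have not explicitly set these)
# pipeline.workers defaults to the number of CPU cores; setting it to 1 removes parallel filter/output workers
pipeline.workers: 1
# pipeline.ordered defaults to "auto", which only preserves event order when pipeline.workers is 1
pipeline.ordered: auto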
produces this output in the file:
2024-03-15T00:05:54.908Z;1709672361500;35506;0;NO;dc3cfc45-37ee-4fca-b03b-dcec5dab0440;%{[guid2]}
2024-03-15T00:05:54.933Z;1709852903258;35508;0;NO;d837334d-9801-4452-80eb-2e3c8ca67480;%{[guid2]}
2024-03-15T00:05:55.008Z;1709859117989;35511;0;NO;607d6a0f-9fe3-45a8-98d4-f3892c0b5c1a;%{[guid2]}
2024-03-15T00:05:54.753Z;1707349048850;35403;0;YES;85ab4618-64ba-4dbb-a9d5-c1ec60057c96;85ab4618-64ba-4dbb-a9d5-c1ec60057c96
2024-03-15T00:05:54.837Z;1708017313029;35489;0;YES;a4321154-4755-4158-b7b9-14b4703b83c8;a4321154-4755-4158-b7b9-14b4703b83c8
2024-03-15T00:05:54.932Z;1709672363992;35507;0;YES;dc3cfc45-37ee-4fca-b03b-dcec5dab0440;dc3cfc45-37ee-4fca-b03b-dcec5dab0440
2024-03-15T00:05:54.979Z;1709852905352;35509;0;YES;d837334d-9801-4452-80eb-2e3c8ca67480;d837334d-9801-4452-80eb-2e3c8ca67480
2024-03-15T00:05:55.007Z;1709859119110;35510;0;YES;607d6a0f-9fe3-45a8-98d4-f3892c0b5c1a;607d6a0f-9fe3-45a8-98d4-f3892c0b5c1a
I was expecting the lines in the file to appear in either event timestamp order, Kafka timestamp order, or Kafka offset order, but they don't appear to be in any order at all.
So I'm trying to determine whether the messages actually arrive in this order on input, whether Logstash processes them in a random order, or whether the order of lines in the file simply doesn't reflect the order of processing. I don't think the input order is changing, since I have a single partition and a single worker. Is it because I am writing to the same file from two file output plugins? And what is the correct interpretation of the processing order: did processing happen in "mytimestamp" order, Kafka timestamp order, or offset order?
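To rule out the two-file-plugin question, one test I'm considering is setting the YES/NO flag in the filter instead and writing from a single file output, roughly like this (untested sketch; the [has_guid2] field name is just something I made up for illustration):

filter {
  # ... existing json/mutate filters as above ...
  # Set the flag here so only one file output is needed
  if [guid2] {
    mutate { add_field => { "[has_guid2]" => "YES" } }
  } else {
    mutate { add_field => { "[has_guid2]" => "NO" } }
  }
}
output {
  file {
    path => "/tmp/myfile.txt"
    flush_interval => 0
    codec => line { format => "%{[mytimestamp]};%{[@metadata][kafka][timestamp]};%{[@metadata][kafka][offset]};%{[@metadata][kafka][partition]};%{[has_guid2]};%{[guid1]};%{[guid2]}" }
  }
}

Would that be a valid way to isolate whether the interleaving comes from the two file plugins rather than from the processing order itself?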