Logstash 'holds' log entries until a later entry or shutdown

Hi,

I'm having a bit of a problem with logstash.

I have a log file that is very uniform. The log file is updated rather infrequently - maybe only ten times a day. However, there are times when the log file is updated multiple times in one second.

I am running into an issue where logstash will correctly read the multiple-times-per-second lines, but only output the FIRST line... though the subsequent lines must be stored in a buffer somewhere or something, because THEY will flood out as soon as either logstash is shut down or a later entry comes into the log... at which point all the 'buffered' lines come spilling out of logstash with the right timestamps.

My logstash config file:

input {
 file { path => "/var/log/inventory/faillog" }
 file { path => "/var/log/inventory/transactions" }
}

filter {
 if "transactions" in [path] {
  grok {
   match => { "message" => "%{DATE} %{TIME} - \[(?<flag>(.*?))\] (?<hrtext>(.*?)) QUERY: \[(?<query>(.*?))\] SUBQUERY: \[name=(?<name>.*) item=(?<item>.*) change=(?<change>.*)\]" }
   remove_field => [ "%{DATE_US} %{DATE_EU}" ]
  }
 }

 if "faillog" in [path] {
  grok {
   match => { "message" => "%{DATE} %{TIME} - \[(?<flag>(.*?))\] (?<hrtext>(.*?)) QUERY: \[(?<query>(.*?))\]" }
   remove_field => [ "%{DATE_US} %{DATE_EU}" ]
  }
 }
}

output {
 if "faillog" in [path] {
  file { path => "/opt/logstash/pipelines/erroroutput" }
 }
# elasticsearch { index => "errors" }

 if "transactions" in [path] {
  file { path => "/opt/logstash/pipelines/transoutput" }
 }
# elasticsearch { index => "transactions" }
}

An example of the transaction log, where item 'test1' was created at 16:27:11, and items "test2" and "test3" were created within the same second, a full ten seconds later.

2016-05-31 16:27:11.387079 - [NEW_ITEM] 1 test added. QUERY: [/opt/inventory/transaction.py name=danp item=test change=1] SUBQUERY: [name=danp item=test change=1]
2016-05-31 16:27:21.568044 - [NEW_ITEM] 1 test2 added. QUERY: [/opt/inventory/transaction.py name=danp item=test2 change=1 item=test3 change=1] SUBQUERY: [name=danp item=test2 change=1]
2016-05-31 16:27:21.609719 - [NEW_ITEM] 1 test3 added. QUERY: [/opt/inventory/transaction.py name=danp item=test2 change=1 item=test3 change=1] SUBQUERY: [name=danp item=test3 change=1]

And here, the /opt/logstash/pipelines/transoutput file (logstash's output for this particular log). Notice how it's missing test3.

{"message":"2016-05-31 16:27:11.387079 - [NEW_ITEM] 1 test added. QUERY: [/opt/inventory/transaction.py name=danp item=test change=1] SUBQUERY: [name=danp item=test change=1]","@version":"1","@timestamp":"2016-05-31T23:27:11.909Z","path":"/var/log/inventory/transactions","host":"redacted","flag":"NEW_ITEM","hrtext":"1 test added.","query":"/opt/inventory/transaction.py name=danp item=test change=1","name":"danp","item":"test","change":"1"}
{"message":"2016-05-31 16:27:21.568044 - [NEW_ITEM] 1 test2 added. QUERY: [/opt/inventory/transaction.py name=danp item=test2 change=1 item=test3 change=1] SUBQUERY: [name=danp item=test2 change=1]","@version":"1","@timestamp":"2016-05-31T23:27:21.920Z","path":"/var/log/inventory/transactions","host":"redacted","flag":"NEW_ITEM","hrtext":"1 test2 added.","query":"/opt/inventory/transaction.py name=danp item=test2 change=1 item=test3 change=1","name":"danp","item":"test2","change":"1"}

To better illustrate the problem, I will now add test4, test5, and test6 all in the same second. Behold, logstash's input (next post, to beat the 5000 word limit):

2016-05-31 16:27:11.387079 - [NEW_ITEM] 1 test added. QUERY: [/opt/inventory/transaction.py name=danp item=test change=1] SUBQUERY: [name=danp item=test change=1]
2016-05-31 16:27:21.568044 - [NEW_ITEM] 1 test2 added. QUERY: [/opt/inventory/transaction.py name=danp item=test2 change=1 item=test3 change=1] SUBQUERY: [name=danp item=test2 change=1]
2016-05-31 16:27:21.609719 - [NEW_ITEM] 1 test3 added. QUERY: [/opt/inventory/transaction.py name=danp item=test2 change=1 item=test3 change=1] SUBQUERY: [name=danp item=test3 change=1]
2016-05-31 16:30:23.892219 - [NEW_ITEM] 1 test4 added. QUERY: [/opt/inventory/transaction.py name=danp item=test4 change=1 item=test5 change=1 item=test6 change=1] SUBQUERY: [name=danp item=test4 change=1]
2016-05-31 16:30:23.951914 - [NEW_ITEM] 1 test5 added. QUERY: [/opt/inventory/transaction.py name=danp item=test4 change=1 item=test5 change=1 item=test6 change=1] SUBQUERY: [name=danp item=test5 change=1]
2016-05-31 16:30:23.996289 - [NEW_ITEM] 1 test6 added. QUERY: [/opt/inventory/transaction.py name=danp item=test4 change=1 item=test5 change=1 item=test6 change=1] SUBQUERY: [name=danp item=test6 change=1]

And now, notice how in the output test3 has APPEARED behind test4, but test5 and test6 are missing. Rest assured, when I shut down logstash, test5 and test6 appear.

{"message":"2016-05-31 16:27:11.387079 - [NEW_ITEM] 1 test added. QUERY: [/opt/inventory/transaction.py name=danp item=test change=1] SUBQUERY: [name=danp item=test change=1]","@version":"1","@timestamp":"2016-05-31T23:27:11.909Z","path":"/var/log/inventory/transactions","host":"redacted","flag":"NEW_ITEM","hrtext":"1 test added.","query":"/opt/inventory/transaction.py name=danp item=test change=1","name":"danp","item":"test","change":"1"}
{"message":"2016-05-31 16:27:21.568044 - [NEW_ITEM] 1 test2 added. QUERY: [/opt/inventory/transaction.py name=danp item=test2 change=1 item=test3 change=1] SUBQUERY: [name=danp item=test2 change=1]","@version":"1","@timestamp":"2016-05-31T23:27:21.920Z","path":"/var/log/inventory/transactions","host":"redacted","flag":"NEW_ITEM","hrtext":"1 test2 added.","query":"/opt/inventory/transaction.py name=danp item=test2 change=1 item=test3 change=1","name":"danp","item":"test2","change":"1"}
{"message":"2016-05-31 16:27:21.609719 - [NEW_ITEM] 1 test3 added. QUERY: [/opt/inventory/transaction.py name=danp item=test2 change=1 item=test3 change=1] SUBQUERY: [name=danp item=test3 change=1]","@version":"1","@timestamp":"2016-05-31T23:27:21.920Z","path":"/var/log/inventory/transactions","host":"redacted","flag":"NEW_ITEM","hrtext":"1 test3 added.","query":"/opt/inventory/transaction.py name=danp item=test2 change=1 item=test3 change=1","name":"danp","item":"test3","change":"1"}
{"message":"2016-05-31 16:30:23.892219 - [NEW_ITEM] 1 test4 added. QUERY: [/opt/inventory/transaction.py name=danp item=test4 change=1 item=test5 change=1 item=test6 change=1] SUBQUERY: [name=danp item=test4 change=1]","@version":"1","@timestamp":"2016-05-31T23:30:24.065Z","path":"/var/log/inventory/transactions","host":"redacted","flag":"NEW_ITEM","hrtext":"1 test4 added.","query":"/opt/inventory/transaction.py name=danp item=test4 change=1 item=test5 change=1 item=test6 change=1","name":"danp","item":"test4","change":"1"}

What's going on here?