Hi,
I'm having a bit of a problem with logstash.
I have a log file with a very uniform format. It's updated rather infrequently, maybe only ten times a day, but there are times when it gets several new lines within a single second.
I'm running into an issue where Logstash correctly reads those multiple-lines-in-one-second bursts but only outputs the FIRST line. The subsequent lines must be sitting in a buffer somewhere, because they come flooding out, with the correct timestamps, the moment Logstash is shut down or a later entry lands in the log.
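In case anyone wants to watch the flush timing directly, a stdout output with the rubydebug codec (both stock Logstash plugins) prints each event to the console the moment it is emitted. A minimal sketch of that debugging output:

output {
  # Print every event to the console as soon as it is emitted,
  # which makes the delayed flush easy to see.
  stdout { codec => rubydebug }
}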
My Logstash config file:
input {
  file { path => "/var/log/inventory/faillog" }
  file { path => "/var/log/inventory/transactions" }
}

filter {
  if "transactions" in [path] {
    grok {
      match => { "message" => "%{DATE} %{TIME} - \[(?<flag>(.*?))\] (?<hrtext>(.*?)) QUERY: \[(?<query>(.*?))\] SUBQUERY: \[name=(?<name>.*) item=(?<item>.*) change=(?<change>.*)\]" }
      remove_field => [ "DATE_US", "DATE_EU" ]
    }
  }
  if "faillog" in [path] {
    grok {
      match => { "message" => "%{DATE} %{TIME} - \[(?<flag>(.*?))\] (?<hrtext>(.*?)) QUERY: \[(?<query>(.*?))\]" }
      remove_field => [ "DATE_US", "DATE_EU" ]
    }
  }
}

output {
  if "faillog" in [path] {
    file { path => "/opt/logstash/pipelines/erroroutput" }
    # elasticsearch { index => "errors" }
  }
  if "transactions" in [path] {
    file { path => "/opt/logstash/pipelines/transoutput" }
    # elasticsearch { index => "transactions" }
  }
}
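To help isolate where the delay happens, here is a stripped-down sketch of the same pipeline with no filters: just the transactions input wired to a file output and stdout. The stat_interval and flush_interval settings are documented options of the stock file input and file output plugins; setting flush_interval to 0 disables the output's default two-second write batching, in case that batching is what's holding the lines back:

input {
  file {
    path => "/var/log/inventory/transactions"
    # How often (in seconds) the input checks the file for new
    # content; 1 is the default.
    stat_interval => 1
  }
}
output {
  file {
    path => "/opt/logstash/pipelines/transoutput"
    # Flush every message immediately instead of batching writes
    # every 2 seconds (the default).
    flush_interval => 0
  }
  stdout { codec => rubydebug }
}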
An example of the transaction log, where item "test" was created at 16:27:11, and items "test2" and "test3" were created within the same second, a full ten seconds later.
2016-05-31 16:27:11.387079 - [NEW_ITEM] 1 test added. QUERY: [/opt/inventory/transaction.py name=danp item=test change=1] SUBQUERY: [name=danp item=test change=1]
2016-05-31 16:27:21.568044 - [NEW_ITEM] 1 test2 added. QUERY: [/opt/inventory/transaction.py name=danp item=test2 change=1 item=test3 change=1] SUBQUERY: [name=danp item=test2 change=1]
2016-05-31 16:27:21.609719 - [NEW_ITEM] 1 test3 added. QUERY: [/opt/inventory/transaction.py name=danp item=test2 change=1 item=test3 change=1] SUBQUERY: [name=danp item=test3 change=1]
And here is the /opt/logstash/pipelines/transoutput file (Logstash's output for this particular log). Notice that it's missing test3.
{"message":"2016-05-31 16:27:11.387079 - [NEW_ITEM] 1 test added. QUERY: [/opt/inventory/transaction.py name=danp item=test change=1] SUBQUERY: [name=danp item=test change=1]","@version":"1","@timestamp":"2016-05-31T23:27:11.909Z","path":"/var/log/inventory/transactions","host":"redacted","flag":"NEW_ITEM","hrtext":"1 test added.","query":"/opt/inventory/transaction.py name=danp item=test change=1","name":"danp","item":"test","change":"1"}
{"message":"2016-05-31 16:27:21.568044 - [NEW_ITEM] 1 test2 added. QUERY: [/opt/inventory/transaction.py name=danp item=test2 change=1 item=test3 change=1] SUBQUERY: [name=danp item=test2 change=1]","@version":"1","@timestamp":"2016-05-31T23:27:21.920Z","path":"/var/log/inventory/transactions","host":"redacted","flag":"NEW_ITEM","hrtext":"1 test2 added.","query":"/opt/inventory/transaction.py name=danp item=test2 change=1 item=test3 change=1","name":"danp","item":"test2","change":"1"}
To better illustrate the problem, I'll now add test4, test5, and test6, all within the same second. Behold, Logstash's input (continued in the next post, to stay under the 5000 word limit):