Hi Guyboertje,
Please find my comments below:
1.is LS on its own machine?
- No, all three services are running on a single server: the data source (the log-generating app), Filebeat, and Logstash.
2.how, exactly, are you counting the documents in ES to know that there are some missing?
- Actually there is no data loss, but there is a delay. When I count the number of lines in my log file for an hour (say the 15th hour of the day) I find 5,000,000 lines/events, but I see only 3,000,000+ events/lines in the Discover UI (the 15:00 bar shows 1,000,000+ and the 15:30 bar shows 2,000,000+). When I look at the Discover UI the next day for yesterday's data, I find all 5,000,000 events. This is the case for all peak hours, so the data is not available in near real time.
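For reference, this is roughly how I count the source lines for a given hour (the timestamp format here is a simplified example, not our real log format):

```shell
# Tiny sample log just to illustrate the counting
# (the real files live under /var/log/app/services/)
printf '15:00:01 event A\n15:30:02 event B\n16:00:03 event C\n' > /tmp/syslog.sample
# grep -c prints the number of lines whose timestamp falls in the 15th hour
grep -c '^15:' /tmp/syslog.sample
# → 2
```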
3.whether you tried a filebeat input with a file output (no filters, no output conditionals) as the simplest config to verify that the line count in the source log files is the same as the line count in the destination file?
- I don't think there is any problem with Filebeat, as I don't see any data loss, but I will still try this for a couple of peak-hour log files and get back.
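For that test I plan to use a minimal passthrough pipeline like the following (same beats port as our real config; the output path is just an example):

```
input {
  beats {
    port => 5044
  }
}
output {
  file {
    path => "/tmp/beats-passthrough.log"
  }
}
```

With no filters and no conditionals, comparing `wc -l` on the source log against the destination file should isolate whether Filebeat is shipping every line.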
4.does your config use the drop filter?
- No, we don't use the drop filter in our config.
5.does your config have if conditional blocks that restricts which events are sent to ES?
- Yes, we have if blocks in our config. I will share our LS config here with sensitive information masked.
LS config

input {
  beats {
    port => 5044
    type => "SomeType"
  }
}
filter {
  mutate {
    gsub => ["message", "\|", ";"]
  }
  grok {
    patterns_dir => "/logstash-5.6.4/pattern"
    match => [
      "message", "%{PROXY_LOG_PATTERN_1}",
      "message", "%{PROXY_LOG_PATTERN}",
      "message", "%{ENTRY_PATTERN}",
      "message", "%{EXIT_SUCCESS_PATTERN}",
      "message", "%{ENTRY_PATTERN_1}",
      "message", "%{ERROR_PATTERN}",
      "message", "%{ERROR_PATTERN_1}",
      "message", "%{ENTRY_PATTERN_2}"
    ]
  }
  if [Endpoint] =~ /.+/ {
    mutate {
      add_field => {
        "NewFieldName" => "%{Endpoint}.%{operation}"
      }
    }
  }
  if [client_sent_end_timestamp] =~ /.+/ and [client_received_end_timestamp] =~ /.+/ and [target_sent_end_timestamp] =~ /.+/ and [target_received_end_timestamp] =~ /.+/ {
    ruby {
      code => "**Some calculation**"
      add_tag => ["**some tag**"]
    }
  }
  if [client_sent_end_Date_Time] =~ /.+/ and [client_received_end_Date_Time] =~ /.+/ and [target_sent_end_Date_Time] =~ /.+/ and [target_received_end_Date_Time] =~ /.+/ {
    ruby {
      code => "**Some calculation**"
      add_tag => ["**some tag**"]
    }
  }
  if [**Some field**] =~ /.+/ {
    ruby {
      code => "**Some calculation**"
    }
  }
  date {
    match => [ "Time_Date", "ISO8601" ]
    target => "@timestamp"
  }
}
output {
  if "_grokparsefailure" in [tags] {
    elasticsearch {
      hosts => ["ES1:9200", "ES2:9200"]
      index => "grokparsefailure-%{+YYYY.MM.dd}"
    }
  } else {
    elasticsearch {
      hosts => ["ES1:9200", "ES2:9200", "ES3:9200"]
    }
  }
}
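To answer question 2 more precisely, the per-hour document count in ES can be compared directly against the source line count with a `_count` query (the `logstash-*` pattern matches the default index used by the second elasticsearch output above; the date range is an example):

```shell
# Save the count query so it can be reused for each hourly window
cat > /tmp/count_query.json <<'EOF'
{
  "query": {
    "range": {
      "@timestamp": { "gte": "2017-12-05T15:00:00", "lt": "2017-12-05T16:00:00" }
    }
  }
}
EOF
# Run against one of the ES nodes from the output section above, e.g.:
#   curl -s 'http://ES1:9200/logstash-*/_count' -H 'Content-Type: application/json' -d @/tmp/count_query.json
```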
FB config

#=========================== Filebeat prospectors =============================
#------------------------------ Log prospector --------------------------------
filebeat.prospectors:
- input_type: log
  paths:
    - /var/log/app/services/*/*/syslog.log*
  ignore_older: 10m
  close_inactive: 5m

#========================= Filebeat global options ============================
filebeat.spool_size: 4096
filebeat.idle_timeout: 2s

#================================ Outputs ======================================
#----------------------------- Logstash output ---------------------------------
output.logstash:
  # Boolean flag to enable or disable the output module.
  #enabled: true
  # The Logstash hosts
  hosts: ["localhost:5044"]

The rest of the options are left at their defaults.
Please let me know if any further details are required.
Thanks,
Shreepad