Filebeat not shipping logs from specific host


(Rahul Ghanate) #1

Filebeat is not shipping logs from some hosts, even though they are configured identically to another host from which logs ship successfully.
I have 3 hosts from which Filebeat ships logs to a Logstash server, which in turn feeds an Elasticsearch cluster.

Filebeat ships logs successfully from one server node but not from the other two, although all three have exactly the same prospectors and configuration.
All 3 server nodes are configured through Puppet, so they have exactly the same Filebeat version, certificates, and log file formats.

Here is the config I am using to ship the logs:
filebeat.prospectors:
- input_type: log
  paths:
    - /home/appster/logs/service_logs_*
  document_type: serviceLogs
  fields:
    type: serviceLogs
  fields_under_root: true
  exclude_files: [".gz$"]
  close_inactive: 15m
  close_renamed: true
  close_removed: true
  close_eof: true
  multiline.pattern: '^<[0-9]{4}-[0-9]{2}-[0-9]{2}|^[0-9]{2}-[0-9]{2}-[0-9]{4}|^[0-9]{4}-[0-9]{2}-[0-9]{2}|^[A-Z][a-z]{2} [0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}'
  multiline.negate: true
  multiline.match: after
output.logstash:
  hosts: ["logstash.myserver.com:5044"]
  bulk_max_size: 4096
  worker: 2
  pipelining: 10
  compression_level: 4
  index: logstash
  ssl.certificate_authorities: ["/etc/pki/tls/certs/logstash-forwarder.crt"]
  ssl.certificate: "/etc/pki/tls/certs/filebeat.crt"
  ssl.key: "/etc/pki/tls/private/filebeat.key"

The only difference between the working and non-working hosts is the number of files:
==> Working server node

ls -l /home/servicelogs/logs/ | wc -l
6138

==> NON Working server node
HOST 1
> ls -ltr /home/servicelogs/logs | wc -l
310238

HOST 2
> ls -l /home/servicelogs/logs/ | wc -l
65834
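
With directories this large, `ls | wc -l` has to sort the whole listing first, and `ls -l` also counts its `total` line. A minimal Python sketch that streams the directory entries instead and only counts files the prospector would pick up; the function name `count_matching` is made up for this example, and the exclusion is simplified to a suffix check rather than the `exclude_files` regex:

```python
import fnmatch
import os


def count_matching(directory, pattern="service_logs_*", exclude_suffix=".gz"):
    """Count regular files in `directory` whose names match `pattern`.

    Roughly mirrors the prospector's `paths` glob and `exclude_files`
    setting (assumption: a plain suffix check is close enough here).
    """
    count = 0
    # os.scandir yields entries lazily, so nothing is sorted or buffered.
    with os.scandir(directory) as entries:
        for entry in entries:
            if not entry.is_file():
                continue
            if entry.name.endswith(exclude_suffix):
                continue
            if fnmatch.fnmatch(entry.name, pattern):
                count += 1
    return count
```

Calling `count_matching("/home/servicelogs/logs")` on each host gives numbers that are directly comparable to the registry state count Filebeat reports.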

I had millions of unused files in the /tmp directory, which I recently cleaned up, but I am still not getting any logs on the Logstash server.

The debug logs show events being published, yet nothing arrives on the Logstash server:
2017-01-09T15:37:23Z INFO Non-zero metrics in the last 30s: registrar.writes=6 registrar.states.update=121 publish.events=121
2017-01-09T15:37:23Z DBG Flushing spooler because of timeout. Events flushed: 26
2017-01-09T15:37:23Z DBG No events to publish
2017-01-09T15:37:23Z DBG Events sent: 26
2017-01-09T15:37:23Z DBG Processing 26 events
2017-01-09T15:37:23Z DBG Registrar states cleaned up. Before: 5085 , After: 5085
2017-01-09T15:37:23Z DBG Write registry file: /var/lib/filebeat/registry
2017-01-09T15:37:23Z DBG Registry file updated. 5085 states written.
2017-01-09T15:37:28Z DBG Flushing spooler because of timeout. Events flushed: 25
2017-01-09T15:37:28Z DBG No events to publish
2017-01-09T15:37:28Z DBG Events sent: 25
2017-01-09T15:37:28Z DBG Processing 25 events
2017-01-09T15:37:28Z DBG Registrar states cleaned up. Before: 5085 , After: 5085
2017-01-09T15:37:28Z DBG Write registry file: /var/lib/filebeat/registry
2017-01-09T15:37:28Z DBG Registry file updated. 5085 states written.
2017-01-09T15:37:33Z DBG Flushing spooler because of timeout. Events flushed: 26
2017-01-09T15:37:33Z DBG No events to publish
2017-01-09T15:37:33Z DBG Events sent: 26
2017-01-09T15:37:33Z DBG Processing 26 events
2017-01-09T15:37:33Z DBG Registrar states cleaned up. Before: 5085 , After: 5085
2017-01-09T15:37:33Z DBG Write registry file: /var/lib/filebeat/registry
2017-01-09T15:37:33Z DBG Registry file updated. 5085 states written.
2017-01-09T15:37:38Z DBG Flushing spooler because of timeout. Events flushed: 14
2017-01-09T15:37:38Z DBG No events to publish
2017-01-09T15:37:38Z DBG Events sent: 14
2017-01-09T15:37:38Z DBG Processing 14 events
2017-01-09T15:37:38Z DBG Registrar states cleaned up. Before: 5085 , After: 5085
2017-01-09T15:37:38Z DBG Write registry file: /var/lib/filebeat/registry
2017-01-09T15:37:38Z DBG Registry file updated. 5085 states written
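
To compare hosts, it may help to total these counters rather than read them line by line. A rough sketch that parses debug lines in the format shown above; the regular expressions are inferred from this output only, so treat them as an assumption about Filebeat 5.x debug logging and not a stable format:

```python
import re

# Patterns written against the debug lines above (assumed format).
EVENTS_SENT = re.compile(r"DBG\s+Events sent: (\d+)")
REGISTRY_STATES = re.compile(r"DBG\s+Registry file updated\. (\d+) states written")


def summarize(lines):
    """Return (total events reported as sent, last registry state count)."""
    total_sent = 0
    last_states = None
    for line in lines:
        m = EVENTS_SENT.search(line)
        if m:
            total_sent += int(m.group(1))
            continue
        m = REGISTRY_STATES.search(line)
        if m:
            last_states = int(m.group(1))
    return total_sent, last_states


sample = """\
2017-01-09T15:37:23Z DBG Events sent: 26
2017-01-09T15:37:23Z DBG Registry file updated. 5085 states written.
2017-01-09T15:37:28Z DBG Events sent: 25
2017-01-09T15:37:28Z DBG Registry file updated. 5085 states written.
""".splitlines()

print(summarize(sample))  # (51, 5085)
```

Note that a growing "Events sent" total only reflects what Filebeat logs internally; on its own it does not show that anything reached Logstash.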

OS: Ubuntu 12.04
Versions: Logstash 5.0, Filebeat 5.0.2, Elasticsearch 5.0


(ruflin) #2

I assume you are hitting something like a ulimit on your servers. To limit the number of files Filebeat opens at once, you can use harvester_limit: https://www.elastic.co/guide/en/beats/filebeat/master/configuration-filebeat-options.html#harvester-limit

As for the events sent, these could be state updates only. Your ulimit could be around 5000, because that is the number of states stored in the registry in the log above :)


(Rahul Ghanate) #3

Thanks for the reply @ruflin

I tried setting harvester_limit to 10000, and the number of registry entries has now increased.
One thing I forgot to mention in my last post: I have edited the Filebeat init.d script to set the ulimit to 100000.
# cat /proc/13317/limits | grep files
Max open files 100000 100000 files
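
The same check can be scripted when several hosts need comparing. A small sketch that extracts the soft and hard open-file limits from a `/proc/<pid>/limits` listing (Linux only; the helper names `open_file_limits` and `read_limits` are made up for this example):

```python
def _parse(value):
    """Convert a limit column to int, or None when it is 'unlimited'."""
    return None if value == "unlimited" else int(value)


def open_file_limits(limits_text):
    """Return (soft, hard) 'Max open files' limits from /proc/<pid>/limits text."""
    prefix = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(prefix):
            # Remaining columns: soft limit, hard limit, units.
            soft, hard = line[len(prefix):].split()[:2]
            return _parse(soft), _parse(hard)
    raise ValueError("no 'Max open files' line found")


def read_limits(pid):
    """Read the limits of a running process, e.g. the Filebeat PID."""
    with open(f"/proc/{pid}/limits") as f:
        return open_file_limits(f.read())
```

For the process above, `read_limits(13317)` would be expected to return both limits as 100000.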

Another thing I noticed today: although the logs show events being sent, tcpdump doesn't show any packets going out (on the other hosts it does).
DEBUG logs
2017-01-10T10:01:28Z DBG Flushing spooler because of timeout. Events flushed: 17
2017-01-10T10:01:28Z DBG No events to publish
2017-01-10T10:01:28Z DBG Events sent: 17
2017-01-10T10:01:28Z DBG Processing 17 events
2017-01-10T10:01:28Z DBG Registrar states cleaned up. Before: 23365 , After: 23365
2017-01-10T10:01:28Z DBG Write registry file: /var/lib/filebeat/registry
2017-01-10T10:01:28Z DBG Registry file updated. 23365 states written.
2017-01-10T10:01:33Z DBG Flushing spooler because of timeout. Events flushed: 33
2017-01-10T10:01:33Z DBG No events to publish
2017-01-10T10:01:33Z DBG Events sent: 33
2017-01-10T10:01:33Z DBG Processing 33 events
2017-01-10T10:01:33Z DBG Registrar states cleaned up. Before: 23365 , After: 23365
2017-01-10T10:01:33Z DBG Write registry file: /var/lib/filebeat/registry
2017-01-10T10:01:33Z DBG Registry file updated. 23365 states written.
2017-01-10T10:01:38Z DBG Flushing spooler because of timeout. Events flushed: 19
2017-01-10T10:01:38Z DBG No events to publish
2017-01-10T10:01:38Z DBG Events sent: 19
2017-01-10T10:01:38Z DBG Processing 19 events
2017-01-10T10:01:38Z DBG Registrar states cleaned up. Before: 23365 , After: 23365
2017-01-10T10:01:38Z DBG Write registry file: /var/lib/filebeat/registry
2017-01-10T10:01:38Z DBG Registry file updated. 23365 states written.
2017-01-10T10:01:43Z DBG Flushing spooler because of timeout. Events flushed: 17
2017-01-10T10:01:43Z DBG No events to publish
2017-01-10T10:01:43Z DBG Events sent: 17
2017-01-10T10:01:43Z DBG Processing 17 events
2017-01-10T10:01:43Z DBG Registrar states cleaned up. Before: 23365 , After: 23365
2017-01-10T10:01:43Z DBG Write registry file: /var/lib/filebeat/registry
2017-01-10T10:01:43Z DBG Registry file updated. 23365 states written.

tcpdump
# tcpdump -vv -n dst port 5044
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
^C
0 packets captured
0 packets received by filter
0 packets dropped by kernel

Even with harvester_limit set to 10000, I am not seeing any logs in Logstash (Kibana).
What could possibly be wrong here?


(Rahul Ghanate) #4

To add to that, I found that the Logstash server (10.10.1.122) is sending a FIN on the connection from the failing host:

10.10.7.135.58983 > 10.10.1.122.5044: Flags [S], cksum 0xf672 (correct), seq 1913133573, win 29200, options [mss 1460,sackOK,TS val 54646110 ecr 0,nop,wscale 7], length 0
10.10.1.122.5044 > 10.10.7.135.58983: Flags [S.], cksum 0x2dd4 (correct), seq 3085366389, ack 1913133574, win 14480, options [mss 1460,sackOK,TS val 162352132 ecr 54646110,nop,wscale 7], length 0
10.10.7.135.58983 > 10.10.1.122.5044: Flags [.], cksum 0x944b (correct), seq 1, ack 1, win 229, options [nop,nop,TS val 54646110 ecr 162352132], length 0
10.10.7.135.58983 > 10.10.1.122.5044: Flags [P.], cksum 0x161a (correct), seq 1:162, ack 1, win 229, options [nop,nop,TS val 54646110 ecr 162352132], length 161   
10.10.1.122.5044 > 10.10.7.135.58983: Flags [.], cksum 0x9415 (correct), seq 1, ack 162, win 122, options [nop,nop,TS val 162352132 ecr 54646110], length 0
10.10.1.122.5044 > 10.10.7.135.58983: Flags [P.], cksum 0x0e3c (correct), seq 1:1276, ack 162, win 122, options [nop,nop,TS val 162352133 ecr 54646110], length 1275
10.10.7.135.58983 > 10.10.1.122.5044: Flags [.], cksum 0x8e97 (correct), seq 162, ack 1276, win 251, options [nop,nop,TS val 54646111 ecr 162352133], length 0
10.10.7.135.58983 > 10.10.1.122.5044: Flags [P.], cksum 0x507f (correct), seq 162:1343, ack 1276, win 251, options [nop,nop,TS val 54646116 ecr 162352133], length 1181
10.10.1.122.5044 > 10.10.7.135.58983: Flags [P.], cksum 0x99fa (correct), seq 1276:1327, ack 1343, win 145, options [nop,nop,TS val 162352138 ecr 54646116], length 51
10.10.7.135.58983 > 10.10.1.122.5044: Flags [.], cksum 0x89b3 (correct), seq 1343, ack 1327, win 251, options [nop,nop,TS val 54646126 ecr 162352138], length 0
10.10.1.122.5044 > 10.10.7.135.58983: Flags [P.], cksum 0x0b43 (correct), seq 1327:1358, ack 1343, win 145, options [nop,nop,TS val 162352368 ecr 54646126], length 31
10.10.7.135.58983 > 10.10.1.122.5044: Flags [.], cksum 0x87d2 (correct), seq 1343, ack 1358, win 251, options [nop,nop,TS val 54646346 ecr 162352368], length 0
10.10.1.122.5044 > 10.10.7.135.58983: Flags [F.], cksum 0x883b (correct), seq 1358, ack 1343, win 145, options [nop,nop,TS val 162352368 ecr 54646346], length 0   
10.10.7.135.58983 > 10.10.1.122.5044: Flags [.], cksum 0x87c7 (correct), seq 1343, ack 1359, win 251, options [nop,nop,TS val 54646356 ecr 162352368], length 0

Is this expected?
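
To spot which side closes first across many captures, the first FIN can be picked out programmatically. A minimal sketch, assuming single-line `tcpdump -n` output in the format above; the regular expression is written against these lines only and handles IPv4 addresses only:

```python
import re

# Matches e.g. "10.10.1.122.5044 > 10.10.7.135.58983: Flags [F.], ..."
PACKET = re.compile(
    r"(?P<src>\d+\.\d+\.\d+\.\d+)\.(?P<sport>\d+) > "
    r"(?P<dst>\d+\.\d+\.\d+\.\d+)\.(?P<dport>\d+): Flags \[(?P<flags>[^\]]*)\]"
)


def first_fin(lines):
    """Return (src_ip, src_port) of the first packet carrying FIN, or None."""
    for line in lines:
        m = PACKET.search(line)
        if m and "F" in m.group("flags"):
            return m.group("src"), int(m.group("sport"))
    return None


sample = [
    "10.10.7.135.58983 > 10.10.1.122.5044: Flags [S], cksum 0xf672 (correct), seq 1913133573, win 29200, length 0",
    "10.10.1.122.5044 > 10.10.7.135.58983: Flags [P.], cksum 0x0b43 (correct), seq 1327:1358, ack 1343, win 145, length 31",
    "10.10.1.122.5044 > 10.10.7.135.58983: Flags [F.], cksum 0x883b (correct), seq 1358, ack 1343, win 145, length 0",
]

print(first_fin(sample))
```

In this capture the first FIN comes from port 5044, so it is the Logstash side that closes the connection.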


(Steffen Siering) #5

Which version of the logstash-input-beats plugin do you have installed? Make sure it is at least 3.1.11.
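
If the check is automated, compare the version numerically: as plain strings, "3.1.8" sorts after "3.1.11". A small sketch, assuming purely numeric dotted versions; the function names are illustrative, and the installed version string would come from wherever you read it (for example the output of `bin/logstash-plugin list --verbose`):

```python
def parse_version(text):
    """Turn '3.1.11' into (3, 1, 11) so versions compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


# Minimum logstash-input-beats version recommended in this thread.
MIN_BEATS_INPUT = parse_version("3.1.11")


def is_recent_enough(installed):
    """True when the installed plugin version is at least 3.1.11."""
    return parse_version(installed) >= MIN_BEATS_INPUT


print(is_recent_enough("3.1.8"))   # False
print(is_recent_enough("3.1.12"))  # True
```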


(Rahul Ghanate) #6

Thanks.
It looks like it is 3.1.8.
I will upgrade it and check.


(Rahul Ghanate) #7

Thanks @ruflin and @steffens for the solution.
The older version of logstash-input-beats seems to have been the problem.

After upgrading logstash-input-beats (3.1.8 -> 3.1.12) and Filebeat (5.0.2 -> 5.1.1) to the latest versions, it started working.

