I/o timeout on clientside and overloading logstash server


(mbk) #1

Hello everyone,

i previously installed logstash on a Ubuntu 16.04 LTS VM on a Hyper-V. We wanted to collect all Windows Events with winlogbeat.
I tested it some days with only a few server. It works, but after sometime the server begins slow down until it reaches RED state. Only a reboot is helping. I did not change the elasticsearch.yml config.. i just set the configs to send logs over logstash with certs.

Winlogbeat log:

On Friday our Windowsserver suddenly stopped sending logs.
2017-08-14T10:02:28+02:00 INFO EventLog[Application] Successfully published 1 events
2017-08-14T10:02:33+02:00 INFO Stopping Winlogbeat
2017-08-14T10:02:33+02:00 INFO Non-zero metrics in the last 30s: libbeat.logstash.call_count.PublishEvents=7 libbeat.logstash.publish.read_bytes=371 libbeat.logstash.publish.write_bytes=7395 libbeat.logstash.published_and_acked_events=20 libbeat.publisher.published_events=20 msg_file_cache.ApplicationMisses=4 msg_file_cache.ApplicationSize=4 msg_file_cache.SecurityHits=8 msg_file_cache.SystemHits=5 msg_file_cache.SystemMisses=3 msg_file_cache.SystemSize=3 published_events.Application=4 published_events.Security=8 published_events.System=8 published_events.total=20 uptime={"server_time":"2017-08-14T08:02:33.0551393Z","start_time":"2017-08-10T13:39:26.401834Z","uptime":"90h23m6.6533053s","uptime_ms":"325386653305"}
2017-08-14T10:02:33+02:00 INFO EventLog[System] Stop processing.
2017-08-14T10:02:33+02:00 INFO EventLog[Security] Stop processing.
2017-08-14T10:02:33+02:00 INFO EventLog[Application] Stop processing.
2017-08-14T10:02:33+02:00 INFO Total non-zero values:  libbeat.logstash.call_count.PublishEvents=17769 libbeat.logstash.publish.read_bytes=1936401 libbeat.logstash.publish.read_errors=111 libbeat.logstash.publish.write_bytes=37270974 libbeat.logstash.publish.write_errors=81 libbeat.logstash.published_and_acked_events=133596 libbeat.logstash.published_but_not_acked_events=240 libbeat.publisher.published_events=133596 msg_file_cache.ApplicationHits=420 msg_file_cache.ApplicationMisses=65 msg_file_cache.ApplicationSize=4 msg_file_cache.SecurityHits=74052 msg_file_cache.SecurityMisses=13 msg_file_cache.SecuritySize=1 msg_file_cache.SystemHits=57794 msg_file_cache.SystemMisses=1261 msg_file_cache.SystemSize=4 published_events.Application=482 published_events.Security=74059 published_events.System=59055 published_events.total=133596 uptime={"server_time":"2017-08-14T08:02:33.9375624Z","start_time":"2017-08-10T13:39:26.401834Z","uptime":"90h23m7.5357284s","uptime_ms":"325387535728"}
2017-08-14T10:02:33+02:00 INFO Uptime: 90h23m7.545728s
2017-08-14T10:02:33+02:00 INFO winlogbeat stopped.

server:

[2017-08-14 08:14:32,672][INFO ][monitor.jvm              ] [Zaladane] [gc][young][247803][36436] duration [752ms], collections [1]/[1s], total [752ms]/[1.6h], memory [580.7mb]->[588.5mb]/[1015.6mb], all_pools {[young] [1.9mb]->[17.7mb]$
[2017-08-14 08:32:39,722][INFO ][cluster.metadata         ] [Zaladane] [winlogbeat-2017.08.14] update_mapping [wineventlog]
[2017-08-14 10:02:27,920][INFO ][cluster.metadata         ] [Zaladane] [winlogbeat-2017.08.14] update_mapping [wineventlog]
[2017-08-14 10:05:03,233][INFO ][cluster.metadata         ] [Zaladane] [winlogbeat-2017.08.14] update_mapping [wineventlog]
[2017-08-14 10:05:03,458][INFO ][cluster.metadata         ] [Zaladane] [winlogbeat-2017.08.14] update_mapping [wineventlog]
[2017-08-14 10:05:03,497][INFO ][cluster.metadata         ] [Zaladane] [winlogbeat-2017.08.14] update_mapping [wineventlog]

later on server after 5 hours it start spamming these:

[2017-08-14 15:45:58,997][DEBUG][action.bulk              ] [Zaladane] [winlogbeat-2017.08.14][3] failed to execute bulk item (index) index {[winlogbeat-2017.08.14][wineventlog][AV3g_nVqLna5gr6L84O_], source[{"message":"Die Systemzeit w$
MapperParsingException[failed to parse [event_data.PreviousTime]]; nested: IllegalArgumentException[Invalid format: "11:43:51" is malformed at ":43:51"];
        at org.elasticsearch.index.mapper.FieldMapper.parse(FieldMapper.java:329)
        at org.elasticsearch.index.mapper.DocumentParser.parseObjectOrField(DocumentParser.java:311)
        at org.elasticsearch.index.mapper.DocumentParser.parseValue(DocumentParser.java:438)
        at org.elasticsearch.index.mapper.DocumentParser.parseObject(DocumentParser.java:264)
        at org.elasticsearch.index.mapper.DocumentParser.parseObjectOrField(DocumentParser.java:308)
        at org.elasticsearch.index.mapper.DocumentParser.parseObject(DocumentParser.java:328)
        at org.elasticsearch.index.mapper.DocumentParser.parseObject(DocumentParser.java:254)
        at org.elasticsearch.index.mapper.DocumentParser.parseDocument(DocumentParser.java:124)
        at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:309)
        at org.elasticsearch.index.shard.IndexShard.prepareCreate(IndexShard.java:533)
        at org.elasticsearch.index.shard.IndexShard.prepareCreateOnPrimary(IndexShard.java:510)
        at org.elasticsearch.action.index.TransportIndexAction.prepareIndexOperationOnPrimary(TransportIndexAction.java:214)
        at org.elasticsearch.action.index.TransportIndexAction.executeIndexRequestOnPrimary(TransportIndexAction.java:223)
        at org.elasticsearch.action.bulk.TransportShardBulkAction.shardIndexOperation(TransportShardBulkAction.java:327)
        at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:120)
        at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:68)
        at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryPhase.doRun(TransportReplicationAction.java:657)
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
        at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:287)
        at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:279)
        .... due to char limit i cut here

(mbk) #2

after some restarts of Winlogbeat and the Elasticsearch Server i receive following error on the Clients:

2017-08-16T12:06:38+02:00 INFO Home path: [C:\Program Files\Winlogbeat] Config path: [C:\Program Files\Winlogbeat] Data path: [C:\\ProgramData\\winlogbeat] Logs path: [C:\Program Files\Winlogbeat\logs]
2017-08-16T12:06:38+02:00 INFO Setup Beat: winlogbeat; Version: 5.5.1
2017-08-16T12:06:38+02:00 INFO Max Retries set to: 3
2017-08-16T12:06:38+02:00 INFO Activated logstash as output plugin.
2017-08-16T12:06:38+02:00 INFO Publisher name: Server04
2017-08-16T12:06:38+02:00 INFO Flush Interval set to: 1s
2017-08-16T12:06:38+02:00 INFO Max Bulk Size set to: 2048
2017-08-16T12:06:38+02:00 INFO State will be read from and persisted to C:\ProgramData\winlogbeat\.winlogbeat.yml
2017-08-16T12:06:38+02:00 INFO winlogbeat start running.
2017-08-16T12:07:08+02:00 INFO Non-zero metrics in the last 30s: libbeat.logstash.call_count.PublishEvents=1 libbeat.logstash.publish.read_bytes=1390 libbeat.logstash.publish.write_bytes=1732 libbeat.publisher.published_events=300 msg_file_cache.ApplicationHits=92 msg_file_cache.ApplicationMisses=9 msg_file_cache.ApplicationSize=9 msg_file_cache.SecurityHits=99 msg_file_cache.SecurityMisses=1 msg_file_cache.SecuritySize=1 msg_file_cache.SystemHits=99 msg_file_cache.SystemMisses=1 msg_file_cache.SystemSize=1 uptime={"server_time":"2017-08-16T10:07:08.4325404Z","start_time":"2017-08-16T10:06:38.3176258Z","uptime":"30.1149146s","uptime_ms":"30114914"}
2017-08-16T12:07:29+02:00 ERR Failed to publish events caused by: read tcp 192.168.100.148:62165->192.168.100.135:5044: i/o timeout
2017-08-16T12:07:29+02:00 INFO Error publishing events (retrying): read tcp 192.168.100.148:62165->192.168.100.135:5044: i/o timeout
2017-08-16T12:07:38+02:00 INFO Non-zero metrics in the last 30s: libbeat.logstash.call_count.PublishEvents=1 libbeat.logstash.publish.read_bytes=1390 libbeat.logstash.publish.read_errors=1 libbeat.logstash.publish.write_bytes=1295 libbeat.logstash.publish.write_errors=1 libbeat.logstash.published_but_not_acked_events=100 uptime={"server_time":"2017-08-16T10:07:38.4319865Z","start_time":"2017-08-16T10:06:38.3176258Z","uptime":"1m0.1143607s","uptime_ms":"60114360"}
2017-08-16T12:07:57+02:00 INFO Stopping Winlogbeat
2017-08-16T12:07:57+02:00 INFO EventLog[Application] Stop processing.
2017-08-16T12:07:57+02:00 INFO EventLog[Security] Stop processing.
2017-08-16T12:07:57+02:00 INFO EventLog[System] Stop processing.
2017-08-16T12:07:57+02:00 INFO Total non-zero values:  libbeat.logstash.call_count.PublishEvents=2 libbeat.logstash.publish.read_bytes=2780 libbeat.logstash.publish.read_errors=1 libbeat.logstash.publish.write_bytes=3027 libbeat.logstash.publish.write_errors=1 libbeat.logstash.published_but_not_acked_events=100 libbeat.publisher.published_events=300 msg_file_cache.ApplicationHits=92 msg_file_cache.ApplicationMisses=9 msg_file_cache.ApplicationSize=9 msg_file_cache.SecurityHits=99 msg_file_cache.SecurityMisses=1 msg_file_cache.SecuritySize=1 msg_file_cache.SystemHits=99 msg_file_cache.SystemMisses=1 msg_file_cache.SystemSize=1 uptime={"server_time":"2017-08-16T10:07:57.5339905Z","start_time":"2017-08-16T10:06:38.3176258Z","uptime":"1m19.2163647s","uptime_ms":"79216364"}
2017-08-16T12:07:57+02:00 INFO Uptime: 1m19.2263566s
2017-08-16T12:07:57+02:00 INFO winlogbeat stopped.

(system) #3

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.