Logstash is not reading some files from filebeat

There is data in a folder which is read by Filebeat and forwarded to Logstash. We write to log files every day, and multiple files are created based on the date (e.g. 2022-03-29-1.log, 2022-03-30-1.log). Somehow some of these files are not being ingested into Elasticsearch, and there is no error in the Filebeat logs except the following:
2022-04-06T18:38:38.549Z ERROR pipeline/output.go:121 Failed to publish events: write tcp 10.13.111.100:52601->10.13.122.222:5044: wsasend: An existing connection was forcibly closed by the remote host.
2022-04-06T18:38:38.549Z INFO pipeline/output.go:95 Connecting to backoff(async(tcp://10.13.122.222:5044))
2022-04-06T18:38:38.549Z INFO pipeline/output.go:105 Connection to backoff(async(tcp://10.13.122.222:5044)) established
2022-04-06T18:39:06.293Z INFO [monitoring] log/log.go:145 Non-zero metrics in the last 30s {"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":137156,"time":{"ms":16}},"total":{"ticks":199609,"time":{"ms":16},"value":199609},"user":{"ticks":62453}},"handles":{"open":243},"info":{"ephemeral_id":"75431a97-8114-439d-93f3-2495cb595788","uptime":{"ms":438210157}},"memstats":{"gc_n

What could be the cause of this data loss? Data from other log files in the same folder is being ingested into Elasticsearch, but there are some files whose data is not.

Hi,
If a whole file isn't ingested, you should check its ownership. If Filebeat runs as a dedicated user, it can't read root's files, for instance (and there is no error, because Filebeat can't even see them).
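One way to check this is to try opening every log file while logged in as the account Filebeat runs under. The snippet below is only a sketch: the helper name `unreadable_files` and the folder path in the usage line are made up for illustration, and the result is only meaningful if the script runs as the same user as Filebeat.

```python
from pathlib import Path


def unreadable_files(folder, pattern="*.log"):
    """Return (path, error) pairs for entries the current user cannot open for reading."""
    problems = []
    for path in sorted(Path(folder).glob(pattern)):
        try:
            # Reading a single byte is enough to prove the file can be opened.
            with open(path, "rb") as handle:
                handle.read(1)
        except OSError as exc:
            problems.append((path, exc))
    return problems
```

Usage would look like `for path, exc in unreadable_files(r"C:\logs"): print(path, exc)`, where `C:\logs` is a placeholder for the real folder. An empty result means every matching file could be opened by the current user.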
Hope it helps

Hello,

Thanks for the response. Other files in the same folder were ingested, and the next day a few more log files were generated and those were also ingested into Elasticsearch. I checked the Logstash logs, and they showed this warning: "reason"=>"Validation Failed: 1: this action would add [4] total shards, but this cluster currently has [2000]/[2000] maximum shards open;"

So I just need confirmation: can a lack of available shards cause this data loss, or am I looking in the wrong place?

I think that is Logstash logging an error it received from the Elasticsearch cluster. That could well lead to data loss. The normal recommendation is to keep each shard to around 50 GB. 2000 of those would be 100 terabytes. The volunteer community cannot replicate or support that.
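The 100 terabyte figure is just the shard count multiplied by the recommended size. A quick sketch of that arithmetic, using decimal units (1 TB = 1000 GB) and variable names chosen here for illustration:

```python
# Figures quoted in the thread: 2000 open shards, roughly 50 GB recommended per shard.
OPEN_SHARDS = 2000
RECOMMENDED_SHARD_GB = 50

total_gb = OPEN_SHARDS * RECOMMENDED_SHARD_GB
total_tb = total_gb / 1000  # decimal terabytes

print(f"{OPEN_SHARDS} shards x {RECOMMENDED_SHARD_GB} GB = {total_gb} GB = {total_tb:g} TB")
```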

If your data is less than that and you do not understand why your Elasticsearch cluster has 2,000 shards open, then I would start in the Elasticsearch forum. It could relate to a template installed by Logstash, but you will need details from the ES community to get help from the LS folks.

Hi,
Thank you for responding. The total disk available is around 550 GB, and Kibana now shows 935 primary shards and the same number of replica shards. 100 terabytes is far more than the actual data we have. How can I find out the size of each shard? Is there an API for that?

Also, data is being ingested again now. Is it possible that, when this error occurred, there were no shards available for creating an index? Since then some indices may have been deleted, because the ILM policy for some indices is 200 days, so shards are now available for new index creation.
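The numbers in this thread are consistent with that explanation. Below is a simplified model of the check behind the "Validation Failed" message, not Elasticsearch's actual code; the function name `can_add_shards` is made up. It assumes a request is rejected when the open shards plus the new shards would exceed the limit. (A limit of 2000 would match two data nodes at the default `cluster.max_shards_per_node` of 1000, but that is a guess about this cluster.)

```python
def can_add_shards(open_shards, new_shards, limit):
    """Simplified model: creating shards is allowed only if the total stays within the limit."""
    return open_shards + new_shards <= limit


LIMIT = 2000  # from the warning: [2000]/[2000] maximum shards open

# When the error occurred: 2000 shards open, the request needed 4 more.
at_error_time = can_add_shards(2000, 4, LIMIT)

# Now: 935 primaries plus 935 replicas, as shown in Kibana.
open_now = 935 + 935
headroom = LIMIT - open_now
now = can_add_shards(open_now, 4, LIMIT)

print(at_error_time, open_now, headroom, now)
```

Under this model the request fails at 2000 open shards and succeeds at 1870, which leaves room for 130 more shards.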

I suggest you read this. That shows which API to use to display shards.
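For per-shard sizes, the `_cat/shards` endpoint reports a `store` column for each shard. Here is a minimal sketch of querying it and listing the largest shards. It assumes the cluster is reachable at `http://localhost:9200` without authentication, which may not match your setup, and the helper names `fetch_shards` and `largest_shards` are made up for illustration.

```python
import json
import urllib.request


def fetch_shards(base_url="http://localhost:9200"):
    """Fetch one row per shard from _cat/shards as JSON, with store sizes in bytes."""
    url = base_url + "/_cat/shards?format=json&bytes=b&h=index,shard,prirep,state,store"
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def largest_shards(rows, top=10):
    """Return (index, shard, prirep, size_in_bytes) tuples, largest first."""
    # Unassigned shards have no store size, so leave them out.
    sized = [row for row in rows if row.get("store") is not None]
    sized.sort(key=lambda row: int(row["store"]), reverse=True)
    return [
        (row["index"], row["shard"], row["prirep"], int(row["store"]))
        for row in sized[:top]
    ]
```

Calling `largest_shards(fetch_shards())` would list the ten biggest shards. With about 550 GB of disk spread over roughly 1870 shards, most of them are probably far below 50 GB.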
