After @Cesar_Mejia's investigation, I agree with @StephenB's suggestion to move the Logstash-heavy loads to another host, at least temporarily.
Also, check the grok processing times; maybe those can be optimized.
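To see where grok time is going, Logstash's monitoring API (port 9600 by default) reports per-plugin timings. A rough sketch of what to look for; the JSON fragment below is made-up sample data, not real output from this cluster:

```shell
# On the Logstash host, while under load:
#   curl -s localhost:9600/_node/stats/pipelines?pretty
# In the filters section, look for grok entries whose duration_in_millis
# is large relative to the number of events processed.

# Canned example of the relevant fragment, to show the field to watch:
stats='"name" : "grok", "events" : { "in" : 120000, "duration_in_millis" : 480000 }'
echo "$stats" | grep -o '"duration_in_millis" : [0-9]*'
# prints: "duration_in_millis" : 480000
```

Here 480 s of filter time for 120 k events (4 ms/event) would point at an expensive or backtracking-heavy grok pattern.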
Agreed.
But the gaps in the monitoring graphs suggest to me that indexing is stalling completely under load: the storage likely doesn't have enough IOPS for the peak ingest load, not enough to keep up. That creates back pressure... and it takes a while to recover.
What to do about it? Get/use faster storage.
Edit: If using Linux, you can check IO performance using iostat -x <N>, say N=1 or 10, while under load. I suspect you will see high %util and await times on the device corresponding to the RAID volume.
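As a minimal sketch of reading that output (the device lines below are made-up sample data; real column order varies between sysstat versions, so check the header of your own iostat output first):

```shell
# iostat -x 10 prints one block per interval; per-device lines end in %util.
# Sample device lines (fabricated); here the last field is %util:
sample='sda 0.50 12.00 4.00 210.00 96.00 10240.00 96.6 9.80 45.20 2.10 98.70
nvme0n1 1.00 5.00 800.00 300.00 6400.00 4800.00 20.4 0.40 0.35 0.10 12.30'

# Flag devices whose %util (last field) exceeds 90 -- a sign of saturation:
echo "$sample" | awk '$NF > 90 { print $1, "saturated:", $NF "%util" }'
# prints: sda saturated: 98.70%util
```

A device pinned near 100 %util with high await is the classic signature of storage that can't keep up with the ingest rate.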
One way would be to let Dell sell you some at-least-slightly-larger SSDs that just slide into the server, let the RAID rebuild (unless it is RAID0), and rinse and repeat until all disks are replaced. There are 101 other ways too.

Also check where the significant IOPS are actually being performed; it's relatively easy to have something writing to the OS volume (which might also be slow HDDs) by mistake, especially when the same server is hosting several services. And decoupling Logstash from Elasticsearch is just a better design, IMO. I'm old school, but 3x Elasticsearch instances (which are competing with each other!?) AND Logstash on the same physical server doesn't float my boat either.
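As a sketch of that "check where the IOPS are going" step, assuming Linux with sysstat installed (the /tmp path below is just a stand-in for the real data directory):

```shell
# Hypothetical path -- substitute your actual Elasticsearch/Logstash data dir.
data_dir=/tmp   # e.g. /var/lib/elasticsearch on a real node

# Which block device actually backs this directory? If it resolves to the
# same (slow) volume as the OS, the services are competing for the same IOPS.
device=$(df -P "$data_dir" | awk 'NR==2 {print $1}')
echo "backing device: $device"

# Per-process disk activity (sysstat's pidstat); run while under load to see
# which service is generating the writes:
#   pidstat -d 5
```

Cross-referencing the backing device against the busy devices in iostat quickly shows whether the "wrong" volume is taking the write load.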
Speaking of which... has anyone tested ELK performance on NVMe disks? Is it significantly better, and is it reliable over the long term?
@rios is NVMe significantly better than what?
Elastic Cloud runs tens of thousands of clusters on local NVMe, and the IO throughput is significantly better than remote SSD, spinning disk, or most other storage.
Of course it all comes down to specific use cases, but yes, in general NVMe gives better performance than most options...
Better performance, but it may not be better cost-to-performance for your specific use case if the high IO is not required.
Yes, compared with SATA SSDs or older disk technologies, NVMe is relatively new, and yes, it is very fast.
When SSDs were first introduced, for a few years they were aimed more at laptops/PCs than at servers. Nowadays SSDs are so reliable that nobody has to ask or think about whether they will hold up running 24/7/365.
Thank you for sharing valuable information.
All solid-state storage has a finite lifetime, as underneath there are just NAND cells that can be erased/written only a finite number of times. The NVMe protocol, typically carried over PCIe, is simply faster: higher throughput, lower latency, and higher parallelism than the traditional SATA/SAS/IDE/SCSI interfaces/protocols, mostly because those were designed with spinning disks in mind, whereas NVMe was designed specifically for solid-state devices.
But pulling it back to @Cesar_Mejia's issue: maybe he's had a chance to capture some iostat diagnostics? If changing the storage is not an option, and if I am right that slow HDD storage is his main problem, then what else could he do?
@Cesar_Mejia You mentioned 3 Elasticsearch instances? What was the motivation there? Are these instances virtual machines or containers, or just separate elasticsearch.yml configurations so that they co-exist on the same machine? Are the data directories all on the same RAID volume? My fear is that you essentially have relatively slow storage with multiple things competing for the same IOPS, making a non-ideal situation worse.
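For reference, if the three instances are just separate elasticsearch.yml configurations, the path.data settings are the thing to compare. A hypothetical sketch (paths and ports invented for illustration, not taken from this cluster):

```yaml
# Instance 1 (es-node-1/elasticsearch.yml) -- hypothetical example:
node.name: node-1
http.port: 9200
transport.port: 9300
path.data: /data/raid/es-node-1   # if node-2 and node-3 also point under
                                  # /data/raid, all three contend for the
                                  # same volume's IOPS

# A decoupled alternative would give each instance its own physical volume,
# e.g. path.data: /data/nvme1/es-node-1
```

If every instance's path.data resolves to the same RAID device, that would confirm the competing-for-IOPS theory above.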