Registry writes slowing down filebeat

Hello,

During a performance test, we put ~30,000 files of 2.8 MB each in the folder that Filebeat reads from.

Over time, the indexing rate slowed down. We followed the usual best practices: index refresh interval set to 30s, auto-generated document IDs, and tuned workers, harvester_buffer_size, and bulk_max_size.

We checked a few things to confirm that the Elasticsearch nodes were not the bottleneck. Setting replicas to 0 brought no improvement, which supports that.

On further investigation, we found that the log level was set to info. Changing it to error helped somewhat, because Filebeat had been writing some info messages and one warning very frequently. Still, performance was not good, at ~2-3k/s.

Digging further, we found that Filebeat was writing thousands of lines to the registry every now and then, and the indexing rate dropped whenever it did. So we changed registry.flush to 60s and set close_eof to true. That helped further, but performance was still not good, at 4-5k/s.
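For readers following along, here is a sketch of how the settings mentioned so far are typically expressed in filebeat.yml. The path and the buffer value are placeholders chosen for illustration, not the actual test configuration, and the output section is omitted.

```yaml
# Sketch only: path and buffer size are hypothetical placeholders.
filebeat.inputs:
  - type: log                      # deprecated log input, as used in this test
    paths:
      - /data/perf-test/*.log      # placeholder path
    close_eof: true                # close the harvester once end of file is reached
    harvester_buffer_size: 16384   # illustrative value (16384 is the documented default)

# Flush registry updates to disk at most once per 60s instead of on every ACK.
filebeat.registry.flush: 60s

# Log errors only, to cut down on Filebeat's own log writes.
logging.level: error
```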

Q1. Can someone help us understand why it was writing thousands of lines to the registry? Even after changing the flush interval, it would not write for 60s, but then it would write for 20-30 seconds, and the writes amounted to 50-80 MB. I realized it had something to do with the ~30k files we had placed in the input folder, but we weren't sure which ones could be deleted to bring the registry size down.

Q2. Is there a way to know which files have been fully uploaded, so that those can be moved or deleted?

Q3. Even after all files were uploaded and we cleaned them up, the registry size did not shrink, so we deleted the registry folder as a hack. When does the registry size normally get reduced? Even after that, performance is still poor. Any suggestions on what else we can look at?

Q4. Instead of thousands of small files, we believe a smaller number of bigger files would keep the registry smaller, but would that also improve performance?

Thanks

Which version of Filebeat did you use? Also, did you use the filestream input or the deprecated log input?

Please share the filebeat.yml you used.

Thanks

The Filebeat version is 8.9.2. We are using the log input. I saw that this input is deprecated, but I wasn't motivated to change because I came across some bugs that affected filestream but not the log input type, especially related to registry/harvester/close_/clean_ or something similar (I don't recollect exactly, but I can dig them out).

filebeat.yml

Yeah, I'm not sure that you can improve this without changing to the filestream input.

The filestream input has a couple of improvements related to how the registry file is written, mainly:

Only the most recent updates are serialized to the registry. In contrast, the log input has to serialize the complete registry on each ACK from the outputs. This makes the registry updates much quicker with this input.

And

The input ensures that only offsets updates are written to the registry append only log. The log writes the complete file state.

I'm not sure which bugs you are referring to, but I would recommend that you test this using the filestream input and a newer version.
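As a rough starting point, a filestream input might look like the sketch below. The path and the `id` value are hypothetical, and the option names should be checked against the filestream documentation for your version: several log input options are renamed rather than carried over as-is (for example, `close_eof` becomes `close.reader.on_eof` and `harvester_buffer_size` becomes `buffer_size`).

```yaml
# Sketch only: id, path, and buffer size are hypothetical placeholders.
filebeat.inputs:
  - type: filestream
    id: perf-test-files            # each filestream input should have a unique id
    paths:
      - /data/perf-test/*.log      # placeholder path
    close.reader.on_eof: true      # filestream counterpart of close_eof
    buffer_size: 16384             # filestream counterpart of harvester_buffer_size
```

Note that state recorded by the log input is not reused automatically, so files may be read again from the beginning after the switch unless the state is migrated. The official migration guide from log to filestream covers this.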

thanks @leandrojmp

I looked through filestream and all enhancements seem to be exactly for the problems that we observed :slightly_smiling_face:

Btw, the bugs were related to configuration parameters (close_, clean_, harvester_limit, etc.) that matter when there is a huge number of files, as was the case in our env. I could recall a few:

Since some of those reports mentioned that the bugs did not apply to "log", we avoided looking into filestream. But now that you suggest it, we are looking at it. The bugs may well have been fixed in the latest version.

We are checking, but it would help if you could confirm whether we can simply replace log with filestream and leave the rest of the config (e.g. harvester_buffer_size, close_eof) as is, or whether more changes are required.

Thanks again!