During a performance test, we put ~30,000 files of 2.8 MB each in the folder that Filebeat reads from.
Over time, the indexing rate slowed down. We followed the usual best practices: index refresh_interval set to 30s, auto-generated document IDs, and tuned workers, harvester_buffer_size, and bulk_max_size.
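For reference, the relevant tuning looked roughly like this. This is an illustrative sketch, not our exact production file; the paths and values are placeholders, and the option names are per the Filebeat reference documentation:

```yaml
# filebeat.yml (illustrative sketch)
filebeat.inputs:
  - type: log
    paths:
      - /data/input/*.log          # hypothetical input folder
    harvester_buffer_size: 65536   # larger per-harvester read buffer (default 16384)

output.elasticsearch:
  hosts: ["es01:9200"]
  worker: 4                        # parallel bulk workers
  bulk_max_size: 4096              # events per bulk request

# On the Elasticsearch side, index.refresh_interval was set to "30s"
# and no explicit _id was sent, so Elasticsearch auto-generates IDs.
```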
We checked a few things to confirm that the Elasticsearch nodes were not the bottleneck; setting replicas to 0 produced no improvement, which confirmed it.
On further investigation, we found that the log level was set to info. Changing it to error helped somewhat, since Filebeat had been writing an info message and a warning very frequently. But the throughput was still poor, at ~2-3k events/s.
Digging further, we found that Filebeat was periodically writing thousands of lines to the registry, and whenever it wrote, the indexing rate dropped. So we changed registry.flush to 60s and set close_eof to true. That helped further, but throughput was still only 4-5k events/s.
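The registry-related changes were along these lines (Filebeat 7.x option names; the input path is a placeholder):

```yaml
# filebeat.yml (registry tuning, illustrative)
filebeat.registry.flush: 60s     # batch registry writes instead of flushing constantly

filebeat.inputs:
  - type: log
    paths:
      - /data/input/*.log
    close_eof: true              # release the harvester as soon as EOF is reached
```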
Q1. Can someone help us understand why Filebeat was writing thousands of lines to the registry? Even after changing the flush interval, it would stay quiet for 60s, but then it would write for 20-30 seconds, and each flush amounted to 50-80 MB. I suspect this is related to the ~30k files we placed in the input folder, but we weren't sure which files could be deleted to bring the registry size down.
Q2. Is there a way to know which files have been fully uploaded, so that they can be moved or deleted?
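One approach we sketched for this: parse the registry and compare each entry's offset with the file's current on-disk size; a file whose offset has reached its size has been fully read. This is a rough sketch only. It assumes the newer JSON-lines registry log format (a `log.json` under `data/registry/filebeat/`), assumes the files are no longer being appended to, and the record layout may differ between Filebeat versions:

```python
import json

def load_offsets(registry_log):
    """Parse a Filebeat JSON-lines registry log into {source_path: offset}.
    Later entries for the same source overwrite earlier ones. Assumes the
    7.x-style format where state records carry the data under a "v" key."""
    offsets = {}
    with open(registry_log) as fh:
        for line in fh:
            rec = json.loads(line)
            v = rec.get("v")
            if isinstance(v, dict) and "source" in v:
                offsets[v["source"]] = v.get("offset", 0)
    return offsets

def fully_read(offsets, sizes):
    """Given {path: registry_offset} and {path: on_disk_size}, return the
    paths whose bytes have all been read (offset caught up to size)."""
    return [path for path, offset in offsets.items()
            if offset >= sizes.get(path, float("inf"))]

# Example with in-memory data (no real registry file needed):
offsets = {"/data/input/a.log": 2800000, "/data/input/b.log": 1400000}
sizes = {"/data/input/a.log": 2800000, "/data/input/b.log": 2800000}
print(fully_read(offsets, sizes))  # only a.log is fully read
```

In practice the on-disk sizes would come from `os.stat(path).st_size` over the input folder; the example keeps them inline to stay self-contained.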
Q3. Even after all files were uploaded and we cleaned them up, the registry size did not shrink, so we deleted the registry folder as a hack. When does Filebeat actually shrink the registry? Performance is still poor; any suggestions on what else we should look at?
Q4. We believe that a smaller number of bigger files, instead of thousands of small ones, would give a smaller registry. But would it also improve indexing performance?
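A rough back-of-the-envelope from the numbers above supports the smaller-registry part: 50-80 MB written per flush across ~30k entries works out to roughly 2-3 KB per entry, so consolidating the same data into, say, 300 larger files should shrink each flush by about two orders of magnitude. This assumes flush size scales linearly with entry count; the file count of 300 is purely illustrative:

```python
# Rough sizing based on the figures observed above.
flush_bytes = 80 * 1024 * 1024   # upper bound of one registry flush (80 MB)
n_files = 30_000                 # files in the input folder
per_entry = flush_bytes / n_files
print(f"~{per_entry / 1024:.1f} KB per registry entry per flush")

# Same total volume (~84 GB) in 300 files of ~280 MB each:
n_big = 300
print(f"flush with {n_big} files: ~{per_entry * n_big / 1024:.0f} KB")
```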