I have a requirement to run two instances of Filebeat, with both instances loading data into the same index.
Is this possible? If so, could you please let us know how to do it?
The purpose is to load data into the index faster.
You can run multiple instances of Filebeat on the same server; you just need to make sure that the configuration file, config path, data path and log path are different for each instance.
If you installed using rpm or deb, you will need to create another systemd service and edit it to change the configuration and the paths.
Can you provide more context about this? Are you having issues loading data? If the issue is on the Elasticsearch side, having multiple Filebeat instances will not solve the problem and may even make it worse.
@leandrojmp Thanks for your reply.
Is the data path you are referring to the location that we configure in the "elasticsearch.yml" file? If yes, why do we need to configure it in the Filebeat yml file? If no, what value should be given to the path data setting in the Filebeat yml file?
Currently the index load rate is 5k/s and we want to almost double it, since we need to load around 1 billion records. This is why we are trying to configure two instances of Filebeat.
No, it is unrelated to Elasticsearch; it is the Filebeat data path.
To run 2 Filebeat instances you need to configure a different data path, logs path and config for each one. You can check the directory layout in the Filebeat documentation.
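As a rough sketch of what separate paths look like, the second instance could have its own config file that overrides the path settings. The file name and directories below are hypothetical examples, not values from this thread; pick whatever fits your server.

```yaml
# /etc/filebeat-2/filebeat.yml  (hypothetical location for the second instance)
# Each instance needs its own directories so they do not share state.
path.config: /etc/filebeat-2      # where this instance looks for its config files
path.data: /var/lib/filebeat-2    # registry and other persistent state
path.logs: /var/log/filebeat-2    # Filebeat's own log files
```

The second instance is then started with `-c` pointing at this file, while the first instance keeps using its existing config and default paths.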
But have you done any troubleshooting to find out whether this index rate of 5k/s is a Filebeat issue and not an Elasticsearch issue?
In most cases the index rate is limited by the receiving side, not the sending side.
Have you tried to change some filebeat settings already? This may help as well.
There are some settings that you can try to tweak to see if the performance of filebeat increases, mainly the bulk_max_size, which has the default of 50, and worker, which has the default of 1.
Sometimes just increasing bulk_max_size gives a performance gain; for example, try changing it in your output to something like 125, and set worker to 2.
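In filebeat.yml that starting point would look roughly like this. The host is a placeholder, and the numbers are only a first step to measure from, not recommended final values.

```yaml
output.elasticsearch:
  hosts: ["https://localhost:9200"]  # placeholder, use your own hosts
  worker: 2            # number of concurrent workers per configured host
  bulk_max_size: 125   # maximum number of events in a single bulk request
```

Change one setting at a time and watch the indexing rate after each change, so you can tell which one actually helped.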
Yes, I have set "bulk_max_size" and "worker" as below in the filebeat.yml file.
# ---------------------------- Elasticsearch Output ----------------------------
output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["https://xx.xx.xx.xx:9200","https://xx.xx.xx.xx:9200"]
  worker: 8
  bulk_max_size: 3000
  # Protocol - either `http` (default) or `https`.
  protocol: "https"
How do I troubleshoot this part? I am new to Elastic. The dashboard shows 5k/s, so I thought configuring two Filebeat instances would double the load speed.
The Elasticsearch indexing rate depends on multiple factors, both in Elasticsearch itself and in the client sending the logs, so it is not easy to find where the bottleneck is.
There is this article from Elastic with some tips on how to tune Elasticsearch for indexing speed. I recommend that you read it and check whether you can apply some of them to your cluster.
Also, did you read the blog post about tuning Filebeat? It has an example showing that more workers or a larger bulk_max_size is not always better and can have a negative impact.
You shared that you are using 8 workers and a bulk_max_size of 3000. How did you arrive at these numbers? This may already be too much for your cluster.
The linked post explains how to find the optimal numbers for your cluster.
I don't think that adding a second Filebeat will improve anything. It also depends on how Filebeat is reading your logs, which you didn't share. If you are going to use two instances, you need to make sure that they do not read the same files, or you may end up with duplicates.
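One way to keep the instances apart is to give each one a disjoint set of paths. This is only a sketch: you haven't shared your inputs, so the filestream input type, the ids and the directories below are all assumptions for illustration.

```yaml
# Instance 1 config (hypothetical paths)
filebeat.inputs:
  - type: filestream
    id: load-part-1
    paths:
      - /data/load/part1/*.log
---
# Instance 2 config (hypothetical paths, must not overlap with instance 1)
filebeat.inputs:
  - type: filestream
    id: load-part-2
    paths:
      - /data/load/part2/*.log
```

The two documents above belong in two separate config files, one per instance. Because each instance keeps its own registry, neither knows what the other has already read, so any overlap in the paths would be shipped twice.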
Thanks @leandrojmp for the inputs. I am from the same team as @Debasis_Mallick, so I am adding to his updates.
We ran Filebeat from 2 different servers and the ingest rate almost doubled, which helped us conclude that the receiving side is not the bottleneck.
We had reviewed the article as well as the blog post. Based on those, we increased the refresh interval to 1 min and then to 5 min, which helped a little. We disabled replicas because this was an initial load, but that didn't seem to help much, so we reverted. We haven't disabled swapping, but we are monitoring the host stats and don't see that as an issue.
Based on the blog, we changed filebeat settings in a step-by-step manner. So we increased the workers to 2, then 4 and then 8. Similarly, we increased the bulk_max_size in a gradual way. It helped substantially.
We see that the host resources are not a bottleneck, and hence we were curious whether there is any more scope for tuning the single instance. If the first instance has hit its limits, then we would consider a second instance, either on the same host or on another host.
Pls let us know if you have any further pointers. Thanks again!
I don't have any feedback beyond what is already in the shared blog posts about tuning Filebeat and Elasticsearch.
In this case, I think it would be better to have instances on different servers, if you can split your source files between them.
Adding another instance on the same server will basically double the I/O usage of the server, because you will have 2 Filebeat instances reading and writing on it. It is also a little more complicated to manage, because you cannot have the 2 instances reading from the same source and you would need to make sure that all the paths used by Filebeat are different.