Hi, I am trying out the Filebeat output to Elasticsearch, and there is something that isn't clear to me.
I found a delay of almost 7 seconds between Filebeat publishing an event to Elasticsearch and the Elasticsearch API being able to find the doc. (Filebeat only sent two logs.)
What happens during this time? Can we reduce it?
Hi @jsoriano, thanks for your reply
I was running Filebeat in debug mode, counted from the time the publish info was printed, and then tried calling the API to get the doc. I tested this several times and it was about 7 seconds. Btw, I saw that the gap between the "read_timestamp" and "@timestamp" fields is also about 7 seconds. As for your suggestions:
> There are some settings in filebeat that can control how files are scanned; depending on how files are read, there can be some lag.
For this, I had already set scan_frequency to 1s. Are there any other settings that could help?
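For reference, a minimal sketch of the relevant input section (assuming a standard log input; the path is just an example):

```yaml
filebeat.inputs:
- type: log
  paths:
    - /var/log/nginx/access.log  # example path
  scan_frequency: 1s             # check for new/changed files every second (default 10s)
```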
> Once the line is read, events are kept in an internal queue, so they can be sent in batches.
Do you mean bulk_max_size? I did not set this, so it should be the default of 50, right?
Also, I only sent two logs in my test case. Do you mean that the logs are added to the queue and wait for a period before being sent to Elasticsearch? Can you explain when Filebeat sends out the logs?
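For context, my understanding of the batching-related settings, with the Filebeat 6.x defaults in comments (the host is just an example):

```yaml
# Internal queue: events are buffered here before being published in batches.
queue.mem:
  events: 4096            # queue capacity (default)
  flush.min_events: 2048  # publish once this many events are buffered (default)
  flush.timeout: 1s       # ...or after this timeout, whichever comes first (default)

output.elasticsearch:
  hosts: ["localhost:9200"]  # example host
  bulk_max_size: 50          # max events per bulk request (default)
```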
> Once the event is received by elasticsearch, it has to refresh the index; beats indexes are configured to be refreshed every 5 seconds.
Yes, but since I only sent 2 logs, I think this stage should be fast enough, am I right?
Take a look at the log input options; you might try to customize the backoff settings. In any case I don't think this will affect your observed time, as you are counting the time since the publish info is printed.
The process is done every 5 seconds, no matter the number of logs. To check if this is what is increasing your times, try to reduce the refresh_interval of the index you are writing to, with a query like this one:
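Something like this, assuming you write to the default filebeat-* indices:

```json
PUT filebeat-*/_settings
{
  "index": {
    "refresh_interval": "1s"
  }
}
```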
Hi @jsoriano, thanks for your suggestions. I think these should help. But I just found something strange,
and I think I should resolve it first.
I access my nginx site, and a line is generated in the nginx access log. Then I can see Filebeat publish a single log event. But in the end, I find two identical docs inside Elasticsearch. It seems Elasticsearch makes a copy and treats it as a new doc.
Any idea?
My filebeat version is 6.3.2, and my config:
filebeat.yml
Sorry @jsoriano, ignore the repeated docs in Elasticsearch. It was my fault: someone started the Filebeat service while I was running Filebeat in debug mode, so there were two Filebeat instances running at the same time.
Back to the time-spent problem. I am sorry, but after setting flush.min_events, flush.timeout, and refresh_interval (I tried applying and testing the settings one by one), there is still no luck.
Sometimes it can be fast (less than 1s), but sometimes it still needs 7 seconds or more.
It seems scan_frequency does not work perfectly after those changes.
Sometimes Filebeat notices the log file change immediately and publishes; in this case the API can find the new document very quickly.
Sometimes it keeps scanning the log file but does not realize the file has changed, so it cannot publish the doc in time; in this case the API sees a delay of several seconds.
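Following your earlier suggestion, I will try tuning the backoff settings next; a sketch of what I have in mind (the values are guesses, the path is an example):

```yaml
filebeat.inputs:
- type: log
  paths:
    - /var/log/nginx/access.log  # example path
  scan_frequency: 1s
  backoff: 1s        # initial wait after reaching the end of the file (default 1s)
  backoff_factor: 2  # the wait grows by this factor after each idle check (default 2)
  max_backoff: 2s    # cap the wait (default 10s)
```

If I read the docs correctly, with the default max_backoff of 10s an idle file may not be re-checked for up to 10 seconds, which could explain why the delay comes and goes.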