I am using Elasticsearch 5.0.1 and Logstash 5.0.0. For 1 GB of data, Logstash parses the logs in roughly half an hour if I do not send the output to ES. With ES as the output, parsing the same data takes around 3.5-4 hours.
How do I reduce the bottleneck when inserting data into Elasticsearch?
What is the specification of your Elasticsearch cluster? Have you looked to identify what is limiting Elasticsearch performance? Is CPU saturated? Do you see a lot of IO wait? Is there evidence of a lot of GC in the logs?
Look at the operating system level to see if CPU is fully utilised. You only have 2 physical cores, which makes it likely that this is the bottleneck. What is the size of your events? Is Logstash also running on the same host?
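To answer those questions at the OS level, here is a minimal sketch (Linux only, assuming bash; note the `/proc/stat` counters are cumulative since boot, so use `top` or `iostat -x 1` for a live view). For GC evidence, grep the Elasticsearch logs for `[gc]` entries.

```shell
# Rough breakdown of where CPU time has gone since boot.
# High user+system suggests CPU-bound; high iowait points at the disks.
read -r _ user nice system idle iowait _ < /proc/stat
total=$((user + nice + system + idle + iowait))
echo "cpu: $(( (user + nice + system) * 100 / total ))%  iowait: $(( iowait * 100 / total ))%"
```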
Unless you have exceptionally slow disk, I would expect CPU to be the limit if running both Logstash and Elasticsearch on the same host. Have you set the number of workers in the Logstash elasticsearch output as outlined in the performance troubleshooting guide? What type of data are you indexing? How large are the records?
As per the documentation of the elasticsearch output plugin, the `workers` option is no longer supported.
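Right, per-output workers are gone in 5.x; parallelism is now tuned at the pipeline level in `logstash.yml` (or via the `-w`/`-b` command-line flags). A sketch, with illustrative values only, not recommendations:

```
# logstash.yml -- pipeline settings (Logstash 5.x)
pipeline.workers: 2        # defaults to the number of CPU cores
pipeline.batch.size: 250   # events per worker batch sent to the bulk API
pipeline.batch.delay: 5    # ms to wait before flushing an undersized batch
```

On a 2-core host also running Elasticsearch, raising these may not help much if CPU is already saturated; larger batches mainly reduce bulk-request overhead.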
The data contains plain-text logs. A few key-value pairs are extracted from each log using Logstash filters.
Each log message is between 100 and 1500 characters.
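For reference, a minimal sketch of that kind of pipeline, assuming the pairs appear as `key=value` in the message (the kv filter options, hosts, and index name below are illustrative, not your actual config):

```
filter {
  # extract key=value pairs from the raw message
  kv {
    source      => "message"
    field_split => " "
    value_split => "="
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "logs-%{+YYYY.MM.dd}"
  }
}
```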