I am using Elasticsearch 5.0.1 and Logstash 5.0.0. For 1 GB of data, Logstash parses the logs in roughly half an hour if I do not send the output to ES. With ES as the output, parsing the same data takes around 3.5-4 hours.
How do I reduce the bottleneck when inserting data into Elasticsearch?
What is the specification of your Elasticsearch cluster? Have you looked to identify what is limiting Elasticsearch performance? Is CPU saturated? Do you see a lot of IO wait? Is there evidence of a lot of GC in the logs?
Look at the operating system level to see if CPU is fully utilised. You only have 2 physical cores, which makes it likely that this is the bottleneck. What is the size of your events? Is Logstash also running on the same host?
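To answer those questions at the OS level, here is a minimal sketch (Linux only, assuming bash; note the `/proc/stat` counters are cumulative since boot, so use `top` or `iostat -x 1` for a live view). For GC evidence, grep the Elasticsearch logs for `[gc]` entries.

```shell
# Rough breakdown of where CPU time has gone since boot.
# High user+system suggests CPU-bound; high iowait points at the disks.
read -r _ user nice system idle iowait _ < /proc/stat
total=$((user + nice + system + idle + iowait))
echo "cpu: $(( (user + nice + system) * 100 / total ))%  iowait: $(( iowait * 100 / total ))%"
```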
Unless you have exceptionally slow disk, I would expect CPU to be the limit if running both Logstash and Elasticsearch on the same host. Have you set the number of workers in the Logstash elasticsearch output as outlined in the performance troubleshooting guide? What type of data are you indexing? How large are the records?
As per the documentation of the elasticsearch output plugin, the `workers` option is no longer supported.
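Right, per-output workers are gone in 5.x; parallelism is now tuned at the pipeline level in `logstash.yml` (or via the `-w`/`-b` command-line flags). A sketch, with illustrative values only, not recommendations:

```
# logstash.yml -- pipeline settings (Logstash 5.x)
pipeline.workers: 2        # defaults to the number of CPU cores
pipeline.batch.size: 250   # events per worker batch sent to the bulk API
pipeline.batch.delay: 5    # ms to wait before flushing an undersized batch
```

On a 2-core host also running Elasticsearch, raising these may not help much if CPU is already saturated; larger batches mainly reduce bulk-request overhead.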
The data contains plain-text logs. A few key-value pairs are extracted from each log using Logstash filters.
Each log message is between 100 and 1500 characters.
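For reference, a minimal sketch of that kind of pipeline, assuming the pairs appear as `key=value` in the message (the kv filter options, hosts, and index name below are illustrative, not your actual config):

```
filter {
  # extract key=value pairs from the raw message
  kv {
    source      => "message"
    field_split => " "
    value_split => "="
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "logs-%{+YYYY.MM.dd}"
  }
}
```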