Bottleneck while indexing data into Elasticsearch

Hi,

I am using Elasticsearch 5.0.1 and Logstash 5.0.0. For 1 GB of data, Logstash parses the logs in roughly half an hour if I do not send the output to ES. With ES as the output, parsing the same data takes around 3.5-4 hours.

How do I reduce the bottleneck when inserting data into Elasticsearch?
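
To put the two timings side by side, here is a rough back-of-the-envelope calculation. It assumes "1 GB" means 1024 MB and uses the approximate durations quoted above; the function name is only illustrative.

```python
def throughput_mb_per_s(size_mb: float, hours: float) -> float:
    """Average throughput in MB/s for size_mb processed in the given number of hours."""
    return size_mb / (hours * 3600)


SIZE_MB = 1024  # assumption: "1 GB" taken as 1024 MB

without_es = throughput_mb_per_s(SIZE_MB, 0.5)    # parsing only, no ES output
with_es_fast = throughput_mb_per_s(SIZE_MB, 3.5)  # ES output, lower time estimate
with_es_slow = throughput_mb_per_s(SIZE_MB, 4.0)  # ES output, upper time estimate

print(f"without ES: {without_es:.2f} MB/s")
print(f"with ES:    {with_es_slow:.3f}-{with_es_fast:.3f} MB/s")
print(f"slowdown:   {3.5 / 0.5:.0f}x-{4.0 / 0.5:.0f}x")
```

In other words, adding the Elasticsearch output makes the same run roughly 7 to 8 times slower, and the sustained rate with ES is well under 0.1 MB/s.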

What is the specification of your Elasticsearch cluster? Have you looked to identify what is limiting Elasticsearch performance? Is CPU saturated? Do you see a lot of IO wait? Is there evidence of a lot of GC in the logs?
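
One possible way to look at the CPU and GC questions from the Elasticsearch side is the node stats API. The sketch below is a minimal example, not an official tool: it assumes a node listening on the default `localhost:9200`, the helper names `fetch_node_stats` and `summarize` are made up for illustration, and the field names are the ones I would expect in a 5.x response, so check them against what your node actually returns.

```python
import json
from urllib.request import urlopen

# Assumption: a local node on the default HTTP port.
STATS_URL = "http://localhost:9200/_nodes/stats/jvm,thread_pool,os"


def fetch_node_stats(url: str = STATS_URL) -> dict:
    """Download node statistics as a dict (requires a running node)."""
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def summarize(stats: dict) -> list:
    """Pick out a few indicators per node; missing fields come back as None."""
    rows = []
    for node in stats.get("nodes", {}).values():
        collectors = node.get("jvm", {}).get("gc", {}).get("collectors", {})
        pools = node.get("thread_pool", {})
        # The indexing pool is named "bulk" in 5.x; newer versions call it "write".
        pool = pools.get("bulk") or pools.get("write") or {}
        rows.append({
            "name": node.get("name"),
            "cpu_percent": node.get("os", {}).get("cpu", {}).get("percent"),
            "old_gc_count": collectors.get("old", {}).get("collection_count"),
            "old_gc_millis": collectors.get("old", {}).get("collection_time_in_millis"),
            "bulk_rejected": pool.get("rejected"),
        })
    return rows
```

Against a running node this would be called as `summarize(fetch_node_stats())`. Calling it a few times during an indexing run and comparing the numbers gives a first impression of whether GC time or rejected bulk requests are growing. IO wait is not covered here and is better checked with operating system tools.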

Hi Christian,

I am using the default settings of Elasticsearch 5.0.1. In the Logstash config file, I have the following setting in the output plugin:

elasticsearch {
	hosts => ["localhost:9200"]
	index => "eaxmple-test1"
}

How do I check Elasticsearch's performance? Sorry, I am new to Logstash and Elasticsearch.

Logstash shows the following config (the defaults, I guess):
{"id"=>"main", "pipeline.workers"=>4, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>5, "pipeline.max_inflight"=>500}

The machine on which I tested it has the following config:
Processor: Intel Core i3-4130 CPU @ 3.40GHz
RAM: 4 GB
System type: 64bit

Look at the operating system level to see if CPU is fully utilised. You only have 2 physical cores, which makes it likely that CPU is the bottleneck. What is the size of your events? Is Logstash also running on the same host?

OS level performance:

  1. CPU: varying between 11% and 35%
  2. Memory: stable at 69%
  3. Disk: stable at 99%
  4. Network: 0%

Yes, Logstash is also running on the same host.

FYI, the 1 GB of data that I mentioned in the post was divided into ~100 MB files.

What do you mean by 'size of the events'?

Unless you have an exceptionally slow disk, I would expect CPU to be the limit when running both Logstash and Elasticsearch on the same host. Have you set the number of workers in the Logstash elasticsearch output as outlined in the performance troubleshooting guide? What type of data are you indexing? How large are the records?

As per the documentation of the elasticsearch output plugin, output workers are no longer supported.

The data consists of plain-text logs. From each log, a few key-value pairs are extracted using Logstash filters.
Each log message is between 100 and 1500 characters long.
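
For a sense of scale, the message lengths above can be turned into a rough event rate. This is only an estimate under stated assumptions: "1 GB" is taken as 1024^3 bytes, one character is treated as one byte (plain ASCII logs, newlines ignored), and the slower 4-hour figure from the first post is used. The function name is hypothetical.

```python
TOTAL_BYTES = 1024 ** 3            # assumption: "1 GB" = 1024^3 bytes
MIN_CHARS, MAX_CHARS = 100, 1500   # message lengths quoted in the post


def event_rate_range(total_bytes: int, min_chars: int, max_chars: int, hours: float):
    """Return (lowest, highest) plausible events per second.

    Assumes one character is about one byte and ignores line terminators.
    """
    fewest_events = total_bytes // max_chars   # every message at maximum length
    most_events = total_bytes // min_chars     # every message at minimum length
    seconds = hours * 3600
    return fewest_events / seconds, most_events / seconds


low, high = event_rate_range(TOTAL_BYTES, MIN_CHARS, MAX_CHARS, 4.0)
print(f"{low:.0f}-{high:.0f} events/s")
```

Under these assumptions the indexing rate with the ES output lands somewhere between tens and a few hundred events per second, depending on how long the typical message really is.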
