Hello,
I'm setting up an Elasticsearch cluster on my laptop to do some testing before migrating to better infrastructure, so I'm still learning how this works.
As I said, I have an Elasticsearch instance running (one node, one shard, no replication). I'm reading log files with Logstash and sending them to the ES instance using the most basic pipeline:
input {
  file {
    mode => "read"
    path => "/Users/urko/tests/elastic/log-samples/workdata/*"
  }
}
output {
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "test_raw_2"
  }
}
My log file is a 10 GB sample Apache httpd log file (136,736,170 lines). I've noticed that the ingestion rate into ES is really low: it takes around 5 hours to finish. I also tried splitting the file into ten 1 GB files to see whether that would speed things up, but it didn't help.
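In case it's relevant, the splitting was done with coreutils' split by line count. The paths and numbers below are a small illustrative demo; for the real file it was roughly 13,673,617 lines per piece (136,736,170 / 10):

```shell
# Demo of the split-by-lines approach on a tiny file.
# For the real 10 GB log the count was ~13,673,617 lines per piece.
mkdir -p /tmp/workdata
printf 'line1\nline2\nline3\nline4\nline5\nline6\n' > /tmp/sample.log
split -l 2 /tmp/sample.log /tmp/workdata/part_
ls /tmp/workdata
```

Splitting by line count (-l) rather than byte count (-b) keeps log lines intact, so each piece is still valid input for the file plugin.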
My specs
Laptop
MacBook Pro (Early 2015)
3.1 GHz Dual-Core Intel Core i7
16 GB 1867 MHz DDR3
macOS v10.15.4
Elasticsearch configuration
I didn't change anything in particular except the JVM heap size, as I read that it's important. It has 8 GB assigned, which is half of my system's total memory.
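Concretely, the only change was the heap settings in Elasticsearch's jvm.options (assuming the default file layout; these are the values I set):

```
-Xms8g
-Xmx8g
```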
Logstash configuration
After reading logstash.yml and Performance Troubleshooting, I've tried multiple heap sizes (2 GB and 4 GB) and found no difference performance-wise.
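For example, the 4 GB run used these values in Logstash's jvm.options (the 2 GB run was the same with 2g):

```
-Xms4g
-Xmx4g
```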
Question
So I'm guessing that 2 GB/h is a slow ingestion rate and that something in my setup is wrong. I've also noticed (based on the number of documents in my index) that ingestion gets slower the longer the process has been running (maybe a GC issue? No idea).
Elasticsearch and Logstash outputs show no errors or warnings, and I haven't experienced any crash either.
Could I get some help here? What's making this run so slow? Is Logstash not supposed to run alongside Elasticsearch on the same machine?
I can provide additional information if required.
Thanks in advance.