Elasticsearch/Logstash low indexing rate

Hello,

I'm setting up an Elasticsearch cluster on my laptop to do some testing before migrating to better infrastructure, so I'm still learning how this works.

As I've said, I have a single Elasticsearch instance running (one node, one shard, no replication). I'm reading log files with Logstash and sending them to the ES instance with the most basic pipeline:

 input {
     file {
         mode => "read"
         path => "/Users/urko/tests/elastic/log-samples/workdata/*"
     }
 }

 output {
     elasticsearch {
         hosts => "http://localhost:9200"
         index => "test_raw_2"
     }
 }

My log data is a 10GB sample Apache httpd log file (136,736,170 lines). I've noticed that the ingestion rate into ES is really low: it takes around 5 hours to finish. I've also tried splitting the file into ten 1GB files to see whether that could speed things up, but it didn't help.

My specs

Laptop

MacBook Pro (Early 2015)
3.1 GHz Dual-Core Intel Core i7
16 GB 1867 MHz DDR3
macOS v10.15.4

Elasticsearch configuration

I didn't specify anything in particular except the JVM heap size, as I read that it's important. It has 8GB assigned, which is half of my system's total memory.
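One thing I haven't changed yet (just noting it as a possibility): the index refresh interval, which defaults to 1s and forces Elasticsearch to refresh every second while indexing. A sketch of lowering it during the bulk load, using the `test_raw_2` index from my pipeline:

```
curl -X PUT "http://localhost:9200/test_raw_2/_settings" \
  -H 'Content-Type: application/json' \
  -d '{"index": {"refresh_interval": "30s"}}'
```

I'd set it back to `"1s"` (or `null` for the default) once the load finishes.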

Logstash configuration

After reading logstash.yml and the Performance Troubleshooting guide, I've tried multiple heap sizes (2GB and 4GB) and found no difference performance-wise.
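For completeness, the only pipeline-level settings I'm aware of that might matter here are the worker count and batch size in logstash.yml; these values are just a guess for a dual-core machine:

```
pipeline.workers: 2        # defaults to the number of CPU cores
pipeline.batch.size: 500   # events per worker batch sent to the output (default 125)
pipeline.batch.delay: 50   # ms to wait before flushing an undersized batch (default 50)
```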

Question

So I'm guessing that 2GB/h is a slow ingestion rate and that something in my setup is wrong. I've noticed (based on the number of documents in my index) that it gets slower the longer the process has been running (maybe some GC issue? no idea).
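If it is a GC issue, I suppose I could confirm it through the Logstash monitoring API (it listens on port 9600 by default); the `jvm` section reports heap usage plus GC collection counts and times:

```shell
curl -XGET 'localhost:9600/_node/stats/jvm?pretty'
```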

Elasticsearch and Logstash outputs show no errors or warnings, and I haven't experienced any crash either.

Can I get any help? What's making this run so slowly? Is Logstash not supposed to run alongside Elasticsearch?

I can provide additional information if required.

Thanks in advance.

After tinkering with the JVM configuration (both Elasticsearch's and Logstash's) and trying multiple configurations, I can't see any improvement, and I don't think my hardware is the bottleneck. It's still taking around 30min/GB, and the CPU and heap usage seem OK to me (see images below).

So, what's causing the ingestion rate to be so slow? Maybe some configuration that I'm unaware of?

With Elasticsearch the bottleneck is often disk I/O and disk utilisation. Are you monitoring this?

I don't think that's the case either. Here's the output from iostat -w 3 during the process: link.

The MB/s column stays around 10MB/s most of the time and occasionally spikes to ~100MB/s. I've seen it reach around 400MB/s when doing a cp of large files.

Can you please run iostat -x so we can see disk utilization and iowait?

macOS iostat doesn't have a -x option.

Anyway, I've already finished my tests and was just asking to see whether I was doing something wrong.

Should I mark this issue as solved? If so, which answer is the solution?

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.