Hello,
I'm setting up an Elasticsearch cluster on my laptop to do some testing before migrating to better infrastructure, so I'm still learning how this works.
As I said, I have an Elasticsearch instance running (one node, one shard, no replication). I'm reading log files with Logstash and sending them to the ES instance using the most basic pipeline:
input {
  file {
    mode => "read"
    path => "/Users/urko/tests/elastic/log-samples/workdata/*"
  }
}
output {
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "test_raw_2"
  }
}
My log file is a 10 GB sample Apache httpd log file (136,736,170 lines). I've noticed that the ingestion rate into ES is really low: it takes around 5 hours to finish. I also tried splitting the file into ten 1 GB files to see whether that would speed things up, but it didn't help.
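In case it's relevant, the splitting was done with coreutils' split by line count. The paths and numbers below are a small illustrative demo; for the real file it was roughly 13,673,617 lines per piece (136,736,170 / 10):

```shell
# Demo of the split-by-lines approach on a tiny file.
# For the real 10 GB log the count was ~13,673,617 lines per piece.
mkdir -p /tmp/workdata
printf 'line1\nline2\nline3\nline4\nline5\nline6\n' > /tmp/sample.log
split -l 2 /tmp/sample.log /tmp/workdata/part_
ls /tmp/workdata
```

Splitting by line count (-l) rather than byte count (-b) keeps log lines intact, so each piece is still valid input for the file plugin.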
My specs
Laptop
MacBook Pro (Early 2015)
3.1 GHz Dual-Core Intel Core i7
16 GB 1867 MHz DDR3
macOS v10.15.4
Elasticsearch configuration
I didn't change anything in particular except the JVM heap size, as I read that it's important. It has 8 GB assigned, which is half of my system's total memory.
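Concretely, the only change was the heap settings in Elasticsearch's jvm.options (assuming the default file layout; these are the values I set):

```
-Xms8g
-Xmx8g
```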
Logstash configuration
After reading logstash.yml and Performance Troubleshooting, I've tried multiple heap sizes (2 GB and 4 GB) and found no difference performance-wise.
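For example, the 4 GB run used these values in Logstash's jvm.options (the 2 GB run was the same with 2g):

```
-Xms4g
-Xmx4g
```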
Question
So I'm guessing that 2 GB/h is a slow ingestion rate and that something in my setup is wrong. I've also noticed (based on the number of documents in my index) that ingestion gets slower the longer the process has been running (maybe a GC issue? No idea).
Elasticsearch and Logstash outputs show no errors or warnings, and I haven't experienced any crash either.
Could I get some help here? What's making this run so slow? Is Logstash not supposed to run alongside Elasticsearch on the same machine?
I can provide additional information if required.
Thanks in advance.