Loading high transactional data to elasticsearch

amirhosseinbi · December 12, 2024, 2:00pm

Here are the mount options:

/dev/mapper/vgamir-lvamir on /home/elk type ext4 (rw,relatime,data=ordered)

Yes, I'm using the template below:

  "index": {
    "lifecycle": {
      "name": "7daysHot7daysWarm"
    },
    "codec": "best_compression",
    "refresh_interval": "15s",
    "number_of_replicas": "0"
  }
}

I have other pipelines running as well, but this pcap-gy one is the one causing the problem. Docs from two days ago still haven't been indexed. When I checked, on average only around 200K JSON docs are processed every 35 seconds. As a test I changed the output to /dev/null, and the average only dropped to 30 seconds, so Elasticsearch might not be the bottleneck.
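To put those two measurements side by side, here is a rough calculation of the implied throughput. It uses only the figures quoted above (200K docs per 35 s with the Elasticsearch output, 200K docs per 30 s with /dev/null); the function name is just for illustration.

```python
def docs_per_second(docs: int, seconds: float) -> float:
    """Average throughput for a batch of `docs` processed in `seconds`."""
    return docs / seconds


# Figures quoted in the post above.
with_es = docs_per_second(200_000, 35)   # output -> Elasticsearch
dev_null = docs_per_second(200_000, 30)  # output -> /dev/null

# Relative gain from removing Elasticsearch from the path entirely.
gain = (dev_null - with_es) / with_es

print(f"Elasticsearch output: {with_es:,.0f} docs/s")
print(f"/dev/null output:     {dev_null:,.0f} docs/s")
print(f"Gain without ES:      {gain:.1%}")
```

That works out to roughly 5,700 docs/s versus 6,700 docs/s, about a 17% difference, which is consistent with most of the time being spent before the output stage.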

BTW, playing with batch.size did not help. Should I keep 4096?

- pipeline.id: pcap_gy_json
  path.config: "/home/elk/logstash-8.12.2/conf.d/pcap-gy-json.conf"
  pipeline.batch.size: 4096
  pipeline.workers: 80

leandrojmp · December 12, 2024, 3:22pm

Well, you would need to share them as well. It is hard to troubleshoot something without the full context; until now we assumed that you had only one pipeline running on your Logstash instance.

The pipelines all run in the same instance, so any of them can impact performance.

One thing that I don't think was mentioned: which Logstash version are you using? Please share this information as well.

Yeah, this can be an issue. The file input is single-threaded, if I'm not wrong, and from my experience Logstash does not perform well when you have a huge number of files in a path.

You should change relatime to noatime, as relatime can impact performance and also decrease the life of your SSD. You can read more about it here.

data=ordered can also have some impact on performance. It would be best to use data=writeback, but I'm not sure if you can just change this in the mount options.

You can read about the differences here on the Data Mode part.
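As a small illustration, the mount line posted earlier can be checked for the two options discussed here. This is only a sketch: the function names are made up for this example, and the notes simply repeat the suggestions above rather than verify them.

```python
import re


def parse_mount_line(line: str) -> dict:
    """Split one line of `mount` output into device, mount point, fs type, options."""
    match = re.match(r"^(\S+) on (\S+) type (\S+) \((.*)\)$", line.strip())
    if match is None:
        raise ValueError(f"unrecognized mount line: {line!r}")
    device, mountpoint, fstype, opts = match.groups()
    return {
        "device": device,
        "mountpoint": mountpoint,
        "fstype": fstype,
        "options": opts.split(","),
    }


def review_options(options: list) -> list:
    """Return notes for the options discussed in this thread (suggestions, not facts)."""
    notes = []
    if "relatime" in options and "noatime" not in options:
        notes.append("relatime is set; noatime was suggested instead")
    if "data=ordered" in options:
        notes.append("data=ordered is set; data=writeback was suggested, "
                     "but check whether it can be changed on your system")
    return notes


# The mount line exactly as posted in this thread.
line = "/dev/mapper/vgamir-lvamir on /home/elk type ext4 (rw,relatime,data=ordered)"
info = parse_mount_line(line)
notes = review_options(info["options"])
for note in notes:
    print(f"{info['mountpoint']}: {note}")
```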

Don't think you need to change this now.

RainTown · December 12, 2024, 10:15pm

Were you able to make a comparison on another Logstash server, with output set to /dev/null, to get a feel for what Logstash on a single server can do with your data and your Logstash configuration?

For iostat, I meant running it for an extended period while the Logstash import is ongoing, outputting say every 10 seconds for a period of say 10 minutes, and looking specifically at device sdd: "iostat -x 10".
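A minimal sketch of turning "every 10 seconds for 10 minutes" into an iostat invocation. The helper names are hypothetical; the interval and count arguments are standard iostat usage, and sdd is the device named above. Note that without extra options the first report iostat prints covers the time since boot, so it is usually ignored.

```python
import subprocess


def iostat_command(interval_s: int = 10, duration_s: int = 600, device: str = "sdd") -> list:
    """Build an `iostat -x` command that reports every `interval_s` seconds
    for roughly `duration_s` seconds on a single device."""
    count = duration_s // interval_s  # number of reports to print
    return ["iostat", "-x", device, str(interval_s), str(count)]


def run_iostat(cmd: list) -> str:
    """Run the command and return its output (requires the sysstat package).
    Not called here, since it blocks for the whole sampling period."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


cmd = iostat_command()
print(" ".join(cmd))
```

Run it while the import is active and keep the output, so the utilization and wait figures for sdd can be compared across the whole window rather than from a single snapshot.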

My concern here is that you/we are not getting much closer to understanding whether Elasticsearch, Logstash, or the storage is the bottleneck.

My hunch remains that you would be better served by scaling things horizontally. I say that having never in my life run any Elastic tool, or the massive Oracle and SQL databases I did use, on a system with such a lot of RAM.
