Improve performance of Logstash data loading into Elasticsearch

Hi,

I am trying to load data into Elasticsearch using Logstash, but the loading time is not up to the mark.

I have a file of around 5 GB containing 30 million records. It takes around 12 minutes to insert into Elasticsearch. I am running Elasticsearch on 5 Docker instances, each with 16 cores and 64 GB of RAM.

The following is my Logstash config file:

input {
  file {
    path => "/logstashdata/output.json"
    codec => "json"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    mode => "read"
  }
}
output {
  elasticsearch {
    hosts => ["es01:9200","es02:9200","es03:9200"]
    index => "testindex_1"
  }
}

I have no filters to apply; I am just pushing the raw JSON records to Elasticsearch.

Each Elasticsearch host is running with the Java heap config:

"ES_JAVA_OPTS=-Xms32g -Xmx32g"

and Logstash has:

LS_JAVA_OPTS: "-Xmx16g -Xms16g"
LS_OPTS: "-w 10"
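
(For reference, the -w flag corresponds to pipeline.workers. The same knobs can also be set in logstash.yml, where pipeline.batch.size additionally controls how many events go into each bulk request. The values below are only illustrative, not my actual settings:)

# logstash.yml (illustrative values only)
pipeline.workers: 10
pipeline.batch.size: 1000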

I need each file's data in its own index, and I have multiple log files (around 20 at any one time) that need to be pushed to Elasticsearch.
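
As a sketch of what I mean (the pipeline ids and paths below are placeholders, not my real ones), each file would get its own entry in pipelines.yml, and each referenced config would set its own index:

# pipelines.yml (placeholder ids and paths)
- pipeline.id: file1
  path.config: "/etc/logstash/conf.d/file1.conf"
- pipeline.id: file2
  path.config: "/etc/logstash/conf.d/file2.conf"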

For one 5 GB file it takes around 12 minutes to insert, and if I give Logstash 4 different pipeline configs in parallel, it takes around 22 minutes to insert them all.

I tried increasing the number of Elasticsearch instances and the heap memory of both Elasticsearch and Logstash, but was not able to reduce the time.

Please advise what should be done to reduce the ingestion time. Does it require a hardware upgrade, or just config changes?

Also, is there any way to run multiple Logstash instances with the file input plugin?

Hi @gourav_gupta,

I'm not sure whether Logstash or Filebeat will be faster. We have loaded significant amounts of JSON data into Elasticsearch from JSON text files, but that was some time ago.

You can run several instances of both Filebeat and Logstash. Both can be started with something like

bin/logstash --path.config CONFIG_PATH

You can have one config file per JSON file. Logstash should be fine with defaults as it is just shipping pure JSON. And Elasticsearch should not really need a lot of resources just for indexing that amount of data.
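
One caveat if the instances run on the same machine: each Logstash instance needs its own data directory, or the second one will refuse to start. So something like this (paths are just examples):

bin/logstash --path.config /path/to/file1.conf --path.data /tmp/ls-data1
bin/logstash --path.config /path/to/file2.conf --path.data /tmp/ls-data2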

I would look in both Logstash and Elasticsearch logs to see if there are any warnings or errors.
