Hi,
I am trying to load data into Elasticsearch using Logstash, but the loading time is not up to the mark.
I have a file of around 5 GB containing 30 million records, and it takes around 12 minutes to insert into Elasticsearch. I am running Elasticsearch on 5 Docker instances, each with 16 cores and 64 GB of RAM.
The following is my Logstash config file:
input {
  file {
    path => "/logstashdata/output.json"
    codec => "json"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    mode => "read"
  }
}

output {
  elasticsearch {
    hosts => ["es01:9200","es02:9200","es03:9200"]
    index => "testindex_1"
  }
}
I have no filter to apply; I am just pushing the raw JSON records to Elasticsearch.
Each Elasticsearch host is running with this Java heap config:
"ES_JAVA_OPTS=-Xms32g -Xmx32g"
and Logstash has:
LS_JAVA_OPTS: "-Xmx16g -Xms16g"
LS_OPTS: "-w 10"
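For reference, these options are passed to the containers as environment settings in docker-compose, roughly like this (service names abbreviated here, not my exact compose file):

services:
  es01:
    environment:
      - "ES_JAVA_OPTS=-Xms32g -Xmx32g"
  logstash:
    environment:
      LS_JAVA_OPTS: "-Xmx16g -Xms16g"
      LS_OPTS: "-w 10"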
I need each file's data to go into its own index, and I have multiple log files (around 20 at any one time) that need to be pushed to Elasticsearch.
For one 5 GB file it takes around 12 minutes to insert, and if I give Logstash 4 different pipeline configs in parallel, it takes around 22 minutes to insert them all.
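By "4 parallel pipeline configs" I mean something like the following in pipelines.yml, with one .conf per file (the ids and paths here are just illustrative, not my real file names); each .conf is the same as the config above except for the input path and the index name:

- pipeline.id: file1
  path.config: "/usr/share/logstash/pipeline/file1.conf"
- pipeline.id: file2
  path.config: "/usr/share/logstash/pipeline/file2.conf"
- pipeline.id: file3
  path.config: "/usr/share/logstash/pipeline/file3.conf"
- pipeline.id: file4
  path.config: "/usr/share/logstash/pipeline/file4.conf"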
I tried increasing the number of Elasticsearch instances and the heap memory of both Elasticsearch and Logstash, but I was not able to reduce the time.
Please help: what should be done to reduce the ingestion time? Does it require a hardware upgrade or config changes?
Also, is there any way to run multiple Logstash instances, each using the file input plugin?
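What I am imagining is something along these lines in docker-compose, with one Logstash container per group of files (purely a sketch with made-up service and directory names, I have not tried this yet):

services:
  logstash1:
    image: docker.elastic.co/logstash/logstash:<version>
    volumes:
      - ./pipeline1:/usr/share/logstash/pipeline
      - /logstashdata:/logstashdata
  logstash2:
    image: docker.elastic.co/logstash/logstash:<version>
    volumes:
      - ./pipeline2:/usr/share/logstash/pipeline
      - /logstashdata:/logstashdata

Would that be a reasonable approach, or is there a better way?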