How to increase output throughput from Logstash to Elasticsearch?

I have a Logstash configuration with a PostgreSQL table input containing 1.4 million (14 lakh) records and an output to Elasticsearch, which is set up on a c5.xlarge EC2 machine. When I watch documents being created for that index in Kibana (also on the same EC2 machine), I find that the output speed is quite low.
I have 4 pipeline workers and a batch size of 250 in logstash.yml.
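
For reference, a minimal sketch of the relevant logstash.yml pipeline settings, using the values described in this thread (the batch delay is mentioned further down):

# logstash.yml (pipeline tuning settings)
pipeline.workers: 4        # threads running the filter and output stages in parallel
pipeline.batch.size: 250   # events each worker collects before running filters/outputs
pipeline.batch.delay: 10   # ms to wait for more events before dispatching an undersized batch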

How do you know it is the output to Elasticsearch, and not the input from PostgreSQL, that is the bottleneck? I would recommend configuring Logstash to write to a file instead of to Elasticsearch and seeing whether that is indeed faster than what you currently see. Once we know whether it is faster or not, we can tune the appropriate portion of the pipeline.
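
A minimal sketch of such a test output, using the file output plugin that ships with the default Logstash distribution (the path is only an example, and writing just the primary key keeps the file small):

output {
  # Write one line per event to a local file instead of Elasticsearch
  file {
    path => "/tmp/logstash_throughput_test.log"   # example path, adjust as needed
    codec => line { format => "%{breakfix_id}" }  # emit only the id field per record
  }
}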

Hi Christian, I have an AWS PostgreSQL RDS instance, and I have set up an AWS EC2 instance of type c5.xlarge with both my Logstash and Elasticsearch running on that same instance.

input {
  jdbc {
    jdbc_driver_class => "org.postgresql.Driver"
    jdbc_driver_library => "/home/ubuntu/postgresql-42.2.24.jar"
    jdbc_connection_string => "jdbc:postgresql://XXXXXXXX:5432/XXXXXXXXXXX"
    jdbc_validate_connection => true
    jdbc_user => "XXXXXXXXXXXX"
    jdbc_password => "XXXXXXXXXXXXXXXX"
    # Run the query once a day at 12:05
    schedule => "05 12 * * *"
    # Incremental load: only fetch rows with an id greater than the last one seen
    statement => "select * from breakfix where breakfix_id > :sql_last_value order by breakfix_id asc"
    tracking_column => "breakfix_id"
    tracking_column_type => "numeric"
    use_column_value => true
    # Where the last seen breakfix_id is persisted between runs
    last_run_metadata_path => "/mnt/.logstash_jdbc_last_run"
  }
}

filter {
  mutate {
    # Drop Logstash metadata fields before indexing
    remove_field => ["@version", "@timestamp"]
  }
}

output {
  stdout { codec => rubydebug }

  elasticsearch {
    hosts => ["http://13.233.180.173:9200"]
    index => "breakfix2"
    # Use the table's primary key as the document id and upsert on re-runs
    document_id => "%{breakfix_id}"
    doc_as_upsert => true
  }
}

The above is my Logstash configuration.

I have 4 workers, a batch size of 250, and a batch delay of 10.

On Elasticsearch I have just one shard, and I still can't figure out what is causing the delay.

I have a relational table with 1.4 million (14 lakh) records that needs to be sent to Elasticsearch.

Did you read my previous comment? Please remove the stdout and elasticsearch outputs and instead configure a file output, possibly writing just one field per record from the database. See what throughput this gives and compare it to when you output to Elasticsearch. Once done, please post the throughput results here. That should give us an idea of where to start troubleshooting.
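
One way to get a throughput number from Logstash itself, if that helps with the comparison, is the metrics filter (logstash-filter-metrics, bundled with the default distribution); a minimal sketch, where the "metric" tag name is arbitrary:

filter {
  # Count every event passing through and periodically emit a rate event
  metrics {
    meter => "events"
    add_tag => "metric"
  }
}

output {
  # Print only the generated metric events, not the data itself
  if "metric" in [tags] {
    stdout {
      codec => line { format => "1m rate: %{[events][rate_1m]} events/sec" }
    }
  }
}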
