Logstash-Filter-Elasticsearch Slow

Hi,

I am getting slow performance when using logstash-filter-elasticsearch, even with small volumes of data. I've tried this with multiple versions of both ES and LS, from 5.3 up to 6.0.

By way of example here is a config file:

input {
  jdbc {
    jdbc_connection_string => "jdbc:mysql://mydb"
    jdbc_user => "myuser"
    jdbc_password => "mypassword"
    jdbc_driver_library => "mysql-connector-java-5.1.42-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    statement => "SELECT * FROM table where id > :sql_last_value limit 250"
    tracking_column => "id"
    use_column_value => true
    record_last_run => true
    last_run_metadata_path => "views.txt"
  }
}

filter {
  # One lookup against the "reference" index per event
  elasticsearch {
    hosts => "https://URL:9243"
    index => "reference"
    user => "elastic"
    password => "password"
    query => "product_code:%{[media_id]}"
    fields => [["aaa","aaa"]]
  }
}

filter {
  date {
    match => ["purchase_time","YYYY-MM-dd HH:mm:ss"]
    target => "@timestamp"
  }
}


output {
  elasticsearch {
    hosts => "https://URL:9243"
    index => "myindex"
    user => "elastic"
    password => "password"
    document_type => "media"
  }
}

My ES Reference index only has a few thousand records & only 15 columns.

I ran this against both ES Cloud and a local cluster (huge environments with nothing else running) and observed the slowness in both. Any ideas how I'd go about debugging this?

For local testing I ran this on my MacBook Pro, which has an Intel Core i7 2.7GHz, 16GB RAM and an SSD, running macOS Sierra 10.12.6. Java versions:

java version "1.8.0_144"
Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)

For ES Cloud I tried this on 5.6.4 and 6.0

Thanks
Wayne

What does "slow" mean? How many events can you process each second?

Slow is defined as ingesting 500 documents in a few minutes, whereas without the filter it can load hundreds of thousands in a minute.

That is slow, yes, but keep in mind that each event triggers an ES query that can easily take on the order of milliseconds, so it'll never be terribly high-performance. I'd look into the performance metrics of ES and Logstash to see what's taking so much time. Is it ES itself, or does Logstash add overhead? Also, have you tried raising the number of pipeline workers? Since your elasticsearch filter adds latency from an external resource, I'd expect Logstash to be idling much of the time, so you should be able to raise the concurrency quite a bit.
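To put rough numbers on that reasoning, here is a minimal back-of-the-envelope sketch in Python. It assumes each pipeline worker handles its events one at a time and blocks on a single lookup per event, so throughput is bounded by workers divided by lookup latency. The function names and the sample figures (8 workers, 5 ms, "a few minutes" taken as 180 seconds) are illustrative assumptions, not measurements from this thread.

```python
def max_events_per_second(workers: int, lookup_latency_ms: float) -> float:
    """Upper bound on throughput when every event waits on one blocking lookup."""
    if workers < 1 or lookup_latency_ms <= 0:
        raise ValueError("workers must be >= 1 and latency must be positive")
    return workers * 1000.0 / lookup_latency_ms


def implied_latency_ms(events: int, seconds: float, workers: int) -> float:
    """Average per-lookup latency implied by an observed run, under the same model."""
    if events < 1 or workers < 1 or seconds <= 0:
        raise ValueError("events and workers must be >= 1 and seconds positive")
    return workers * seconds * 1000.0 / events


if __name__ == "__main__":
    # Assumed: 8 workers and a 5 ms lookup, i.e. a healthy nearby cluster.
    print(max_events_per_second(8, 5))
    # Assumed: 500 events in 180 s with 8 workers, roughly the run described above.
    print(implied_latency_ms(500, 180, 8))
```

Under these assumptions, a lookup that really took a few milliseconds would allow well over a thousand events per second, while the reported run implies seconds per lookup. That gap suggests something other than ordinary query cost, such as connection setup or network round trips, and is worth measuring directly.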

It looks like you are querying a cloud instance from your laptop. Have you tried running Logstash closer to Elasticsearch to minimise latency?

Hi @magnusbaeck, increasing the workers is a little faster, yes. I'll add in some metrics per request.

@Christian_Dahlqvist - I've tried on a local ES cluster too and see the same issue. Thanks for responding though.

Wayne
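For the per-request metrics mentioned above, one possible starting point is the per-plugin timing that Logstash exposes through its monitoring API (by default on port 9600, under the node stats endpoint). The Python sketch below assumes the stats for one pipeline contain a `plugins.filters` list whose entries carry `name` and an `events` object with `in` and `duration_in_millis`. The exact endpoint path and JSON layout differ between 5.x and 6.x, so treat the shape and the sample numbers here as assumptions to check against your own output.

```python
def slowest_filters(pipeline_stats: dict) -> list:
    """Return (filter name, events in, avg ms per event), slowest filter first."""
    rows = []
    for plugin in pipeline_stats.get("plugins", {}).get("filters", []):
        events = plugin.get("events", {})
        count = events.get("in", 0)
        millis = events.get("duration_in_millis", 0)
        per_event = millis / count if count else 0.0
        rows.append((plugin.get("name", "?"), count, per_event))
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Invented sample in the assumed shape; these are not real measurements.
sample_stats = {
    "plugins": {
        "filters": [
            {"name": "date", "events": {"in": 500, "duration_in_millis": 50}},
            {"name": "elasticsearch",
             "events": {"in": 500, "duration_in_millis": 150000}},
        ]
    }
}

if __name__ == "__main__":
    for name, count, per_event in slowest_filters(sample_stats):
        print(f"{name}: {count} events, {per_event:.1f} ms/event")
```

To use it against a live instance, fetch the node stats JSON (for example with `urllib.request`), pick out the object for your pipeline, and pass it in. If the elasticsearch filter dominates the time per event, comparing that figure with the query time reported by ES itself should show whether the delay is in ES or in the round trip.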
