Logstash synced with MySQL taking too long to execute the queries

Look at the screenshot below; you can see that querying a table with just 146 records takes 6.2 seconds to execute.
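
As a side note, the server-side query time can be isolated from network and Kibana overhead by reading the "took" field of the search response. A minimal sketch with curl, assuming the cluster is reachable on localhost:9200 (adjust the host to your setup):

```shell
# "took" is the time Elasticsearch spent executing the search, in milliseconds.
# It excludes network transfer and any Kibana rendering overhead.
curl -s "http://localhost:9200/ac_daily_trial_balance/_search?size=300&pretty" \
  | grep '"took"'
```

If "took" is small while the overall request is slow, the latency is outside Elasticsearch itself.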

Now, take a look at my configuration files for logstash:

base.conf:

input {
  jdbc {
    jdbc_driver_library => "/usr/share/logstash/mysql-connector-java-8.0.22.jar"
    jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://mysql_8:3306/pcs_accounts_db"
    jdbc_user => "pcs_db_user"
    jdbc_password => "laravel_db"
    sql_log_level => "debug"  
    clean_run => true 
    record_last_run => false
    type => "txn"
    statement => "SELECT * FROM ac_transaction_dump"
  }

  jdbc {
    jdbc_driver_library => "/usr/share/logstash/mysql-connector-java-8.0.22.jar"
    jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://mysql_8:3306/pcs_accounts_db"
    jdbc_user => "pcs_db_user"
    jdbc_password => "laravel_db"
    sql_log_level => "debug"  
    clean_run => true 
    record_last_run => false
    type => "trial"
    statement => "SELECT * FROM ac_daily_trial_balance"
  }
}

filter {  
  mutate {
    remove_field => ["@version", "@timestamp"]
  }
}

output {

  stdout { codec => rubydebug { metadata => true } }

  if [type] == "txn" {
    elasticsearch {
      hosts => ["http://elasticsearch:9200"]
      data_stream => "false"
      index => "ac_transaction_dump"
      document_id => "%{transaction_dump_id}"
    }
  }

  if [type] == "trial" {
    elasticsearch {
      hosts => ["http://elasticsearch:9200"]
      data_stream => "false"
      index => "ac_daily_trial_balance"
      document_id => "%{daily_trial_balance_id}"
    }
  }
}

change.conf:

input {
  jdbc {
    jdbc_driver_library => "/usr/share/logstash/mysql-connector-java-8.0.22.jar"
    jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://mysql_8:3306/pcs_accounts_db"
    jdbc_user => "pcs_db_user"
    jdbc_password => "laravel_db"
    type => "txn"
    use_column_value => true
    tracking_column => 'transaction_dump_id'
    last_run_metadata_path => "/usr/share/logstash/.logstash_jdbc_last_run_a"
    sql_log_level => "debug"  
    schedule => "*/5 * * * * *"  
    statement => "
                  SELECT * 
                  FROM ac_transaction_dump 
                  WHERE (created_at > :sql_last_value)
                  OR (updated_at > :sql_last_value);
                "
  }

  jdbc {
    jdbc_driver_library => "/usr/share/logstash/mysql-connector-java-8.0.22.jar"
    jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://mysql_8:3306/pcs_accounts_db"
    jdbc_user => "pcs_db_user"
    jdbc_password => "laravel_db"
    type => "trial"
    use_column_value => true
    tracking_column => 'daily_trial_balance_id'
    last_run_metadata_path => "/usr/share/logstash/.logstash_jdbc_last_run_b"
    sql_log_level => "debug"  
    schedule => "*/5 * * * * *"  
    statement => "
                  SELECT * 
                  FROM ac_daily_trial_balance 
                  WHERE (created_at > :sql_last_value)
                  OR (updated_at > :sql_last_value);
                "
  }
}

filter {
  if [deleted_at] {
    mutate {
      add_field => { "[@metadata][action]" => "delete" }
    }
  }
  mutate {
    remove_field => ["@version", "@timestamp"]
  }
}

output {
  # stdout { codec => rubydebug { metadata => true } }

  if [type] == "txn" {
    if [@metadata][action] == "delete" {
      elasticsearch {
        hosts => ["http://elasticsearch:9200"]
        index => "ac_transaction_dump"
        action => "delete"
        document_id => "%{transaction_dump_id}"
      }
    } else {
      elasticsearch {
        hosts => ["http://elasticsearch:9200"]
        index => "ac_transaction_dump"
        document_id => "%{transaction_dump_id}"
      }
    }
  }

  if [type] == "trial" {
    if [@metadata][action] == "delete" {
      elasticsearch {
        hosts => ["http://elasticsearch:9200"]
        index => "ac_daily_trial_balance"
        action => "delete"
        document_id => "%{daily_trial_balance_id}"
      }
    } else {
      elasticsearch {
        hosts => ["http://elasticsearch:9200"]
        index => "ac_daily_trial_balance"
        document_id => "%{daily_trial_balance_id}"
      }
    }
  }
}

I have two tables in the database, one is "ac_transaction_dump" with 204 records and another one is "ac_daily_trial_balance" with 146 records.

It would help me a lot if anyone could suggest the changes needed for improvement. Also, why is it taking so much time with just a few records?

What happens if you remove ?size=300?

Thanks for the reply.

Same behavior: it takes roughly 7 seconds when I execute the query for the first time, and after that it is fast enough, with an average time of 60 ms.

What is the size and specification of the cluster? What type of storage are you using?

What load is the cluster under overall?

Which version of Elasticsearch are you using?

Thanks for the reply Christian, below are the details:

Version: 8.12.2, Build: docker/48a287ab9497e852de30327444b0809e55d46466/2024-02-19T10:04:32.774273190Z, JVM: 21.0.2

ip heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
172.19.0.3 40 98 4 1.14 1.73 2.53 cdfhilmrstw * b9a2397cb7f9

Are you continuously indexing into or updating the index? Are any other indices being updated or queried at the same time? If you are, the first search request may trigger a refresh which can be expensive and require a fair bit of processing, especially on very small and resource constrained clusters like yours. To see if this is the case you can try to issue a separate refresh request before you run the query and see if this makes any difference.
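
A refresh can be triggered explicitly with the refresh API. A minimal sketch using curl, assuming the cluster is reachable on localhost:9200 and using the index names from the configs above:

```shell
# Force a refresh of both indices before running the first search,
# so the search itself does not have to pay the refresh cost.
curl -X POST "http://localhost:9200/ac_transaction_dump/_refresh"
curl -X POST "http://localhost:9200/ac_daily_trial_balance/_refresh"
```

If the first query is fast after an explicit refresh, the refresh was the source of the latency.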

No, I am not updating continuously. And I noticed that there is an automatic refresh happening.

Is there any way to change my cluster structure, since you called it a "small and resource constrained" cluster, or any change at the implementation level for better performance?

No. That's not the same thing. The thing you are seeing in Kibana is only Kibana "refreshing itself". It's not related to the index refresh operation @Christian_Dahlqvist mentioned.

So once you have inserted your data, you should trigger a refresh... It might not be as fast as you would imagine, though, because the first query still has to warm the OS cache.

I'm not sure exactly what you are trying to solve here. If the subsequent queries are super fast, then I think you are good to go. What problem are you trying to solve?

Here, the problem is the time it's taking to execute the queries.

Note that after I loaded the data, even if I make a small change, like updating a single column value in a database table, and then run a query to check that the change is reflected in the index, that execution takes 7 to 8 seconds.

Sometimes it takes 7 to 8 seconds even without any update.

That's the problem I am trying to solve.

Is the data you are using your expected production volume? I would recommend testing and optimising on as real a setup as possible with respect to data volumes and cluster size.

Did running a separate refresh request before the first query make any difference? If it does, you may want to make this part of your indexing process.

It could also be that Elasticsearch needs to build internal data structures before the first search can be served, which can add latency. You could try to enable eager global ordinals in your mappings to address this and see if it helps. This would spread that work out over time.
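
A sketch of what that mapping change could look like, again assuming localhost:9200. The field name "account_code" is hypothetical; apply it to whichever keyword fields you actually aggregate or sort on:

```shell
# Enable eager global ordinals on a keyword field so the global-ordinals
# structure is built at refresh time instead of on the first search.
# "account_code" is a hypothetical field name; substitute your own.
curl -X PUT "http://localhost:9200/ac_transaction_dump/_mapping" \
  -H 'Content-Type: application/json' \
  -d '{
    "properties": {
      "account_code": {
        "type": "keyword",
        "eager_global_ordinals": true
      }
    }
  }'
```

The trade-off is slightly more work on every refresh in exchange for less work on the first search.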

Did you try to refresh the index first?

I refreshed all the indices, but there is no improvement.

No, it doesn't make any difference.

Sorry, but I am not following you here.

Is this a realistic test with the data volumes you are expecting, or is it a scaled down test?

It's a scaled-down test.

Well, that may not be representative. If you are expecting larger data volumes you would likely have a larger cluster with more resources. This could potentially remove the resource constraint, behave very differently and eliminate the issue you are trying to solve. I would therefore recommend testing with the data volume you are expecting in production (with production level query and indexing load) and see if this is still an issue.

Okay, thanks for your guidance.