Logstash synced with MySQL taking too long to execute the queries

Look at the screenshot below; you can see that querying a table with just 146 records takes 6.2 seconds to execute.
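
As a side note, the server-side query time can be isolated from network and Kibana overhead by reading the "took" field of the search response. A minimal sketch with curl, assuming the cluster is reachable on localhost:9200 (adjust the host to your setup):

```shell
# "took" is the time Elasticsearch spent executing the search, in milliseconds.
# It excludes network transfer and any Kibana rendering overhead.
curl -s "http://localhost:9200/ac_daily_trial_balance/_search?size=300&pretty" \
  | grep '"took"'
```

If "took" is small while the overall request is slow, the latency is outside Elasticsearch itself.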

Now, take a look at my configuration files for logstash:

base.conf:

input {
  jdbc {
    jdbc_driver_library => "/usr/share/logstash/mysql-connector-java-8.0.22.jar"
    jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://mysql_8:3306/pcs_accounts_db"
    jdbc_user => "pcs_db_user"
    jdbc_password => "laravel_db"
    sql_log_level => "debug"  
    clean_run => true 
    record_last_run => false
    type => "txn"
    statement => "SELECT * FROM ac_transaction_dump"
  }

  jdbc {
    jdbc_driver_library => "/usr/share/logstash/mysql-connector-java-8.0.22.jar"
    jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://mysql_8:3306/pcs_accounts_db"
    jdbc_user => "pcs_db_user"
    jdbc_password => "laravel_db"
    sql_log_level => "debug"  
    clean_run => true 
    record_last_run => false
    type => "trial"
    statement => "SELECT * FROM ac_daily_trial_balance"
  }
}

filter {  
  mutate {
    remove_field => ["@version", "@timestamp"]
  }
}

output {

  stdout { codec => rubydebug { metadata => true } }

  if [type] == "txn" {
    elasticsearch {
      hosts => ["http://elasticsearch:9200"]
      data_stream => "false"
      index => "ac_transaction_dump"
      document_id => "%{transaction_dump_id}"
    }
  }

  if [type] == "trial" {
    elasticsearch {
      hosts => ["http://elasticsearch:9200"]
      data_stream => "false"
      index => "ac_daily_trial_balance"
      document_id => "%{daily_trial_balance_id}"
    }
  }
}

change.conf:

input {
  jdbc {
    jdbc_driver_library => "/usr/share/logstash/mysql-connector-java-8.0.22.jar"
    jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://mysql_8:3306/pcs_accounts_db"
    jdbc_user => "pcs_db_user"
    jdbc_password => "laravel_db"
    type => "txn"
    use_column_value => true
    tracking_column => 'transaction_dump_id'
    last_run_metadata_path => "/usr/share/logstash/.logstash_jdbc_last_run_a"
    sql_log_level => "debug"  
    schedule => "*/5 * * * * *"  
    statement => "
                  SELECT * 
                  FROM ac_transaction_dump 
                  WHERE (created_at > :sql_last_value)
                  OR (updated_at > :sql_last_value);
                "
  }

  jdbc {
    jdbc_driver_library => "/usr/share/logstash/mysql-connector-java-8.0.22.jar"
    jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://mysql_8:3306/pcs_accounts_db"
    jdbc_user => "pcs_db_user"
    jdbc_password => "laravel_db"
    type => "trial"
    use_column_value => true
    tracking_column => 'daily_trial_balance_id'
    last_run_metadata_path => "/usr/share/logstash/.logstash_jdbc_last_run_b"
    sql_log_level => "debug"  
    schedule => "*/5 * * * * *"  
    statement => "
                  SELECT * 
                  FROM ac_daily_trial_balance 
                  WHERE (created_at > :sql_last_value)
                  OR (updated_at > :sql_last_value);
                "
  }
}

filter {
  if [deleted_at] {
    mutate {
      add_field => { "[@metadata][action]" => "delete" }
    }
  }
  mutate {
    remove_field => ["@version", "@timestamp"]
  }
}

output {
  # stdout { codec => rubydebug { metadata => true } }

  if [type] == "txn" {
    if [@metadata][action] == "delete" {
      elasticsearch {
        hosts => ["http://elasticsearch:9200"]
        index => "ac_transaction_dump"
        action => "delete"
        document_id => "%{transaction_dump_id}"
      }
    } else {
      elasticsearch {
        hosts => ["http://elasticsearch:9200"]
        index => "ac_transaction_dump"
        document_id => "%{transaction_dump_id}"
      }
    }
  }

  if [type] == "trial" {
    if [@metadata][action] == "delete" {
      elasticsearch {
        hosts => ["http://elasticsearch:9200"]
        index => "ac_daily_trial_balance"
        action => "delete"
        document_id => "%{daily_trial_balance_id}"
      }
    } else {
      elasticsearch {
        hosts => ["http://elasticsearch:9200"]
        index => "ac_daily_trial_balance"
        document_id => "%{daily_trial_balance_id}"
      }
    }
  }
}

I have two tables in the database, one is "ac_transaction_dump" with 204 records and another one is "ac_daily_trial_balance" with 146 records.

It would help me a lot if anyone could suggest the changes needed for improvement. Also, why is it taking so much time with just a few records?

What happens if you remove ?size=300?

Thanks for the reply.

Same behavior: it takes roughly 7 seconds when I execute the query for the first time, and after that it is fast enough, with an average time of 60 ms.

What is the size and specification of the cluster? What type of storage are you using?

What load is the cluster under overall?

Which version of Elasticsearch are you using?

Thanks for the reply Christian, below are the details:

Version: 8.12.2, Build: docker/48a287ab9497e852de30327444b0809e55d46466/2024-02-19T10:04:32.774273190Z, JVM: 21.0.2

ip heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
172.19.0.3 40 98 4 1.14 1.73 2.53 cdfhilmrstw * b9a2397cb7f9

Are you continuously indexing into or updating the index? Are any other indices being updated or queried at the same time? If you are, the first search request may trigger a refresh which can be expensive and require a fair bit of processing, especially on very small and resource constrained clusters like yours. To see if this is the case you can try to issue a separate refresh request before you run the query and see if this makes any difference.
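
A refresh can be triggered explicitly with the refresh API. A minimal sketch using curl, assuming the cluster is reachable on localhost:9200 and using the index names from the configs above:

```shell
# Force a refresh of both indices before running the first search,
# so the search itself does not have to pay the refresh cost.
curl -X POST "http://localhost:9200/ac_transaction_dump/_refresh"
curl -X POST "http://localhost:9200/ac_daily_trial_balance/_refresh"
```

If the first query is fast after an explicit refresh, the refresh was the source of the latency.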

No, I am not updating continuously. And I noticed that there is an automatic refresh happening.

Is there any way to change my cluster structure, since you called it a "small and resource constrained" cluster, or any change at the implementation level for better performance?

No. That's not the same thing. The thing you are seeing in Kibana is only Kibana "refreshing itself". It's not related to the index refresh operation @Christian_Dahlqvist mentioned.

So once you have inserted your data, you should trigger a refresh... It might not be as fast as you would imagine, though, because the first query still has to warm the OS cache.

I'm not sure exactly what you are trying to solve here. If the subsequent queries are super fast, then I think you are good to go. What problem are you trying to solve?

Here, the problem is the time it's taking to execute the queries.

Note that after I loaded the data, even if I make a small change, like updating a single column value in a database table, and then run a query to check that the change is reflected in the index, that execution takes 7 to 8 seconds.

Sometimes it takes 7 to 8 seconds even without any update.

That's the problem I am trying to solve.

Is the data you are using your expected production volume? I would recommend testing and optimising on as real a setup as possible with respect to data volumes and cluster size.

Did running a separate refresh request before the first query make any difference? If it does, you may want to make this part of your indexing process.

It could also be that Elasticsearch needs to build internal data structures before the first search can be served, which can add latency. You could try to enable eager global ordinals in your mappings to address this and see if it helps. This would spread that work out over time.
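
A sketch of what that mapping change could look like, again assuming localhost:9200. The field name "account_code" is hypothetical; apply it to whichever keyword fields you actually aggregate or sort on:

```shell
# Enable eager global ordinals on a keyword field so the global-ordinals
# structure is built at refresh time instead of on the first search.
# "account_code" is a hypothetical field name; substitute your own.
curl -X PUT "http://localhost:9200/ac_transaction_dump/_mapping" \
  -H 'Content-Type: application/json' \
  -d '{
    "properties": {
      "account_code": {
        "type": "keyword",
        "eager_global_ordinals": true
      }
    }
  }'
```

The trade-off is slightly more work on every refresh in exchange for less work on the first search.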

Did you try to refresh the index first?

I refreshed all the indices, but there is no improvement.

No, it doesn't make any difference.

Sorry, but I am not following you here.

Is this a realistic test with the data volumes you are expecting, or is it a scaled down test?

It's a scaled-down test.

Well, that may not be representative. If you are expecting larger data volumes you would likely have a larger cluster with more resources. This could potentially remove the resource constraint, behave very differently and eliminate the issue you are trying to solve. I would therefore recommend testing with the data volume you are expecting in production (with production level query and indexing load) and see if this is still an issue.

Okay, thanks for your guidance.