How will Elasticsearch fetch data from a MySQL database?

mamta · July 13, 2018, 1:25pm

Hi,

Suppose I have a MySQL database connected to the ELK stack. The database holds 1 lakh (100,000) records, and all of that data is shown in Kibana. After some time, MySQL is updated with 50k records. My question is: how does ELK fetch data from MySQL? Does it traverse all the MySQL records again, or does it fetch data incrementally (only the latest records)? If it traverses all the records every time, the MySQL database will come under more load and may slow down.

Please help me with this question.

Thank you.

dadoonet · July 13, 2018, 2:14pm

It depends on how you configured the Logstash JDBC input (supposing that you are using that).

I prefer having a direct connection from the application to reduce load on the database and have a more real time approach.

I shared most of my thoughts there: http://david.pilato.fr/blog/2015/05/09/advanced-search-for-your-legacy-application/

mamta · July 16, 2018, 5:33am

Hi David Pilato,

Thank you so much for your informative reply.
I have done direct connection with MySQL DB. I am sending you my Logstash conf file.
Please check it.

logstash.conf

input {
jdbc {
jdbc_driver_library => "/home/catadm1/mysql-connector-java-5.1.45/mysql-connector-java-5.1.45-bin.jar"
jdbc_driver_class => "com.mysql.jdbc.Driver"
jdbc_connection_string => "jdbc:mysql://10.0.0.5:3306/abc?autoReconnect=true"
jdbc_user => "abc"
jdbc_password => "abc@1234"
jdbc_paging_enabled => "true"
jdbc_page_size => "5000000"
schedule => "* * * * *"
statement => "SELECT LOG_ID,MSG_TYPE,INTERFACE_NAME,BROKER_NAME,MSG_FLOW_NAME,SOURCE_NAME,TARGET_NAME,LOG_PAYLOAD_ID,EXCEPTION_PAYLOAD_ID,RESULT from TRANSACTION_LOG"
use_column_value => true
tracking_column => "%{LOG_ID}"
clean_run => true
}
}
filter {
  grok { match => [ "message", "%{GREEDYDATA:message}" ] }
}

output {
stdout {codec => json_lines}
elasticsearch {
hosts => ["localhost:9200"]
index => "esbdb_tables"
document_type => "test_elk_001"
document_id => "%{log_id}"
}
# file {
# path =>"/data/applications/tools/logstash-2.0.0/logs/test.log"
# }
}

dadoonet · July 16, 2018, 7:12am

I have done direct connection with MySQL DB

That's not what I meant. I meant a direct connection from your application to Elasticsearch. I did not mean reading the data later with Logstash and sending it to Elasticsearch.

The former is "real time". The latter is not.

Next time please format your code according to the guide (read the About the Elasticsearch category). I'm editing your post. But your indentation is wrong, which makes it harder to read your config.

Then, I'm moving your question to logstash as it's a logstash question.

mamta · July 16, 2018, 7:43am

Thank you @dadoonet.

I thought that logstash.conf would help you to understand my question.

magnusbaeck · July 16, 2018, 8:04am

tracking_column => "%{LOG_ID}"

This is wrong. Use LOG_ID and not %{LOG_ID} if you want it to use the LOG_ID column for tracking.

However, this won't do any good unless you add a condition to your SELECT statement to restrict the query from returning rows older than the recorded value of the LOG_ID column. See the jdbc input documentation for examples.

filter {
grok { match => [ "message", "%{GREEDYDATA:message}" ] }
}

This filter doesn't do anything useful.

mamta · July 16, 2018, 8:16am

Hi Magnus Bäck,

Thank you for the reply.

First, I will remove the grok filter. In my conf file, I have used only a SELECT statement without any conditions. What is the best way to avoid putting so much load on the MySQL database? I want to set up Elasticsearch so that it fetches data incrementally, without reading the whole MySQL database again and again. That is, when new entries arrive in MySQL, Elasticsearch should fetch only those records.

Thank you.

magnusbaeck · July 16, 2018, 6:34pm

This is what the sql_last_value SQL parameter is used for. After each query execution Logstash records a column value from the last processed row and when the query runs the next time that value will be put in the sql_last_value parameter. Use that parameter in your query to only fetch values that are more recent. Obviously, the column you use for this purpose must be a "last modified" timestamp or something else that's ever increasing.

Again, this is explained (with examples) in the jdbc input documentation.
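
As a rough illustration of the mechanism described above, the jdbc input from this thread could be adjusted along the following lines. This is only a sketch under a few assumptions: LOG_ID is a numeric, ever-increasing key (so changes to already-indexed rows would not be picked up), the tracking column is written in lowercase because the jdbc input lowercases column names by default (lowercase_column_names), and the password is a placeholder. The connection settings are copied from the config posted earlier.

```
input {
  jdbc {
    jdbc_driver_library => "/home/catadm1/mysql-connector-java-5.1.45/mysql-connector-java-5.1.45-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://10.0.0.5:3306/abc?autoReconnect=true"
    jdbc_user => "abc"
    # Placeholder, not a real credential.
    jdbc_password => "your_password"
    schedule => "* * * * *"
    # Select only rows whose LOG_ID is greater than the value recorded
    # after the previous run. For a numeric tracking column the value
    # starts at 0, so the first run still loads the whole table.
    statement => "SELECT LOG_ID,MSG_TYPE,INTERFACE_NAME,BROKER_NAME,MSG_FLOW_NAME,SOURCE_NAME,TARGET_NAME,LOG_PAYLOAD_ID,EXCEPTION_PAYLOAD_ID,RESULT FROM TRANSACTION_LOG WHERE LOG_ID > :sql_last_value ORDER BY LOG_ID ASC"
    # Record the LOG_ID of the last processed row instead of the run time.
    use_column_value => true
    # Assumption: lowercase, to match the lowercased column names.
    tracking_column => "log_id"
    tracking_column_type => "numeric"
    # Keep the recorded value across Logstash restarts.
    clean_run => false
  }
}
```

With a setup like this, each scheduled run should only ask MySQL for rows added since the previous run. If existing rows can also change, a "last modified" timestamp column with tracking_column_type => "timestamp" would be the closer fit.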

system · August 13, 2018, 6:34pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
