How elasticsearch will fetch data from MySQL database?


(Mamta ) #1

Hi,

Suppose I have MySQL database and I connected with this database through ELK. In MySQL database have 1 lac records and that all data is shown in kibana. After sometimes mysql updated with 50k records. So my question is how ELK fetch data from MySQL? It traverses all the MySQL records or fetches data in incremental manners (only latest records). If it again travesrse all the records so that times MySQL database will get more load and maybe it will go slow.

Please help me about this queries.

Thank you.


(David Pilato) #2

It depends on how you configured Logstash JDBC input (supposing that you are using that).

I prefer having a direct connection from the application to reduce load on the database and have a more real time approach.

I shared most of my thoughts there: http://david.pilato.fr/blog/2015/05/09/advanced-search-for-your-legacy-application/


(Mamta ) #3

Hi David Pilato,

Thank you so much for your informative reply.
I have done direct connection with MySQL DB. I am sending you logstash conf file.
Please check it.

logstash.conf

input {
jdbc {
jdbc_driver_library => "/home/catadm1/mysql-connector-java-5.1.45/mysql-connector-java-5.1.45-bin.jar"
jdbc_driver_class => "com.mysql.jdbc.Driver"
jdbc_connection_string => "jdbc:mysql://10.0.0.5:3306/abc?autoReconnect=true"
jdbc_user => "abc"
jdbc_password => "abc@1234"
jdbc_paging_enabled => "true"
jdbc_page_size => "5000000"
schedule => "* * * * *"
statement => "SELECT LOG_ID,MSG_TYPE,INTERFACE_NAME,BROKER_NAME,MSG_FLOW_NAME,SOURCE_NAME,TARGET_NAME,LOG_PAYLOAD_ID,EXCEPTION_PAYLOAD_ID,RESULT from TRANSACTION_LOG"
use_column_value => true
tracking_column => "%{LOG_ID}"
clean_run => true
}
}
filter {
  grok { match => [ "message", "%{GREEDYDATA:message}" ] }
}

output {
stdout {codec => json_lines}
elasticsearch {
hosts => ["localhost:9200"]
index => "esbdb_tables"
document_type => "test_elk_001"
document_id => "%{log_id}"
}
# file {
# path =>"/data/applications/tools/logstash-2.0.0/logs/test.log"
# }
}

(David Pilato) #4

I have done direct connection with MySQL DB

That's not what I meant. I meant direct connection from your application to elasticsearch. I did not mean reading the data later with Logstash and send to elasticsearch.

The former is "real time". The later is not.

Next time please format your code according to the guide (read the About the Elasticsearch category). I'm editing your post. But your indentation is wrong which makes harder to read your config.

Then, I'm moving your question to #logstash as it's a logstash question.


(Mamta ) #5

Thank you @dadoonet.

I thought that logstash.conf will help you to understand my question.


(Magnus Bäck) #6

tracking_column => "%{LOG_ID}"

This is wrong. Use LOG_ID and not %{LOG_ID} if you want it to use the LOG_ID column for tracking.

However, this won't do any good unless you add a condition to your SELECT clause to restrict the query from returning rows older than the recorded value of the LOG_ID column. See the jdbc input documentation for examples.

filter {
grok { match => [ "message", "%{GREEDYDATA:message}" ] }
}

This filter doesn't do anything useful.


(Mamta ) #7

Hi Magnus Bäck,

Thank you for the reply.

First I have to remove the grok filter. In my conf file, I have used only select statement without any conditions. What will be the best way to not giving so much load on MySQL database. I want to setup Elasticsearch in such a way that it can fetch data in an incremental manner without fetching mysql database again and again. Suppose when new entries will come in mysql database so that time Elasticsearch will fetch those records only.

Thank you.


(Magnus Bäck) #8

This is what the sql_last_value SQL parameter is used for. After each query execution Logstash records a column value from the last processed row and when the query runs the next time that value will be put in the sql_last_value parameter. Use that parameter in your query to only fetch values that are more recent. Obviously, the column you use for this purpose must be a "last modified" timestamp or something else that's ever increasing.

Again, this is explained (with examples) in the jdbc input documentation.


(system) #9

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.