Logstash produces duplicates


So the goal is to import an existing MySQL table with about 2 million records into an ES index. However, after a while the ES index contains far more documents than that.

I also tried generating a unique SHA-1 fingerprint of each message and using it as the document_id to avoid duplicates.

However, even though the original MySQL table has 2M records, the new ES index ends up with many more after a while.

What could be the problem, and how can I fix it?

Here is my config:

input {
  jdbc {
    jdbc_driver_library => "/app/bin/mysql-connector-java-5.1.37-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://testdatabase.xxxxxxxx.us-west-2.rds.amazonaws.com:3306/test"
    jdbc_page_size => 25000
    jdbc_paging_enabled => true
    statement => "SELECT * FROM Table"
  }
}

filter {
  ruby {
    code => "
      require 'digest/sha1'
      event['fingerprint'] = Digest::SHA1.hexdigest(event.to_json)
    "
  }
}

output {
  elasticsearch {
    hosts => ["host:80"]
    index => "fcblive"
    document_type => "action"
    document_id => "%{fingerprint}"
  }
}
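One thing worth checking (an assumption, not something confirmed in this thread): with jdbc_paging_enabled the jdbc input fetches the table in LIMIT/OFFSET pages, and MySQL does not guarantee a stable row order for an unordered SELECT. The same row can then show up on two different pages within a single run, which by itself produces duplicates. Assuming the table has a primary-key column (called id here for illustration), adding a deterministic ORDER BY makes the pages consistent:

```
# Sketch only — `id` is a hypothetical primary-key column; adjust to the
# actual schema. The ORDER BY makes LIMIT/OFFSET paging deterministic.
statement => "SELECT * FROM Table ORDER BY id"
```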

Are you ever restarting Logstash or does it produce duplicates even with a single Logstash execution?

Exactly what does an event look like? If Logstash adds the @timestamp field with the current time when a database record is read from the database, the SHA-1 digest will be different every time a particular database record is processed.
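To make the @timestamp point concrete, here is the effect in plain Ruby, outside Logstash (the row and field names are made up for illustration): hashing the whole serialized event gives a different digest for each read of the same row, while hashing only the stable columns gives the same digest every time.

```ruby
require 'json'
require 'digest/sha1'

# Two "events" representing the SAME database row, read at different times,
# so only @timestamp differs between them.
row = { "id" => 42, "name" => "alice" }
event_a = row.merge("@timestamp" => "2016-01-01T00:00:00Z")
event_b = row.merge("@timestamp" => "2016-01-01T00:05:00Z")

# Hashing the whole event (what the posted ruby filter does): the digests
# differ, so each read gets a fresh document_id and duplicates accumulate.
whole_a = Digest::SHA1.hexdigest(event_a.to_json)
whole_b = Digest::SHA1.hexdigest(event_b.to_json)

# Hashing only the stable columns (dropping @-prefixed metadata fields):
# the digests match, so a re-read overwrites the document instead.
stable = ->(e) { Digest::SHA1.hexdigest(e.reject { |k, _| k.start_with?("@") }.to_json) }
stable_a = stable.call(event_a)
stable_b = stable.call(event_b)
```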

It produces duplicates from a single Logstash run.

I see your point: my fingerprint can't work as expected, because it includes the @timestamp field. That still doesn't answer why Logstash inserts duplicates in the first place, though.

I run the command with nohup, so if it fails it won't be restarted.
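For the fingerprint side, a sketch of how this is usually done without a ruby filter, using the fingerprint filter plugin over specific stable columns rather than the whole event (the column names here are assumptions, not from the thread):

```
filter {
  fingerprint {
    # Hash only stable table columns, never @timestamp or other
    # Logstash-added metadata. `id` and `name` are hypothetical columns.
    source => ["id", "name"]
    concatenate_sources => true
    method => "SHA1"
    target => "fingerprint"
  }
}
```

If the table already has a unique primary key, the simplest option is to skip hashing entirely and set document_id => "%{id}" (again assuming an id column) in the elasticsearch output, so re-imports overwrite instead of duplicating.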