Hi all,
I'm trying to set up Logstash to continuously pull data from Cassandra, the same way it works with file inputs. I'm using the Cassandra JDBC Driver and I can successfully pull the table contents. But unlike with a SQL database, I can't use the last-value parameter to pull only the latest rows, so after running the query once I'd like to fetch only new entries.
Here is my current config:
input {
  jdbc {
    jdbc_connection_string => "jdbc:cassandra://hostname:9160"
    schedule => "* * * * *"
    jdbc_user => "cassandra"
    jdbc_password => "cassandra"
    jdbc_driver_library => "$PATH/cassandra_driver.jar"
    jdbc_driver_class => "com.dbschema.CassandraJdbcDriver"
    statement => "SELECT * FROM table_decision"
  }
}
output {
  elasticsearch {
    hosts => ["localhost"]
    document_id => "%{index_pk}"
    index => "logstash-2017-01-03"
  }
  stdout {
    codec => json_lines
  }
}
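For comparison, this is roughly what I would write against a SQL database using the jdbc input's built-in tracking. This is only a sketch: `last_inserted_at` is a hypothetical timestamp column, and I don't know whether the Cassandra JDBC driver accepts this WHERE clause (plain Cassandra usually requires the filtered column to be part of the primary key, or `ALLOW FILTERING`):

```
input {
  jdbc {
    jdbc_connection_string => "jdbc:cassandra://hostname:9160"
    jdbc_user => "cassandra"
    jdbc_password => "cassandra"
    jdbc_driver_library => "$PATH/cassandra_driver.jar"
    jdbc_driver_class => "com.dbschema.CassandraJdbcDriver"
    schedule => "* * * * *"
    # Remember the highest value seen so far; it is substituted as :sql_last_value
    use_column_value => true
    tracking_column => "last_inserted_at"      # hypothetical timestamp column
    tracking_column_type => "timestamp"
    statement => "SELECT * FROM table_decision WHERE last_inserted_at > :sql_last_value ALLOW FILTERING"
  }
}
```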
I am using ELK to view, in near real time, logs that are published to a Cassandra database. This works, but as the tables grow to millions of rows I see latency. `document_id` helps me avoid duplicate documents, but whenever I delete all the documents in that index, Logstash pulls the entire table again. Is there any way to pull only new entries?
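If the tracking-column approach does work with this driver, it should also cover the re-pull problem: the jdbc input persists `:sql_last_value` to a metadata file on disk, independent of Elasticsearch, so deleting the index would not by itself trigger a full re-pull. A sketch of the relevant options (the path is just an example):

```
# Inside the jdbc { } input block:
last_run_metadata_path => "/var/lib/logstash/.logstash_jdbc_last_run"  # example path
clean_run => false   # set to true once to reset :sql_last_value and re-pull everything
```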