Pull new entries from distributed Cassandra database


(Ayush Garg) #1

Hi all,

I'm trying to set up Logstash to continuously pull data from Cassandra, in the same fashion it works with file inputs. I'm using the Cassandra JDBC Driver and I can successfully pull the table contents. However, unlike with a SQL database, I can't use the last-value parameter to fetch only the latest rows, so after running the query once I'd like to pull only new entries.
Here is my current config:

input {
  jdbc {
    jdbc_connection_string => "jdbc:cassandra://hostname:9160"
    schedule => "* * * * *"
    jdbc_user => "cassandra"
    jdbc_password => "cassandra"
    jdbc_driver_library => "$PATH/cassandra_driver.jar"
    jdbc_driver_class => "com.dbschema.CassandraJdbcDriver"
    statement => "SELECT * FROM table_decision"
  }
}

output {
  elasticsearch {
    hosts => ["localhost"]
    document_id => "%{index_pk}"
    index => "logstash-2017-01-03"
  }

  stdout {
    codec => json_lines
  }
}

I am using the ELK stack to view, in near real time, the logs that are published to the Cassandra database. This works, but as the tables grow to millions of rows I run into latency. Setting document_id lets me avoid duplicate documents, but whenever I delete all the documents in that index, Logstash pulls the entire table again. Is there any way to pull only the new entries?
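For comparison, this is roughly how the incremental pull is configured with the jdbc input against a relational SQL source, using use_column_value / tracking_column and the :sql_last_value placeholder. This is only a sketch: it assumes a hypothetical timestamp column event_time on table_decision, and I don't know whether the Cassandra JDBC driver and CQL accept the same WHERE pattern.

input {
  jdbc {
    jdbc_connection_string => "jdbc:cassandra://hostname:9160"
    jdbc_user => "cassandra"
    jdbc_password => "cassandra"
    jdbc_driver_library => "$PATH/cassandra_driver.jar"
    jdbc_driver_class => "com.dbschema.CassandraJdbcDriver"
    schedule => "* * * * *"
    # Track the highest value seen so far and fetch only newer rows.
    use_column_value => true
    tracking_column => "event_time"          # hypothetical timestamp column
    tracking_column_type => "timestamp"
    # Persist the last seen value between runs and restarts.
    last_run_metadata_path => "/var/lib/logstash/.table_decision_last_run"
    # :sql_last_value is substituted by the plugin on each scheduled run.
    statement => "SELECT * FROM table_decision WHERE event_time > :sql_last_value"
  }
}

With a SQL source, last_run_metadata_path keeps the last seen value across restarts, so the whole table would not be re-read even after the Elasticsearch index is wiped. I'm looking for an equivalent that works against Cassandra.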


(system) #2

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.