Hi all,
My requirement is to load data from a Cassandra column family into Elasticsearch. For that I have successfully installed the plugins required for Cassandra integration (mapper attachments and cassandra-river). I then tried to load the Cassandra data into Elasticsearch using curl, as below:
curl -XPUT 'localhost:9200/_river/prodinfo/_meta' -d '{
    "type" : "cassandra",
    "cassandra" : {
        "cluster_name" : "test-cluster",
        "keyspace" : "catalogks",
        "column_family" : "info",
        "batch_size" : 1000,
        "hosts" : "host1:9161",
        "username" : "username",
        "password" : "password"
    },
    "index" : {
        "index" : "prodinfo",
        "type" : "product"
    }
}'
The index is created with the Cassandra settings given above, but it contains no data or columns from the Cassandra keyspace and column family. How can I achieve that? Any help would be greatly appreciated.
You could use the DataStax driver to connect to Cassandra, select the data you want to import, and push it into ES with an HTTP client and the bulk API.
The DataStax Cassandra driver is available for Java (Maven package or link on the DataStax site) and C# (nuget.org package). The Java driver is a native driver, not JDBC, but usage is quite similar.
Both the Java and C# drivers work very well. Just be careful to use a driver version that matches your Cassandra install, because changes to some system tables broke things between versions.
For ES, use an HTTP client of your choice (Apache HttpComponents in Java or the standard .NET HttpClient in C#) with the bulk API.
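As a rough illustration of the bulk API side, here is a minimal sketch in Python using only the standard library; the same structure carries over to Java or C#. The function names (build_bulk_body, send_bulk) and the id_field parameter are made up for this example, and the index and type names are taken from the river config above. It assumes each Cassandra row has already been read into a dict with JSON-serializable values, and that your ES version still accepts "_type" in the action line (newer versions do not).

```python
import json
import urllib.request
from typing import Dict, Iterable


def build_bulk_body(rows: Iterable[Dict], index: str, doc_type: str, id_field: str) -> str:
    """Build a newline-delimited bulk request body: one action line plus one source line per row."""
    lines = []
    for row in rows:
        # Action line: tells ES where to index the document that follows.
        # "_type" is an assumption that fits older ES versions; drop it on newer ones.
        action = {"index": {"_index": index, "_type": doc_type, "_id": str(row[id_field])}}
        lines.append(json.dumps(action))
        # Source line: the document itself.
        lines.append(json.dumps(row))
    if not lines:
        return ""
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


def send_bulk(body: str, url: str = "http://localhost:9200/_bulk") -> Dict:
    """POST the body to the bulk endpoint and return the parsed JSON response."""
    request = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

You would call build_bulk_body once per batch of rows (for example 1000 at a time) and pass the result to send_bulk. The bulk response reports failures per item, so it is worth checking its "errors" field rather than relying on the HTTP status alone.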
One problem you may run into is that Cassandra is a distributed key-value store and places strong limitations on SELECT statements, both in the WHERE clause and for table/partition scans.
If you know the Cassandra data model and can work within it, this is fine. If you don't, and you want to do a basic 'SELECT *' (or select some columns) without a WHERE clause, I suggest you have a look at the following article, which gives hints about paging the SELECT to deal with cluster partitions: http://www.myhowto.org/bigdata/2013/11/04/scanning-the-entire-cassandra-column-family-with-cql/
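To give an idea of what paging a full scan can look like, here is a minimal sketch in Python of the token-based approach: read one page, remember the last partition key, then ask for rows whose token is greater than that key's token. It is not taken from the linked article. The scan_table function and the execute callback are hypothetical; execute stands in for whatever driver call runs a CQL statement with bound values and returns rows as dicts. The sketch assumes a single-column partition key and one row per partition; with clustering columns a partition could be cut across two pages.

```python
from typing import Callable, Dict, Iterator, List, Tuple

Row = Dict[str, object]
# Hypothetical callback: runs one CQL statement with its bound values and returns the rows.
Execute = Callable[[str, Tuple], List[Row]]


def scan_table(execute: Execute, table: str, key_column: str, page_size: int = 1000) -> Iterator[Row]:
    """Yield every row of a table by paging through it in token order."""
    first_page = f"SELECT * FROM {table} LIMIT {page_size}"
    next_page = (
        f"SELECT * FROM {table} "
        f"WHERE token({key_column}) > token(?) LIMIT {page_size}"
    )
    rows = execute(first_page, ())
    while rows:
        yield from rows
        if len(rows) < page_size:
            # A short page means there is nothing left to read.
            break
        # Continue after the last partition key seen on this page.
        last_key = rows[-1][key_column]
        rows = execute(next_page, (last_key,))
```

Each page yielded this way can be turned into one bulk request for ES. Newer Cassandra versions and drivers can also page result sets automatically, so check whether your driver version already does this before hand-rolling it.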