Load cassandra data into elasticsearch


(soujanya) #1

Hi all,
My requirement is to load data from a Cassandra column family into Elasticsearch. For that I have successfully installed the plugins (mapper attachments and cassandra-river) required for Cassandra integration with Elasticsearch. I then tried to load the Cassandra data into Elasticsearch using curl, as below:
curl -XPUT 'localhost:9200/_river/prodinfo/_meta' -d '{
  "type" : "cassandra",
  "cassandra" : {
    "cluster_name" : "test-cluster",
    "keyspace" : "catalogks",
    "column_family" : "info",
    "batch_size" : 1000,
    "hosts" : "host1:9161",
    "username" : "username",
    "password" : "password"
  },
  "index" : {
    "index" : "prodinfo",
    "type" : "product"
  }
}'
The index is created with the Cassandra credentials I gave above, but it contains no data or columns from the Cassandra keyspace and column family. How should I achieve that? Any help would be greatly appreciated.

Thank you in advance.


(David Pilato) #2

Rivers have been removed in elasticsearch 2.0 so you should not use them.

I'd try to inject data in Cassandra and Elasticsearch at the same time in your application layer.


(soujanya) #3

Hi David,

Thank you for your prompt reply. But is there any other way to load Cassandra data into Elasticsearch? Please help me resolve this.


(David Pilato) #4

I don't know any.

I'd read the data from Cassandra using my application layer and send JSON docs to ES then.

Maybe you can try a JDBC driver for Cassandra and use the JDBC input plugin from Logstash or the jdbc importer?


(soujanya) #5

Thank you for the reply. I'll try using a driver for Cassandra.


(Alain Rastoul) #6

Hi Soujanya,

You could use the DataStax driver to connect to Cassandra, select the data you want to import, and push it into ES with an HTTP client and the bulk API.
The DataStax Cassandra driver is available for Java (Maven package or link on the DataStax site) and C# (nuget.org package). The Java driver is a native driver, not JDBC, but usage is quite similar.
Both the Java and C# drivers work very well; just be careful to use a version in sync with your Cassandra install, because changes to some system tables broke things between versions.

For ES, use an HTTP client of your choice (Apache HttpComponents in Java or the standard .NET HttpClient in C#) with the bulk API.
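
The bulk step could look roughly like the following Python sketch, which uses only the standard library. The index and type names are taken from the river config earlier in the thread; the `id` key column, the `build_bulk_body` and `send_bulk` helpers, and the `http://localhost:9200` URL are assumptions for illustration. Rows are treated as plain dicts, however you read them from Cassandra.

```python
import json
import urllib.request


def build_bulk_body(rows, index="prodinfo", doc_type="product", id_field="id"):
    """Turn an iterable of row dicts into an Elasticsearch bulk (NDJSON) body.

    Each row produces two lines: an action line, then the document source.
    `id_field` is a hypothetical key column name; adjust it to your schema.
    `_type` matches the Elasticsearch versions discussed in this thread;
    recent versions no longer accept it.
    """
    lines = []
    for row in rows:
        action = {"index": {"_index": index, "_type": doc_type, "_id": str(row[id_field])}}
        lines.append(json.dumps(action, sort_keys=True))
        # default=str covers values json cannot serialize directly (UUIDs, dates).
        lines.append(json.dumps(row, sort_keys=True, default=str))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n" if lines else ""


def send_bulk(body, es_url="http://localhost:9200"):
    """POST a bulk body to Elasticsearch (assumed URL) and return the parsed reply."""
    request = urllib.request.Request(
        es_url + "/_bulk",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Sending a few hundred to a few thousand rows per `send_bulk` call is a reasonable starting point; the bulk response should be checked for per-item errors, since a successful HTTP status does not mean every document was indexed.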

A problem you may have is that Cassandra is a distributed key-value store and has strong limitations on what SELECT statements allow in the WHERE clause and on table/partition scans.
If you know the Cassandra data model and can work within it, this is fine. If you don't, and want to do a basic 'SELECT * (or some columns)' without WHERE, I suggest you have a look at:
http://www.myhowto.org/bigdata/2013/11/04/scanning-the-entire-cassandra-column-family-with-cql/
to give you hints about paging the SELECT to deal with cluster partitions.
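
One common way to page a full scan is to restart each page after the token of the last key seen. The Python sketch below only illustrates that idea: the `id` partition key column and the `execute(cql, params)` callable are hypothetical stand-ins for your schema and driver, and it assumes one row per partition key (with clustering columns, LIMIT can cut a partition in the middle).

```python
def page_query(keyspace, table, key_column, page_size, after_key=False):
    """Build the CQL for one page of a full scan.

    The first page has no WHERE clause; later pages restart after the token
    of the last key seen. `?` is the bind marker of a prepared statement.
    """
    cql = "SELECT * FROM {}.{}".format(keyspace, table)
    if after_key:
        cql += " WHERE token({0}) > token(?)".format(key_column)
    return cql + " LIMIT {}".format(int(page_size))


def scan_all(execute, keyspace, table, key_column, page_size=1000):
    """Yield every row of the table, fetching one page at a time.

    `execute(cql, params)` is a hypothetical callable that runs the statement
    with your driver and returns a list of dict rows in token order.
    """
    cql = page_query(keyspace, table, key_column, page_size)
    params = ()
    while True:
        rows = execute(cql, params)
        for row in rows:
            yield row
        # A short (or empty) page means the scan reached the end of the ring.
        if len(rows) < page_size:
            return
        cql = page_query(keyspace, table, key_column, page_size, after_key=True)
        params = (rows[-1][key_column],)
```

For the table in the original post, the first statement would be `SELECT * FROM catalogks.info LIMIT 1000`, and each following one `SELECT * FROM catalogks.info WHERE token(id) > token(?) LIMIT 1000`, bound to the last `id` of the previous page.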

HTH

Regards,

Alain


(Chiranjith Rai) #7

Hi David,
What is the reason for removing the river feature from Elasticsearch?
Were there any issues with it?
Are there any alternative solutions for the same operation?


(David Pilato) #8

You probably want to read


(Chiranjith Rai) #9

Thank you, Mr. David.


(Alberto São Marcos) #10

I'm also looking for a solid option to index C* into ES.
Any news on the subject?


(Sumit Gupta) #11

@soujanya Did you try the jdbc importer?

