Cassandra + Elasticsearch or Just Elasticsearch for Primary data store


(pranav amin) #1

Hi,

I'm struggling to choose between two options: using Elasticsearch as the
primary data store, or using Cassandra as the primary data store and
copying the data into ES for indexing.

The goal is to store documents of about 144 KB each, possibly growing to
512 KB. The load will be on the order of 100 million documents a day. Every
field in a document must be indexed so that it can be searched as soon as
it reaches the data store. Ad hoc queries are a must on this data set. The
system must scale when the load grows to a billion, and data durability
and availability are a must.
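
For a rough sense of scale, the figures above can be turned into a raw daily volume. This is only a back-of-the-envelope sketch: it assumes decimal units (1 KB = 1000 bytes), and the function name is made up for illustration. It ignores replicas and index overhead, which would add to the total.

```python
KB = 1000        # assumption: decimal kilobytes; use 1024 for KiB
TB = 1000 ** 4   # decimal terabytes


def daily_volume_tb(docs_per_day: int, doc_size_kb: int) -> float:
    """Raw bytes ingested per day, expressed in TB (no replicas, no index overhead)."""
    return docs_per_day * doc_size_kb * KB / TB


if __name__ == "__main__":
    for size_kb in (144, 512):
        volume = daily_volume_tb(100_000_000, size_kb)
        print(f"{size_kb} KB documents: {volume:.1f} TB/day")
```

Under those assumptions, 100 million documents a day comes to about 14.4 TB/day at 144 KB and about 51.2 TB/day at 512 KB.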

I'm just confused about whether Cassandra can really make a difference
here, since it looks to me like ES can suffice.

Whether you agree or disagree, views and concerns are welcome.

Thanks
Pranav.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/1b88f54a-3efa-45e2-9fc2-66fcbd75cd6d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Tim Uckun) #2

I'm just confused about whether Cassandra can really make a difference
here, since it looks to me like ES can suffice.

If you are not going to be using Cassandra for indexing then there is no
reason to have it. If you want durability in case something goes wrong with
ES you can just store your data in a log file before pumping it into ES.
If for whatever reason something happens to your ES cluster you can
reconstruct it using the log files.
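
The log-then-index idea could be sketched as follows. This is a minimal illustration, not a production design: the function names and the one-JSON-document-per-line log format are assumptions, and `index_fn` stands in for whatever call actually sends a document to ES.

```python
import json
import os
from typing import Callable


def append_to_log(path: str, doc: dict) -> None:
    """Append one document as a JSON line and force it to disk."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(doc) + "\n")
        f.flush()
        os.fsync(f.fileno())


def ingest(path: str, doc: dict, index_fn: Callable[[dict], None]) -> None:
    """Write to the log first, then hand the document to the indexer."""
    append_to_log(path, doc)
    index_fn(doc)  # hypothetical hook: e.g. a call into an ES client


def replay(path: str, index_fn: Callable[[dict], None]) -> int:
    """Rebuild the index by re-sending every logged document; returns the count."""
    count = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                index_fn(json.loads(line))
                count += 1
    return count
```

Because the log is written before the indexing call, a document that reached the log can be replayed even if the index is lost. The log files themselves still need to live on storage you trust.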



(pranav amin) #3

Thanks Tim.

Does that mean I can't get durability if I store my data in ES as the
primary data store?

Thanks
Pranav.

In reply to Tim Uckun's message of Monday, July 14, 2014 11:57:23 PM UTC-4.



(mooky) #4

What do you mean by "durability"?

It's highly likely that Elasticsearch has the same storage guarantees as
Cassandra.
That said, some people like the flexibility of keeping the golden source
elsewhere, so they can blow away the index and re-index at a whim.
There are a number of Elasticsearch users, however, for whom this is not
viable, because reindexing their volume of data would take a week or two.

How much data are you looking at storing/indexing? MB? GB? TB? PB?

-M

In reply to pranav amin's message of Tuesday, 15 July 2014 15:27:15 UTC+1.



(pranav amin) #5

Thanks.

We have 300 TB of data, and the average stored document is 512 KB. I just
want to make sure I'm not missing anything by using ES as the primary data
store. From your response, it looks to me like durability isn't a concern
with ES.
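
To relate these numbers to the reindexing concern raised earlier, here is a rough estimate of the document count and of how long a full reindex might take. It assumes decimal units, and the indexing rate in the example is purely hypothetical. Substitute a rate measured on your own cluster.

```python
KB = 1000        # assumption: decimal units
TB = 1000 ** 4
SECONDS_PER_DAY = 86_400


def document_count(total_tb: int, avg_doc_kb: int) -> int:
    """Approximate number of documents in a data set of the given size."""
    return (total_tb * TB) // (avg_doc_kb * KB)


def reindex_days(total_docs: int, docs_per_second: float) -> float:
    """Days needed to reindex everything at a sustained indexing rate."""
    return total_docs / docs_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    docs = document_count(300, 512)
    print(f"documents: {docs:,}")
    # 1000 docs/s is a hypothetical rate, not a measured figure.
    print(f"full reindex at 1000 docs/s: {reindex_days(docs, 1000):.1f} days")
```

With 300 TB at 512 KB per document, that is roughly 586 million documents.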

Thanks
Pranav.

In reply to mooky's message of Wednesday, July 16, 2014 6:53:52 AM UTC-4.



(Otis Gospodnetić) #6

It doesn't sound like Cassandra adds any value. You could ask the same
question with HBase, HDFS, MySQL, or any other type of storage in place of
Cassandra. But if your main goal is to search the data, ES will do just
fine. You can always take snapshots to make backups, or feed ES through
Kafka and rely on its TTL and the ability to reindex recent data from
Kafka if you need to, etc.
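
As an illustration of the snapshot option, the sketch below builds the two HTTP requests involved in the Elasticsearch snapshot API: registering a shared-filesystem repository and creating a snapshot in it. The helper names, host, repository name, and path are placeholders. The code only constructs the requests. Sending them requires a running cluster with the repository location reachable from every node.

```python
import json
import urllib.request


def register_repo_request(base_url: str, repo: str, location: str) -> urllib.request.Request:
    """PUT /_snapshot/<repo>: register a shared-filesystem snapshot repository."""
    body = {"type": "fs", "settings": {"location": location}}
    return urllib.request.Request(
        f"{base_url}/_snapshot/{repo}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def create_snapshot_request(base_url: str, repo: str, snapshot: str) -> urllib.request.Request:
    """PUT /_snapshot/<repo>/<snapshot>: snapshot the cluster and wait for completion."""
    return urllib.request.Request(
        f"{base_url}/_snapshot/{repo}/{snapshot}?wait_for_completion=true",
        method="PUT",
    )


# Example (placeholder values; uncomment to send against a real cluster):
# req = register_repo_request("http://localhost:9200", "backups", "/mnt/es_backups")
# urllib.request.urlopen(req)
```

Snapshots are backups taken at a point in time, so anything indexed after the last snapshot would still have to come from another source, such as the Kafka replay mentioned above.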

Otis

Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

In reply to pranav amin's message of Wednesday, July 16, 2014 10:25:54 AM UTC-4.


