Using ES as a primary datastore

Hello,

We are planning to use ES as a primary datastore.

Here is my use case:

We receive a million transactions per day (all are inserts).
Each transaction is around 500 KB in size and has 10 fields; we should
be able to search on all 10 fields.
We want to keep around 1 year's worth of data, which comes to around 180 TB.
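
As a rough check on that figure, here is a minimal sketch of the arithmetic. It assumes decimal units (10^9 KB per TB) and counts raw payload only, ignoring index overhead. The `replicas` parameter is a hypothetical knob added for illustration, not something stated in the requirements.

```python
# Back-of-the-envelope sizing from the numbers in this post.
TRANSACTIONS_PER_DAY = 1_000_000
TRANSACTION_SIZE_KB = 500
RETENTION_DAYS = 365


def raw_volume_tb(per_day, size_kb, days, replicas=0):
    """Raw payload volume in decimal terabytes.

    `replicas` is an assumed number of extra copies of the data;
    0 means a single copy with no redundancy.
    """
    total_kb = per_day * size_kb * days * (1 + replicas)
    return total_kb / 1_000_000_000  # 10^9 KB per TB


# 10^6 KB per GB
daily_gb = TRANSACTIONS_PER_DAY * TRANSACTION_SIZE_KB / 1_000_000
single_copy_tb = raw_volume_tb(
    TRANSACTIONS_PER_DAY, TRANSACTION_SIZE_KB, RETENTION_DAYS
)
one_replica_tb = raw_volume_tb(
    TRANSACTIONS_PER_DAY, TRANSACTION_SIZE_KB, RETENTION_DAYS, replicas=1
)

print(f"per day:        {daily_gb:.0f} GB")
print(f"per year:       {single_copy_tb:.1f} TB")
print(f"with 1 replica: {one_replica_tb:.1f} TB")
```

That works out to about 500 GB of raw data per day and roughly 180 TB per year for a single copy; any replication multiplies the total.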

Can you please let me know any problems that might arise if I use
Elasticsearch as the primary datastore?

Regards,
Suman

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/f52b61e2-0955-4e79-8bb8-61c9428c67d1%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

That's a lot of data. Do you have a big budget, automation, and monitoring in place?

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com

Hi,

First you have to calculate the volume you will keep in one shard, then
break your total volume into the number of shards you will maintain, and
then scale accordingly across a number of nodes. At the very least, as your
volumes grow you should grow your cluster as well.
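
To make that calculation concrete, here is a minimal sketch. Every number in it is an assumption for illustration only: the 50 GB target shard size, the single replica and the 100 shards per node are placeholders you would replace with figures from your own testing.

```python
import math


def plan_cluster(total_gb, shard_gb, replicas, shards_per_node):
    """Estimate shard and node counts from a total data volume.

    All inputs are planning assumptions, not Elasticsearch limits.
    Returns (primary_shards, total_shards, nodes).
    """
    primaries = math.ceil(total_gb / shard_gb)
    total_shards = primaries * (1 + replicas)  # primaries plus their copies
    nodes = math.ceil(total_shards / shards_per_node)
    return primaries, total_shards, nodes


# Roughly one year of data from the original question (about 180 TB).
primaries, total_shards, nodes = plan_cluster(
    total_gb=180_000,     # 180 TB in decimal GB
    shard_gb=50,          # assumed target size of one shard
    replicas=1,           # assumed one replica per primary
    shards_per_node=100,  # assumed number of shards one node can hold
)
print(primaries, total_shards, nodes)
```

Changing any one assumption changes the node count, which is why the shard volume has to be settled first.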

It is difficult to predict what problems may arise because your case is too
generic. What will the usage of the cluster be? What queries will you
perform? Will you mostly index and only occasionally query, or will you
query your data intensively?

Most importantly, you need to think about how you will partition your data.
Will you have one index, or multiple indices like the Logstash approach?
Maybe check here: Sizing Elasticsearch | Elastic Blog
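
As an illustration of the Logstash approach, where each day gets its own index named with a date suffix, here is a minimal sketch. The `transactions` prefix is a made-up name; Logstash itself uses `logstash-YYYY.MM.dd` by default.

```python
from datetime import date, timedelta


def index_name_for(day, prefix="transactions"):
    """Name of the daily index a document from `day` would go into."""
    return f"{prefix}-{day:%Y.%m.%d}"


def indices_for_range(start, end, prefix="transactions"):
    """Daily index names covering start..end inclusive, oldest first."""
    days = (end - start).days
    return [index_name_for(start + timedelta(days=n), prefix)
            for n in range(days + 1)]


print(index_name_for(date(2014, 9, 17)))
print(indices_for_range(date(2014, 9, 15), date(2014, 9, 17)))
```

With this layout a query over a known date range only has to touch the indices for those days.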

What will you do with data older than a year, delete it? Can you afford to
lose data? Will you keep backups?
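
If you do go with daily indices, retention becomes a matter of dropping whole indices rather than deleting individual documents. A minimal sketch of picking the expired ones follows; the `transactions-` prefix and the 365-day window are assumptions, and actually deleting the indices is left out.

```python
from datetime import date, datetime, timedelta


def expired_indices(index_names, today, retention_days=365,
                    prefix="transactions-"):
    """Return daily indices whose date suffix is older than the window.

    Names that do not match `prefix` + YYYY.MM.dd are ignored, so
    unrelated indices are never selected.
    """
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in index_names:
        if not name.startswith(prefix):
            continue
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue  # suffix is not a date, leave the index alone
        if day < cutoff:
            expired.append(name)
    return sorted(expired)


names = [
    "transactions-2013.09.16",
    "transactions-2013.09.17",
    "transactions-2014.09.17",
    "other-2012.01.01",
]
print(expired_indices(names, today=date(2014, 9, 17)))
```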

IMHO, these are some of the questions you must answer in order to see
whether such an approach suits your needs. It comes down to hardware,
structure and partitioning of your data.

Thomas

ES is a fantastic search engine, but there is some risk of data loss
(http://aphyr.com/posts/317-call-me-maybe-elasticsearch) and a few other
potential disadvantages
(https://www.quora.com/Why-should-I-NOT-use-ElasticSearch-as-my-primary-datastore)
which might or might not be relevant to you. You can always combine ES, via
the JDBC river (https://github.com/jprante/elasticsearch-river-jdbc), with a
stable, secure database, e.g. MySQL
(https://www.quora.com/How-do-i-use-Elastic-search-with-mysql-database-I-am-currently-experimenting-with-jdbc-river-but-will-it-be-fast-enough-in-production)
or HBase (http://lessc0de.github.io/connecting_hbase_to_elasticsearch.html).
Since you have lots of data, HBase might be a better option.
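
The idea of pairing a primary database with a search index can be sketched roughly as below. Both classes are in-memory stand-ins invented for illustration; they are not the API of ES, the JDBC river, MySQL or HBase. The point is only the ordering: write to the durable store first, and treat the index as something that can be rebuilt from it.

```python
from collections import defaultdict


class PrimaryStore:
    """Stand-in for the durable database (the source of truth)."""

    def __init__(self):
        self._rows = {}

    def put(self, doc_id, doc):
        self._rows[doc_id] = dict(doc)

    def get(self, doc_id):
        return self._rows[doc_id]

    def items(self):
        return self._rows.items()


class SearchIndex:
    """Stand-in for the search engine: exact-match lookup per field."""

    def __init__(self, fields):
        self.fields = list(fields)
        self._postings = defaultdict(set)

    def index(self, doc_id, doc):
        for field in self.fields:
            if field in doc:
                self._postings[(field, doc[field])].add(doc_id)

    def search(self, field, value):
        return sorted(self._postings.get((field, value), set()))


def ingest(store, index, doc_id, doc):
    store.put(doc_id, doc)    # durable write first
    index.index(doc_id, doc)  # indexing second; it can be redone later


def rebuild_index(store, fields):
    """Recreate the search index from the primary store alone."""
    fresh = SearchIndex(fields)
    for doc_id, doc in store.items():
        fresh.index(doc_id, doc)
    return fresh


store = PrimaryStore()
index = SearchIndex(["account"])
ingest(store, index, "t1", {"account": "A", "payload": "..."})
ingest(store, index, "t2", {"account": "B", "payload": "..."})
print(index.search("account", "A"))
print(rebuild_index(store, ["account"]).search("account", "A"))
```

If the index is lost or corrupted, the data is still safe in the primary store and the index can be regenerated.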

I'd also suggest checking out DataStax Enterprise -- a commercial flavor
of Cassandra. It's Cassandra, so update rates and volume are its strong
suit. It's intended as a primary data store. It has a Solr (another search
engine) instance on each node that indexes the local data on that node,
enabling full-text search. Solr is not nearly as user friendly as
Elasticsearch, but otherwise there are a lot of comparable features,
depending on your search needs.

Doug

Like others have mentioned, there is a risk of data loss at the moment but
Elasticsearch is working on making it better and better.

I watched this video about Couchbase and Elasticsearch:
https://www.youtube.com/watch?v=rpwtxpmuDb0

After watching it, I highly recommend researching Couchbase as an option.

In the spirit of this thread, let me plug my favorite database here:
"Connecting Hbase to Elasticsearch in 10 min or less"
http://lessc0de.github.io/connecting_hbase_to_elasticsearch.html
