ES as a primary database

Hi,

I've recently been talking with various people about using ES as a
primary database. Here are the main blockers I've heard of:

  • problems with data corruption. Is this still an issue in version 1.4?

  • data not being available immediately (because of indexing, which is
    normal)

Do you have any further insights on this topic?

Cheers,
Yann

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/98c67f8f-a925-4723-8a71-6bdc764e4972%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

You should definitely consider using ES as a primary data source, but as
with any database, make sure to:

  • Replicate data across your cluster
  • Take daily snapshots and store them on another machine or in another data center.
  • Monitor your cluster
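As a minimal sketch of the monitoring point, assuming an unsecured node reachable at http://localhost:9200, the code below fetches and interprets the JSON returned by the _cluster/health endpoint. The function names and warning rules are illustrative assumptions, not an official tool. A yellow status is flagged because it means some replica shards are unassigned, so the data is not fully replicated.

```python
import json
import urllib.request
from typing import List


def fetch_health(base_url: str = "http://localhost:9200") -> dict:
    """Fetch and parse GET /_cluster/health (assumes an unsecured node)."""
    with urllib.request.urlopen(base_url + "/_cluster/health") as resp:
        return json.load(resp)


def health_problems(health: dict, expected_nodes: int) -> List[str]:
    """Turn a parsed _cluster/health response into warnings.
    The rules here are illustrative assumptions, not ES defaults."""
    problems = []
    status = health.get("status")
    if status == "red":
        problems.append("red: at least one primary shard is unassigned")
    elif status == "yellow":
        problems.append("yellow: some replica shards are unassigned")
    nodes = health.get("number_of_nodes", 0)
    if nodes < expected_nodes:
        problems.append(f"only {nodes} of {expected_nodes} nodes are in the cluster")
    unassigned = health.get("unassigned_shards", 0)
    if unassigned > 0:
        problems.append(f"{unassigned} unassigned shards")
    return problems


if __name__ == "__main__":
    # Hand-written example response, not output from a real cluster.
    sample = {"cluster_name": "demo", "status": "yellow",
              "number_of_nodes": 2, "unassigned_shards": 5}
    for problem in health_problems(sample, expected_nodes=3):
        print(problem)
```

In practice you would call fetch_health() from a scheduled job and alert on any non-empty result.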

As for data not being available immediately, that's not quite true. You
can control the write consistency so that you're sure the data has
actually been persisted on the number of nodes you want and is available
through the Get API. Indexing for search happens asynchronously: a new
document becomes searchable only after the next refresh (every second by
default), but the Get API is real-time and returns it right away.
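To illustrate the difference between real-time get and near-real-time search, here is a toy model in Python. It is not the Elasticsearch API: the class ToyIndex and its methods are hypothetical, and refresh() is called by hand instead of running on Elasticsearch's default one-second schedule.

```python
class ToyIndex:
    """Toy model (not the Elasticsearch API): writes are readable by id
    at once, but only become searchable after a refresh."""

    def __init__(self):
        self._latest = {}      # doc id -> source, updated on every write
        self._searchable = {}  # snapshot taken at the last refresh

    def index(self, doc_id, source):
        # The write is stored immediately; no refresh happens here.
        self._latest[doc_id] = dict(source)

    def get(self, doc_id):
        # Real-time get: returns the latest version, refreshed or not.
        return self._latest.get(doc_id)

    def refresh(self):
        # Make everything written so far visible to search.
        self._searchable = {k: dict(v) for k, v in self._latest.items()}

    def search(self, field, value):
        # Search only sees documents as of the last refresh.
        return sorted(k for k, v in self._searchable.items()
                      if v.get(field) == value)


if __name__ == "__main__":
    idx = ToyIndex()
    idx.index("1", {"user": "yann"})
    print(idx.get("1"))                # {'user': 'yann'}
    print(idx.search("user", "yann"))  # []
    idx.refresh()
    print(idx.search("user", "yann"))  # ['1']
```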

Elasticsearch has previously had issues with index corruption, caused
especially by OOM errors and split-brain situations. To avoid split brain,
set minimum_master_nodes (discovery.zen.minimum_master_nodes) to a
reasonable value, typically a majority of the master-eligible nodes. To
avoid OOM errors, use the latest ES version, which includes circuit
breakers.
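The commonly recommended value for minimum_master_nodes is a strict majority of the master-eligible nodes, (N / 2) + 1. The hypothetical helpers below sketch why this prevents split brain: however a network partition divides the cluster, at most one side can reach that quorum, so at most one side can elect a master.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Strict majority of master-eligible nodes: (N // 2) + 1."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


def can_elect_master(partition_size: int, master_eligible: int) -> bool:
    """True if a partition holding `partition_size` master-eligible
    nodes meets the quorum and may elect a master."""
    return partition_size >= minimum_master_nodes(master_eligible)


if __name__ == "__main__":
    n = 3
    print(minimum_master_nodes(n))  # 2
    # Split a 3-node cluster into partitions of 2 and 1 nodes:
    print(can_elect_master(2, n), can_elect_master(1, n))  # True False
```

Note that with two master-eligible nodes the quorum is 2, so losing either node blocks master election; that is why three master-eligible nodes is the usual minimum.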

Also read blog posts by people who advise against using ES as a primary
data source, such as this one, to make a better-informed decision:
http://igor.kupczynski.info/2014/06/26/elastic-cap.html

Lasse

(In reply to Yann Barraud's message of Monday, November 17, 2014 1:32:14 PM UTC+1.)

Hello Yann,
I would advise against using Elasticsearch or SolrCloud as the primary
data repository, and for more than just the reasons you give. Instead,
there are simple, replicated data stores (e.g., Apache Cassandra) that
meet replication requirements well; documents stored there can be fed
into the search platform and indexed by id. Think of the search platform
as a fast read cache and retrieval engine for your actual database. If
you combine this approach with storing in the search core not the entire
data but only the selected metadata you need for retrieval, it also
becomes quite storage-efficient.

Search is made for retrieval. Databases are made for persistence of data.
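The architecture described above can be sketched with two in-memory stand-ins: PrimaryStore plays the replicated database (Cassandra in the example) and SearchIndex plays the search platform, holding only selected metadata fields. All class and function names are hypothetical; a real setup would use the respective client libraries.

```python
from typing import Dict, List


class PrimaryStore:
    """Stand-in for the system of record: full documents keyed by id."""

    def __init__(self) -> None:
        self._rows: Dict[str, dict] = {}

    def put(self, doc_id: str, doc: dict) -> None:
        self._rows[doc_id] = dict(doc)

    def get(self, doc_id: str) -> dict:
        return dict(self._rows[doc_id])


class SearchIndex:
    """Stand-in for the search engine: keeps only selected metadata
    fields and returns matching document ids."""

    def __init__(self, fields: List[str]) -> None:
        self._fields = fields
        self._meta: Dict[str, dict] = {}

    def index(self, doc_id: str, doc: dict) -> None:
        # Copy only the fields needed for retrieval, not the full document.
        self._meta[doc_id] = {f: doc[f] for f in self._fields if f in doc}

    def search(self, field: str, value) -> List[str]:
        return sorted(i for i, m in self._meta.items() if m.get(field) == value)


def save(store: PrimaryStore, index: SearchIndex, doc_id: str, doc: dict) -> None:
    store.put(doc_id, doc)    # persist in the system of record first
    index.index(doc_id, doc)  # then feed the search platform by id


def find(store: PrimaryStore, index: SearchIndex, field: str, value) -> List[dict]:
    # The search returns ids; full documents come from the primary store.
    return [store.get(i) for i in index.search(field, value)]


if __name__ == "__main__":
    store, index = PrimaryStore(), SearchIndex(["title", "author"])
    save(store, index, "42", {"title": "ES as a primary database",
                              "author": "Yann", "body": "long text"})
    print(find(store, index, "author", "Yann"))
```

Because the index holds only derived data keyed by id, it can be rebuilt from the primary store if it is ever lost or corrupted, which is the main appeal of this design.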

Best regards,
--Jürgen

--

Mit freundlichen Grüßen/Kind regards/Cordialement vôtre/Atentamente/С уважением
i.A. Jürgen Wagner
Head of Competence Center "Intelligence"
& Senior Cloud Consultant

Devoteam GmbH, Industriestr. 3, 70565 Stuttgart, Germany
Phone: +49 6151 868-8725, Fax: +49 711 13353-53, Mobile: +49 171 864 1543
E-Mail: juergen.wagner@devoteam.com, URL: www.devoteam.de


Managing Board: Jürgen Hatzipantelis (CEO)
Address of Record: 64331 Weiterstadt, Germany; Commercial Register:
Amtsgericht Darmstadt HRB 6450; Tax Number: DE 172 993 071
