Using ES as our primary and only datastore


(Spring Ninja) #1

I have read through some threads where people mention that their only
datastore is ES. From what I have read it can be an option.

However, can you point out some drawbacks in doing so? My main no-go
is the lack of transaction support. I have another project on Google
App Engine whose app does not use transactions. It can be done, but it
requires more thinking.

I am currently syncing ES with MySQL, including transaction support.
However, I would love to let go of MySQL and not have to worry about
scaling MySQL.

Thanks!

Remy


(David Pilato) #2

Hi Remy,

This isn't an answer, but a question about your case:
how do you sync MySQL and ES?

thanks
David :wink:


(ppearcy) #3

Hey,
This discussion should answer most of your questions regarding ES as
the primary data store:
http://groups.google.com/a/elasticsearch.com/group/users/browse_thread/thread/f8ff072a7039292d/f90501e0861b4e99#f90501e0861b4e99

Best Regards,
Paul


(Remy Gendron) #4

Well, we just hook into the lifecycle events from our service layer
and replicate the CRUD operations to ES. It's a lot like Hibernate
Search, for those who know it.
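
For readers who have not seen this pattern, here is a minimal sketch in Python (not the actual code behind this post). The service layer fires an event after each write, and a listener mirrors the change into the index. `InMemoryIndex`, `UserService`, and `replicate_to` are hypothetical names invented for the illustration, a plain dict stands in for both MySQL and ES, and transaction handling is left out entirely.

```python
from typing import Callable, Dict, List


class InMemoryIndex:
    """Hypothetical stand-in for a search index; a real setup would call ES here."""

    def __init__(self) -> None:
        self.docs: Dict[str, dict] = {}

    def index(self, doc_id: str, doc: dict) -> None:
        self.docs[doc_id] = dict(doc)

    def delete(self, doc_id: str) -> None:
        self.docs.pop(doc_id, None)


Listener = Callable[[str, str, dict], None]


class UserService:
    """Service layer owning all writes to the primary store (a dict standing in for MySQL)."""

    def __init__(self) -> None:
        self.primary: Dict[str, dict] = {}
        self._listeners: List[Listener] = []

    def on_change(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def _notify(self, event: str, entity_id: str, entity: dict) -> None:
        for listener in self._listeners:
            listener(event, entity_id, entity)

    def save(self, entity_id: str, entity: dict) -> None:
        event = "updated" if entity_id in self.primary else "created"
        self.primary[entity_id] = dict(entity)
        self._notify(event, entity_id, entity)

    def remove(self, entity_id: str) -> None:
        entity = self.primary.pop(entity_id)
        self._notify("deleted", entity_id, entity)


def replicate_to(index: InMemoryIndex) -> Listener:
    """Build a listener that mirrors every lifecycle event into the index."""

    def listener(event: str, entity_id: str, entity: dict) -> None:
        if event == "deleted":
            index.delete(entity_id)
        else:  # "created" and "updated" both (re)index the document
            index.index(entity_id, entity)

    return listener


index = InMemoryIndex()
service = UserService()
service.on_change(replicate_to(index))

service.save("1", {"name": "Remy"})
service.save("1", {"name": "Remy G."})
service.save("2", {"name": "David"})
service.remove("2")
```

After these calls the index holds the same documents as the primary store. Because the hooks live in the service layer, every write path has to go through the service for the two stores to stay in sync.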


(David Pilato) #5

Thanks.
That's the way I did it also.

At first, I wanted to avoid modifying the service layer, so I tried using Hibernate listeners. That didn't work well: when you update only a child entity, even if you ask Hibernate to merge from the parent entity, the listener is called only for the child entity. I didn't find an easy way around that.

Thanks
David :wink:


(James Cook) #6

Some of the drawbacks of using ES as a datastore are:

  1. Lack of transactional support (as you mention). We use Hazelcast as a
    caching layer (in the role memcached would play) between our application
    and ES. Hazelcast supports distributed transactions. While it does not
    give us commit and rollback upon a write failure, it does let us commit
    multiple writes as a single unit of work.
  2. No snapshotting of data. I miss this greatly for peace of mind. You
    have to code your own solutions to recover from corruption of the
    Lucene indexes, or from an ES change that requires reindexing all the
    data. A shared gateway can help, but you will have to pause
    indexing/flushing while making a backup of the repository.
  3. Near-real-time behavior is hard for most who come from a DB background.
    You can't insert a record and then query for it without issuing a
    refresh call or waiting long enough for the record to be indexed.
  4. No SQL support. It goes without saying, but the impact is that there
    are no tools for manipulating data once it is in the repository.
    Perhaps these will come in time.
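
To make drawback 3 concrete, here is a toy model in Python of near-real-time visibility. It does not talk to ES: `NearRealTimeIndex` is a hypothetical class in which writes sit in a buffer and only become searchable after `refresh()` is called, which is roughly the behavior described above. ES also refreshes on its own at a configurable interval, which the sketch does not model.

```python
from typing import Dict, List


class NearRealTimeIndex:
    """Toy model: indexed documents are searchable only after refresh()."""

    def __init__(self) -> None:
        self._buffer: Dict[str, dict] = {}      # written, not yet searchable
        self._searchable: Dict[str, dict] = {}  # visible to search()

    def index(self, doc_id: str, doc: dict) -> None:
        self._buffer[doc_id] = dict(doc)

    def refresh(self) -> None:
        # Publish everything written since the last refresh.
        self._searchable.update(self._buffer)
        self._buffer.clear()

    def search(self, field: str, value: object) -> List[str]:
        return sorted(
            doc_id
            for doc_id, doc in self._searchable.items()
            if doc.get(field) == value
        )


idx = NearRealTimeIndex()
idx.index("1", {"user": "remy"})

before = idx.search("user", "remy")  # [] : the write is not visible yet
idx.refresh()
after = idx.search("user", "remy")   # ["1"] : visible after the refresh
```

Code written for a relational database often assumes the first search would already return the record, which is why this trips people up.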

Those are the big ones that I have noticed.
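
As an illustration of the "single unit of work" idea in drawback 1, here is a minimal Python sketch. It does not use Hazelcast or its API: `UnitOfWork` is a hypothetical class that buffers writes and applies them together on `commit()`, and a dict stands in for the datastore. As in the setup described above, it groups writes but does not undo anything if a write fails partway through a commit.

```python
from typing import Dict, List, Optional, Tuple


class UnitOfWork:
    """Buffers writes and applies them to the store together on commit()."""

    def __init__(self, store: Dict[str, dict]) -> None:
        self._store = store
        # A value of None marks a pending delete.
        self._pending: List[Tuple[str, Optional[dict]]] = []

    def put(self, key: str, value: dict) -> None:
        self._pending.append((key, dict(value)))

    def delete(self, key: str) -> None:
        self._pending.append((key, None))

    def rollback(self) -> None:
        # Discard buffered writes; nothing has touched the store yet.
        self._pending.clear()

    def commit(self) -> None:
        for key, value in self._pending:
            if value is None:
                self._store.pop(key, None)
            else:
                self._store[key] = value
        self._pending.clear()


store: Dict[str, dict] = {}
uow = UnitOfWork(store)

uow.put("order:1", {"total": 10})
uow.put("order_line:1", {"order": "order:1", "sku": "abc"})
visible_before_commit = dict(store)  # still empty: nothing applied yet
uow.commit()                         # both writes land together

uow.put("order:2", {"total": 99})
uow.rollback()                       # order:2 is discarded
uow.commit()                         # nothing left to apply
```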

Jim Cook


(James Cook) #7

I suppose #3 in my list is mitigated by this new feature:
https://github.com/elasticsearch/elasticsearch/issues/1060

-- jim


(Remy Gendron) #8

Great summary James, thanks!

Shay, are things such as index migration when a new release comes out,
transactions, and full restore being considered for the long-term
roadmap? I guess that if Lucene isn't meant to be a datastore, it
would be hard for ES to provide them...

On Jun 27, 2:01 pm, James Cook jc...@tracermedia.com wrote:

I suppose #3 in my list is mitigated by this new feature: https://github.com/elasticsearch/elasticsearch/issues/1060

