Using ES as our primary and only datastore

Spring_Ninja · June 23, 2011, 3:55pm

I have read through some threads where people mention that ES is their
only datastore. From what I have read, it can be an option.

However, can you point out some drawbacks in doing so? My main no-go
is the lack of transaction support. I have another project on Google
App Engine with an app that does not use transactions. It can be done,
but it requires more thinking.

I am currently syncing ES with MySQL, including transaction support.
However, I would love to let go of MySQL and not have to worry about
scaling MySQL.

Thanks!

Remy

dadoonet · June 23, 2011, 4:41pm

Hi Remy,

I'm not answering, but I have a question about your case:
How do you sync MySQL and ES?

thanks
David

ppearcy · June 23, 2011, 7:23pm

Hey,
This discussion should answer most of your questions regarding ES as
the primary data store:
http://groups.google.com/a/elasticsearch.com/group/users/browse_thread/thread/f8ff072a7039292d/f90501e0861b4e99#f90501e0861b4e99

Best Regards,
Paul

Remy_Gendron · June 24, 2011, 3:54am

Well, we just hook into the lifecycle events from our service layer
and replicate the CRUD operations to ES. It's a lot like Hibernate
Search, for those who know it.
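
For readers who want a picture of that approach, here is a minimal sketch. It is not the poster's code: `EntityService`, `SearchIndexReplicator`, and the event names are hypothetical, and plain dicts stand in for both MySQL and the ES index. The only idea taken from the post is that the service layer emits lifecycle events after each write and a listener mirrors them into the search index.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical listener signature: (operation, entity_id, document or None).
Listener = Callable[[str, str, Optional[dict]], None]


class EntityService:
    """Toy service layer that stores entities and emits lifecycle events."""

    def __init__(self) -> None:
        self._rows: Dict[str, dict] = {}  # stands in for the MySQL tables
        self._listeners: List[Listener] = []

    def add_listener(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def save(self, entity_id: str, data: dict) -> None:
        op = "update" if entity_id in self._rows else "create"
        self._rows[entity_id] = dict(data)
        self._emit(op, entity_id, dict(data))

    def delete(self, entity_id: str) -> None:
        # Emit a delete event only if the entity actually existed.
        if self._rows.pop(entity_id, None) is not None:
            self._emit("delete", entity_id, None)

    def _emit(self, op: str, entity_id: str, doc: Optional[dict]) -> None:
        # Listeners run only after the primary write has been applied.
        for listener in self._listeners:
            listener(op, entity_id, doc)


class SearchIndexReplicator:
    """Mirrors lifecycle events into a dict standing in for an ES index."""

    def __init__(self) -> None:
        self.index: Dict[str, dict] = {}

    def __call__(self, op: str, entity_id: str, doc: Optional[dict]) -> None:
        if op == "delete":
            self.index.pop(entity_id, None)
        else:
            # Create and update are both a full re-index of the document.
            self.index[entity_id] = doc


service = EntityService()
replicator = SearchIndexReplicator()
service.add_listener(replicator)

service.save("42", {"title": "first draft"})
service.save("42", {"title": "final"})
service.save("43", {"title": "other"})
service.delete("43")
```

In a real setup the listener would call the ES index and delete APIs instead of touching a dict, and it would need a policy for what to do when the ES write fails after the database write has succeeded.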

dadoonet · June 24, 2011, 4:45am

Thanks.
That's the way I did it also.

At first, I wanted to avoid modifying the service layer, so I tried Hibernate listeners. That did not work well: when you update only a child entity, the listener is called only for the child entity, even if you ask Hibernate to merge from the parent entity. I didn't find an easy way around that.

Thanks
David

James_Cook · June 27, 2011, 5:19pm

Some of the drawbacks of using ES as a datastore are:

1. Lack of transactional support (as you mention). We use Hazelcast as a
   memcached layer between our application and ES. Hazelcast supports
   distributed transactions. While this does not give us commit and
   rollback upon a write failure, it does give us the ability to commit
   multiple writes as a single unit of work.
2. No snapshotting of data. I miss this greatly for peace of mind. You
   have to code your own solutions to be able to recover from corruption
   of the Lucene indexes, as well as for when ES introduces a change that
   requires reindexing all the data. A shared gateway can help, but you
   will have to pause indexing/flushing while making a backup of the
   repository.
3. Near real-time behavior is hard for most people who come from a DB
   background. You can't insert a record and then query for that record
   without issuing a refresh call or waiting long enough to ensure the
   record has been indexed.
4. No SQL support. It goes without saying, but this has an impact in that
   there are no tools which allow you to manipulate data once it is in
   the repository. Perhaps these will come in time.
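
The near real-time point is the one that most often surprises people, so here is a toy model of it. This is not the ES client API: `ToyNearRealTimeIndex` and its methods are hypothetical names, and the two dicts only imitate the split between writes that have been accepted and writes that a search can already see.

```python
from typing import Dict, List


class ToyNearRealTimeIndex:
    """Toy model: indexed documents become searchable only after a refresh."""

    def __init__(self) -> None:
        self._buffer: Dict[str, dict] = {}      # accepted, not yet searchable
        self._searchable: Dict[str, dict] = {}  # visible to searches

    def index(self, doc_id: str, doc: dict) -> None:
        self._buffer[doc_id] = doc

    def refresh(self) -> None:
        # Make every buffered write visible to subsequent searches.
        self._searchable.update(self._buffer)
        self._buffer.clear()

    def search(self, field: str, value: str) -> List[str]:
        return sorted(
            doc_id
            for doc_id, doc in self._searchable.items()
            if doc.get(field) == value
        )


idx = ToyNearRealTimeIndex()
idx.index("1", {"user": "remy"})
before_refresh = idx.search("user", "remy")  # the write is not visible yet
idx.refresh()
after_refresh = idx.search("user", "remy")   # now the document is found
```

Code written against a relational database usually assumes the second result immediately after the insert, which is why an insert-then-query sequence needs either an explicit refresh or a delay.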

Those are the big ones that I have noticed.

Jim Cook
jcook@tracermedia.com

tracermedia interactive http://www.tracermedia.com/
780 King Ave. #106
Columbus, OH 43212

phone: (614) 298-0774
fax: (614) 298-0776
cell: (234) 738-2492

James_Cook · June 27, 2011, 6:01pm

I suppose #3 in my list (near real-time behavior) is mitigated by this new feature:

Realtime GET · Issue #1060 · elastic/elasticsearch · GitHub
Opened by kimchy, 06:38AM - 24 Jun 11 UTC; closed 06:39AM - 24 Jun 11 UTC. Labels: >feature, v0.17.0

Realtime GET support allows you to get a document once it is indexed, regardless of the "refresh rate" of the index. It is enabled by default. To disable realtime GET, either set the `realtime` parameter to `false` on the request, or change the global default by setting `action.get.realtime` to `false` in the node configuration.

# Realtime GET and stored fields

When getting a document, one can specify `fields` to fetch from it. They will, when possible, be fetched as stored fields (fields mapped as stored in the mapping). When using realtime GET, there is no notion of stored fields (at least for a period of time, basically until the next flush), so they will be extracted from the source itself (note: even if source is not enabled). It is good practice to assume that the fields will be loaded from source when using realtime GET, even if the fields are stored.
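
As a small illustration of the `realtime` parameter described in that issue, the sketch below builds the URL for a GET request with and without it. The helper `build_get_url`, the host, and the index, type, and id values are all hypothetical, and the `/{index}/{type}/{id}` path layout is an assumption based on the REST conventions of that era; only the `realtime=false` query parameter comes from the issue text.

```python
from urllib.parse import quote, urlencode


def build_get_url(base: str, index: str, doc_type: str, doc_id: str,
                  realtime: bool = True) -> str:
    """Build a document GET URL, optionally opting out of realtime GET."""
    path = "/".join(quote(part, safe="") for part in (index, doc_type, doc_id))
    url = f"{base.rstrip('/')}/{path}"
    if not realtime:
        # Realtime GET is on by default, so the parameter is only needed
        # when turning it off for this request.
        url += "?" + urlencode({"realtime": "false"})
    return url


default_url = build_get_url("http://localhost:9200", "articles", "article", "42")
non_realtime_url = build_get_url(
    "http://localhost:9200", "articles", "article", "42", realtime=False
)
```

Note that this only affects fetching a document by id. Searches still see a document only after the next refresh.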

-- jim

Remy_Gendron · June 28, 2011, 2:28am

Great summary James, thanks!

Shay, are things such as index migration when a new release comes out,
transactions, full restore, etc. being considered for the long-term
roadmap? I guess that if Lucene isn't meant to be a datastore, it would
be hard for ES to provide this...
