ElasticSearch as a database


(Ville) #1

I'm planning to use ES as storage for JSON documents. The idea is to
reference each document by index/type/id and perform search queries
when needed.

I've heard some concerns that Lucene-based indexes are not reliable
and that some documents might be missed. Is this still a valid
concern? E.g. if I were to store items like this
{..."done":"true"/"false"...}, is it possible that when searching for
documents where done=false some documents are never returned to the
client?


(Karussell) #2

Lucene is reliable; we have never had those problems, whether in pure
Lucene, Solr or ElasticSearch.

That said it is easily possible to 'overwrite' an existing document,
but with the versioning feature of ElasticSearch you can prevent this.
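
To illustrate the versioning idea, here is a toy in-memory sketch in
Python. It is not the ElasticSearch client API: the names
VersionedStore, VersionConflict and expected_version are made up for
illustration, and it only models the general idea of rejecting a write
that was based on a stale version.

```python
class VersionConflict(Exception):
    """Raised when a write carries a stale version number."""


class VersionedStore:
    """Toy in-memory stand-in for a versioned document store (hypothetical, not the ES API)."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, source)

    def get(self, doc_id):
        """Return the (version, source) pair stored under doc_id."""
        return self._docs[doc_id]

    def index(self, doc_id, source, expected_version=None):
        """Store a document; reject the write if expected_version is stale."""
        current_version = self._docs.get(doc_id, (0, None))[0]
        if expected_version is not None and expected_version != current_version:
            raise VersionConflict(
                f"{doc_id}: expected v{expected_version}, found v{current_version}"
            )
        new_version = current_version + 1
        self._docs[doc_id] = (new_version, dict(source))
        return new_version


store = VersionedStore()
v1 = store.index("task-1", {"done": False})                      # first write
v2 = store.index("task-1", {"done": True}, expected_version=v1)  # based on v1, accepted
try:
    # A second writer still holding v1 tries to overwrite the document.
    store.index("task-1", {"done": False}, expected_version=v1)
    conflict = False
except VersionConflict:
    conflict = True
```

A write without expected_version still overwrites silently, which is
the accidental overwrite case mentioned above.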

PS: I'm actually using ES as my NoSQL store for jetwick.com ;-)


(timscott) #3

Your example of retrieving documents based on a boolean is supported
in ES by filters. My app depends on using filters to narrow searches,
and I have not had any trouble (although I'm only in alpha).

If you only ever want to filter (and not search indexed data), you'd
probably be better off with one of the many other NoSQL alternatives,
IMO.


(Shay Banon) #4

Filter-only queries on a boolean field are supported by constant_score. You can also query the field with something like a term query; it doesn't have to be a filter.
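
As a rough sketch of the two options, the Python below only builds the JSON search bodies; it does not talk to a cluster. The field name "done" comes from the original question. The helper names are hypothetical, and the exact body shape and whether the value should be a boolean or a string depend on your ES version and mapping, so treat this as an assumption to check against the docs.

```python
import json


def term_query(field, value):
    """Search body matching documents whose field holds exactly `value`."""
    return {"query": {"term": {field: value}}}


def constant_score_filter(field, value):
    """Search body wrapping a term filter in constant_score (no relevance scoring)."""
    return {"query": {"constant_score": {"filter": {"term": {field: value}}}}}


# Both bodies select documents where done is false.
not_done_query = term_query("done", False)
not_done_filter = constant_score_filter("done", False)

print(json.dumps(not_done_query))
print(json.dumps(not_done_filter))
```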


(Ville) #5

Thank you for the quick replies. I'm going to run many other kinds
of queries as well, so I would like to use ES.

However, the main point of my original post was the reliability of
Lucene/ES itself. My colleagues have expressed concerns that Lucene
does not always "find" what it should. To me this would seem like a
bug, but I couldn't find any bug reports on Google related to these
concerns.

Based on your responses above, it seems that you consider ES
reliable in terms of correctness. My concern could be summarized as
"Is ES less reliable for indexing than, e.g., native MySQL indexes?" I
guess not?


(Shay Banon) #6

I am not familiar with any bugs in Lucene search. It might be that your colleague got confused as to when and how to refresh Lucene in order to make documents visible for searching.


(Berkay Mollamustafaoglu-2) #7

We've been using a Lucene-based data store (even before ES, with Compass)
for many years in production and never had that type of problem. I don't see
any reason why a Lucene index would be less reliable. In fact, I'd say it's a
lot more reliable than most of the NoSQL solutions out there. You can take the
data out and browse it externally to see what's in the index, etc.

This type of confusion may be due to ES not making new documents searchable
right away (though the data is written and safe). This is a key difference in
ES and can have an impact when you're using it as a data store rather than a
search engine. When you write to an index, the document is not immediately
visible via search until the index is refreshed, which happens once a second
by default.
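
The write-then-refresh behaviour can be pictured with a toy model in
Python. This is only an illustration, not ES code: ToyIndex and its
methods are hypothetical names, and the real refresh runs on a timer
rather than by an explicit call as here.

```python
class ToyIndex:
    """Toy model of an index where writes become searchable only after a refresh."""

    def __init__(self):
        self._pending = []     # accepted writes, not yet visible to search
        self._searchable = []  # documents visible to search

    def index(self, doc):
        """Accept a document; it is stored but not yet searchable."""
        self._pending.append(doc)

    def refresh(self):
        """Make all pending documents visible to search."""
        self._searchable.extend(self._pending)
        self._pending.clear()

    def search(self, field, value):
        """Return searchable documents whose field equals value."""
        return [d for d in self._searchable if d.get(field) == value]


idx = ToyIndex()
idx.index({"id": 1, "done": False})
before = idx.search("done", False)  # searched before any refresh
idx.refresh()
after = idx.search("done", False)   # searched after the refresh
```

A client that writes and then immediately searches sees the "before"
result, which can look like a missing document even though nothing
was lost.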

Regards,
Berkay Mollamustafaoglu
mberkay on yahoo, google and skype


(dpilato) #8

Ville wrote:

I'm planning to use ES as a storage for JSON documents. The idea is to reference each document with index/type/id and perform search queries when needed.

I was thinking about the same idea a few weeks ago, because ES is really easy to set up and is powerful at automatically managing shards across many instances.

Then I tried using CouchDB as the primary document storage, with a river activated to feed ES.
It is really easy to set up, and pushing documents into CouchDB is really fast.
CouchDB also allows adding attachments (such as PDF, XML, oOo, ...) to JSON documents. I don't know at this time how ES can handle (or not) PDF files...

It could be a nice design for your own needs.
So is there any reason for not using CouchDB as a storage for JSON docs?

Cheers,
David.


(Ville) #9

I want something fast, masterless and fault-tolerant with search. I've
been considering Riak (nice but Search in beta), Cassandra (bloated?,
tedious indexing) and ES.

CouchDB could basically work for me but I've noticed that there are
people not happy with its stability, see
http://labs.linkfluence.net/nosql/2011/03/07/moving_from_couchdb_to_riak.html

I'm only going to store JSON. Do you think CouchDB+ES would be a better
approach than just ES? I don't know CouchDB that well, so what
would be the advantages of using it as well?


(James Cook) #10

FWIW, we have been using ES as our primary datastore on two very large
commercial projects for over a year now. I started with CouchDB and
Hazelcast for a caching layer, but CouchDB (this was version 0.9, IIRC)
didn't have a sharding story and was difficult to deploy in a cloud
environment. Plus, it was sometimes inflating storage up to 10x the data we
were storing. We were also relying on ElasticSearch to provide more
sophisticated query operations anyway.

It seemed like a bad idea to store our data in both CouchDB and
ElasticSearch, so we chose to drop CouchDB. ES coupled with Hazelcast
(distributed cache + transaction support) works very well; the two
couldn't work better together.

Full disclosure: until ES reaches 1.0 status, I also store all of our
persistable objects in Amazon's SimpleDB "just in case". If my S3 gateway is
trashed, I can reindex straight from SimpleDB.

Jim Cook
jcook@tracermedia.com

tracermedia interactive http://www.tracermedia.com/
780 King Ave. #106
Columbus, OH 43212

phone: (614) 298-0774
fax: (614) 298-0776
cell: (234) 738-2492


(Karel Minarik) #11

We came to the same conclusion as Jim. Once we are 100% sure that we
can recover the data from the gateway after any crash, incident, etc.,
we will drop CouchDB as the storage layer. ES is better in almost every
other respect (space requirements, native clustering, etc.).

Karel

