We were a relatively early Elasticsearch adopter and began an in-depth
evaluation comparing it to Solr about a year ago (even back then it
won hands down), with a final production push around half a year ago.
There were some issues in earlier versions of ES that could lead to
data loss, though this is no longer the case. Both of the cases I know
of have been fixed:
- Network partitions leading to data loss: fixed in 0.16
- Running out of disk space leading to corruption: fixed a long time
ago in Lucene
I don't know of any current issues that could lead to data loss;
however, I still get a warm fuzzy feeling from having a main data store
and syncing it out to Elasticsearch. Storage is cheap, and losing
data can be expensive.
Huy makes a good point as well. Our main data store has been around
for 10+ years and is deeply entrenched in our infrastructure;
Elasticsearch is tacked on for flexible and extremely fast searching.
I would say that if you are using ES as your main store, you're living
a little dangerously. Last I heard, Shay did not recommend ES as the
primary data store, but that may have changed.
Best Regards,
Paul
On Jun 22, 9:04 pm, Huy Phan dac...@gmail.com wrote:
Hi Roger,
Actually, ES just "listens" to the changes in CouchDB and imports the
data into itself. That means ES still ends up storing all the data and
doesn't use CouchDB as its actual backend storage like you said.
To test this, have ES index data from CouchDB, then shut down
CouchDB and search in ES again: you can still see your documents under
the _source field.
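To make the point concrete, here is a toy model of what a river does
(a sketch only; ToyIndex and apply_changes are hypothetical names, not
the Elasticsearch or CouchDB API). It replays a changes feed, and for
every document it both updates an inverted index and keeps a full copy
of the document, playing the role of _source. Once the feed has been
consumed, wiping the "primary" store does not affect search results:

```python
from collections import defaultdict


class ToyIndex:
    """Hypothetical toy search index: postings plus a copy of each doc (_source)."""

    def __init__(self):
        self.source = {}                   # doc_id -> full copy of the document
        self.postings = defaultdict(set)   # term -> set of doc_ids

    def _terms(self, doc):
        for value in doc.values():
            yield from str(value).lower().split()

    def index(self, doc_id, doc):
        self.delete(doc_id)                # drop stale postings if this is an update
        self.source[doc_id] = dict(doc)    # the data is duplicated here
        for term in self._terms(doc):
            self.postings[term].add(doc_id)

    def delete(self, doc_id):
        old = self.source.pop(doc_id, None)
        if old is not None:
            for term in self._terms(old):
                self.postings[term].discard(doc_id)

    def search(self, term):
        ids = sorted(self.postings.get(term.lower(), ()))
        return [self.source[i] for i in ids]   # served from the copy, not the primary


def apply_changes(changes, index):
    """Replay a changes feed: (doc_id, doc) to upsert, (doc_id, None) to delete."""
    for doc_id, doc in changes:
        if doc is None:
            index.delete(doc_id)
        else:
            index.index(doc_id, doc)


if __name__ == "__main__":
    couch = {"1": {"title": "hello world"}, "2": {"title": "goodbye world"}}
    idx = ToyIndex()
    apply_changes(list(couch.items()), idx)
    couch.clear()                  # "shut down" the primary store
    print(idx.search("world"))     # both documents are still returned
```

The cost of the copy is storage; the benefit is that search never has
to go back to CouchDB.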
But I still vote for any idea that gets ES to store only the index
data and leave the original documents in CouchDB/MongoDB or any other
NoSQL store. The reason is to avoid data duplication: since your NoSQL
store is already in your system for many purposes and cannot be
replaced, you may want to make use of it instead of copying all of its
data to ES.
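For comparison, a rough sketch of the "index only" idea (again a toy
with hypothetical names, not an ES feature as such, though ES does let
you disable _source in a mapping). The index keeps only postings and
resolves hits by looking them up in the existing primary store at query
time. This avoids the duplication, but every hit now costs a lookup in
the primary, and if the primary is down or a document was deleted
there, it simply drops out of the results:

```python
from collections import defaultdict
from typing import Dict, List, Mapping, Set


class IndexOnly:
    """Hypothetical toy index that stores postings but no document copies."""

    def __init__(self, primary: Mapping[str, dict]):
        self.primary = primary                       # existing NoSQL store (stand-in)
        self.postings: Dict[str, Set[str]] = defaultdict(set)

    def index(self, doc_id: str, doc: dict) -> None:
        for value in doc.values():
            for term in str(value).lower().split():
                self.postings[term].add(doc_id)     # only ids are kept, no _source

    def search(self, term: str) -> List[dict]:
        ids = sorted(self.postings.get(term.lower(), ()))
        # Each hit is fetched from the primary; missing docs are skipped.
        return [self.primary[i] for i in ids if i in self.primary]
```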
--huy
On 6/23/11 10:07 AM, Roger Studner wrote:
I'm probably missing the obvious, but what are the main/agreed-upon
reasons for using something like CouchDB to actually store "the data"
and having the River stuff build the indexes in ES?
i.e. Why not use ES all by itself if in the end you are going to
index with it? What is the great advantage of the hybrid solution? (I
understand inserts are slow into anything Lucene-based, but if
you have the River/synchronization from Couch -> ES, isn't that
still taking up about the same time?)
Thanks for any insight that can be provided.
Best,
Roger