I'm glad this discussion took off, as it is something that I have been
pondering for a while now as well.
For my latest project I started off with a large tech stack... web
framework, database, message queue, distributed file system, search
index etc. Life was great, everything was modular and decoupled and it
was all going to fit into place beautifully. In the test environment I
set up half a dozen virtual machines, each running its own component
so that I had nice isolation and could easily pinpoint bottlenecks.
Unfortunately, my marvelling over this grandiose architecture was
short-lived. It wasn't long before I started feeling the pain of
keeping up with the latest and greatest for each of these components
and learning their hidden pitfalls and secrets. Then I started thinking
about scaling out, and even though all these components were elastic
and cloud-ready, each one had a different way of sharding
and replicating. So now I had to learn how to scale, monitor,
optimize, back up and configure 4 different technologies written in
different languages and having different dependencies. My head started
to hurt. It was time for a change of plan, for a new mantra:
sometimes simpler is better!
In my particular case, I used Mongo as my NoSQL store and I was
definitely seeing a bit of an overlap between it and ES. The data
was simple and I didn't need map-reduce operations or complex set
relations (otherwise I wouldn't be using a NoSQL solution in the
first place!). I just needed a flexible data model and a fine-grained
way to search & retrieve documents, which is what ElasticSearch was
made for in the first place. The fact that I could partition and
replicate my data using ElasticSearch, in a way reminiscent of Mongo,
made the question of why I was running both even more obvious.
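To make the overlap concrete, here is a minimal sketch (not code from the project) of the three operations involved: store a document under a known id, fetch it back by that id, and find it again by content. It only builds the HTTP requests and does not send them. The host, the index name "articles" and the helper names are assumptions made up for illustration. The paths follow the current ES document API; older releases used a mapping type in the place where `_doc` appears.

```python
import json

# Hypothetical values, used only for illustration.
BASE_URL = "http://localhost:9200"
INDEX = "articles"


def store_request(doc_id, doc):
    """Request that writes a document under a known id (key-value style put)."""
    return ("PUT", f"{BASE_URL}/{INDEX}/_doc/{doc_id}", json.dumps(doc))


def fetch_request(doc_id):
    """Request that reads a document back by id (key-value style get)."""
    return ("GET", f"{BASE_URL}/{INDEX}/_doc/{doc_id}", None)


def search_request(field, text):
    """Request that finds documents by content using a full-text match query."""
    body = {"query": {"match": {field: text}}}
    return ("POST", f"{BASE_URL}/{INDEX}/_search", json.dumps(body))


if __name__ == "__main__":
    requests_to_show = [
        store_request("1", {"title": "simpler is better", "tags": ["ops"]}),
        fetch_request("1"),
        search_request("title", "simpler"),
    ]
    for method, url, body in requests_to_show:
        print(method, url, body or "")
```

The first two requests are what a document store is normally used for, and the third is the search side. All three go to the same index, which is the overlap described above.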
So I took the plunge and decided to ditch mongo for the time being and
use ES as a primary form of storage. I asked around on groups and
forums and couldn't find any glaringly obvious problem with using
Lucene as a storage engine. I also looked at Terrastore briefly but
couldn't really tell from the architecture diagrams what it uses for
persistence. I assume Terracotta, but based on what I have read so far,
Terracotta is not really well suited to permanent data but rather to
throw-away data. It was interesting to see Sergio mentioning under one
of his points that "Lucene isn't intended as a storage solution; it
may or may not work for your needs, but again, that's not the intended
use (and in my own experience, it doesn't work)". I think this is
worth analysing, with real use cases and war stories of particular
situations where Lucene was not a good storage solution and didn't
work (and how Terrastore addresses and solves them).
TL;DR Large tech stacks can quickly turn into administrative/learning
nightmares. Sometimes the benefits of integrating multiple solutions
into one component can far outweigh the risks and problems, especially
in a case like this where many people are confused and already see an
overlap (i.e. using ES as a NoSQL store).
On Mar 17, 7:29 pm, Eks Dev eks...@googlemail.com wrote:
I just started playing with ES and had to comment on this subject.
IMO, the question in the subject line (the discussion is great!) is plain
wrong. What I would like to see somewhere is rather search and "nosql db".
Keeping these two topics apart is like saying, OK, let us separate DBMS from
indexing and SQL. Search is great, nosqldb-s are great, but not enough.
"Traditional search" is just one application, useful, but just one
application. A more traditional, and much more general, computation model is
to have some way to locate data (old way "SQL", new way "search"), retrieve
data (old way "SQL", new way nosql KV stores), do something with data (SQL
vs map-reduce today on mega-data) and put it back into storage or deliver it.
What I am trying to say is that the "new way" has one missing link: it keeps
data in two completely separate worlds, technologically and logically apart
(think e.g. HBase and ES, or Cassandra and Solr). This is expensive, hard to
set up, hard to keep in sync, duplicates demand on resources ...
In an ideal world, imagine HBase where each node keeps an embedded Lucene to
expose the search part, with all this magic Shay is doing with ES. This would
become one infrastructure to keep all players in sync, one set of APIs to
talk to clients... It seems Riak goes this way.
I think I see this way of thinking behind ES, so imagine ES doing
map-reduce, keeping your data safe like HBase...
Dreaming in public is, I guess, OK.
View this message in context:http://elasticsearch-users.115913.n3.nabble.com/ElasticSearch-vs-NoSQ...
Sent from the ElasticSearch Users mailing list archive at Nabble.com.