Difference between db and elasticsearch


(Mohit Anchlia) #1

Please shed some light on this, because I can't think of a reason why
elasticsearch shouldn't simply be used as a real-time database. With a
DB you write data and then, for the most part, run queries against it.
So if elasticsearch is scalable and distributed, why can't it be used
as the DB itself? I am trying to understand the disadvantages of
treating elasticsearch as a database.


(Otis Gospodnetić) #2

Hi Mohit,

You are not the first person to wonder about this. Check the ML
archives. Yes, some people use ES and similar systems in place of
technologies with "database" in their names.

Otis


(Douglas Muth) #3

In my experience it's much easier to do some things in databases
that you can't easily do in ES, such as GROUP BY and especially
ORDER BY operations.

If you're looking for a database that does sharding/replication, check
out Cassandra.

-- Doug


(nurikabe) #4

We are experimenting with ES as a database of sorts for documents. I
wouldn't want to use it for complex relational data, but it's great for
documents and corresponding metadata.


(Ivan Brusic) #5

It all depends on what your requirements are.

Despite all the advances that Lucene has made (especially with the
latest 3.5 release), Lucene (and therefore ElasticSearch) is still not
realtime. Not everyone needs access to the latest commit, so
ElasticSearch works great in those scenarios.

Just like most other NoSQL systems, you lose many RDBMS benefits such
as transactions and joins between tables (types/docs/whatever). The
loss of atomic updates is a big hurdle for many. Working with
documents, however, can be very liberating after being stuck in the
relational world for years.

--
Ivan


(Stanislas Polu) #6

Can someone elaborate on the realtime limitations of elastic search?

My understanding was that it was pretty good at it. And that's been verified for me so far.

Cheers

-stan


(Ivan Brusic) #7

There is a reason why Lucene is calling it "near real time" and not
real time. :)

Lucene is good at it, but there are certain scenarios where a commit
needs to be propagated immediately, which Lucene cannot handle.

--
Ivan
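
To make the "near real time" distinction concrete, here is a toy model in Python. It is only a sketch of the visibility behaviour being described, not Lucene's or ElasticSearch's actual implementation; the class name `NearRealTimeIndex` and its methods are hypothetical. Writes land in a buffer and become searchable only after a refresh, so a search issued right after a write may not see it.

```python
class NearRealTimeIndex:
    """Toy model: writes are buffered and only become searchable on refresh."""

    def __init__(self):
        self._buffer = {}      # indexed, but not yet visible to search
        self._searchable = {}  # visible to search

    def index(self, doc_id, doc):
        # A write is accepted immediately but is not searchable yet.
        self._buffer[doc_id] = doc

    def refresh(self):
        # Make all buffered writes visible to subsequent searches.
        self._searchable.update(self._buffer)
        self._buffer.clear()

    def search(self, predicate):
        return [doc for doc in self._searchable.values() if predicate(doc)]


idx = NearRealTimeIndex()
idx.index("1", {"venue": "Apollo Theater"})

before = idx.search(lambda doc: True)  # the write is not visible yet
idx.refresh()
after = idx.search(lambda doc: True)   # the write is visible after refresh
```

In a real cluster the refresh happens periodically in the background, so the gap is usually short, but an application that must read its own write through search immediately has to account for it.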


(Karussell) #8

Here are some links (it all depends on your requirements):

http://stackoverflow.com/questions/6636508/elastic-search-as-a-database

Douglas Muth wrote:

In my experience it's much easier to do some things in databases
that you can't easily do in ES, such as GROUP BY and especially
ORDER BY operations.

What is the problem with ORDER BY and ElasticSearch?

Peter.


(Douglas Muth) #9

On Thu, Jan 5, 2012 at 4:43 PM, Karussell tableyourtime@googlemail.com wrote:

What is the problem with ORDER BY and ElasticSearch?

Sorting by an integer? Not a problem. Sorting by something that's a
single word? Not a problem. Sorting by a field that has multiple
words? Problem.

As I understand it, if you index the following string: "the quick fox
jumps over the lazy cheetah", Elastic Search analyzes it and stores it
like this:

["the", "quick", "fox", "jumps", "over", "the", "lazy", "cheetah"]

Kinda hard to sort an array, no? :)

Now, there are some workarounds, such as not analyzing that field, or
storing an "untouched" version of that field alongside the analyzed
version. However, the former method means you can only match the exact
value of that field (no full-text search on it), and the latter method
means more disk space is used up.

At least, I'm about 90% sure that this is how it works, as I just
dealt with this issue for the first time uh, 2 days ago. If I'm
horribly wrong, someone please correct me. :P

-- Doug
http://twitter.com/dmuth
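
The effect described above can be sketched in a few lines of Python. This is a toy illustration, not ElasticSearch code: `analyze` is a hypothetical stand-in for an analyzer (lowercase, split on non-alphanumerics), and the venue names are made up. It shows that ordering documents by a single token per document does not reproduce the ordering of the whole, untouched strings that SQL's ORDER BY would give.

```python
import re


def analyze(text):
    """Toy stand-in for an analyzer: lowercase, then split on non-alphanumerics."""
    return [token for token in re.split(r"[^a-z0-9]+", text.lower()) if token]


# The example string from the post, broken into tokens.
tokens = analyze("the quick fox jumps over the lazy cheetah")

# Hypothetical venue names.
venues = ["The Blue Note", "Cafe Zebra", "Apollo Theater"]

# Sorting on the whole, untouched value behaves like SQL's ORDER BY.
by_whole_value = sorted(venues, key=str.lower)

# Sorting on one token per document (here: the smallest one) gives a
# different order, because the original string is no longer a single value.
by_min_token = sorted(venues, key=lambda name: min(analyze(name)))
```

Which token a real engine would pick when sorting on an analyzed field is an implementation detail; the point is only that no single token stands in for the whole string.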


(Lukáš Vlček) #10

Hi,

Did you look at script-based sorting? That is, analyze the field, but sort
on the original _source value of that field.

http://www.elasticsearch.org/guide/reference/api/search/sort.html (check
script based sorting)
http://www.elasticsearch.org/guide/reference/modules/scripting.html (check
source field)

Regards,
Lukas
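
A rough sketch of what such a request body might look like, built as a plain Python dict. The `_script` sort layout follows the legacy syntax documented on the pages linked above as best I recall it; treat it as an assumption and check it against the version you run, since later releases changed the script syntax. The helper `script_sort_request` and the field name `name` are hypothetical.

```python
import json


def script_sort_request(field, order="asc"):
    """Build a search body that sorts on the original _source value of a field.

    Assumption: legacy "_script" sort syntax with a string-typed script.
    """
    return {
        "query": {"match_all": {}},
        "sort": {
            "_script": {
                "script": "_source." + field,  # read the unanalyzed source value
                "type": "string",
                "order": order,
            }
        },
    }


request_body = script_sort_request("name")
print(json.dumps(request_body, indent=2))
```

Because the script has to load and parse `_source` for every matching document, this is likely to cost more than sorting on an indexed field, so it is worth measuring on a realistic dataset before relying on it.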


(plaflamme) #11

And how would you expect a database to sort rows with a column containing
"the quick brown fox..."?

Isn't this what "scoring" results is about? Using term frequencies and
other goodies...?

Philippe


(Douglas Muth) #12

On Thu, Jan 5, 2012 at 7:38 PM, Philippe Laflamme
philippe.laflamme@obiba.org wrote:

And how would you expect a database to sort rows with a column containing
"the quick brown fox..."?

Alphabetically, of course.

The issue I ran into the other day was trying to sort results by the
name of a venue, ignoring what the score was. Easily enough done in a
traditional SQL database, but a little more difficult in Elastic
Search. (Of course, this meant completely disregarding the scoring of
the results...)

-- Doug


(medcl.net) #13

Has anybody done a performance test with script-based sorting?
I am wondering whether sorting a large dataset with a script will be very slow.


(Shay Banon) #14

You can always mark the field as not analyzed in the mapping (or use multi
field mapping for analyzed and not analyzed versions) and then sort based
on it.
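
A minimal sketch of the multi-field approach, written as Python dicts that could be serialized to JSON. It assumes the legacy `multi_field` mapping type and `string` fields with `index: not_analyzed`, which is the syntax of ElasticSearch releases from around the time of this thread; newer versions express the same idea differently. The type name `venue`, the field name `name`, and the sub-field name `untouched` are hypothetical.

```python
import json

# Mapping: "name" is analyzed for full-text search, while "name.untouched"
# keeps the whole original string as a single term for sorting.
mapping = {
    "venue": {
        "properties": {
            "name": {
                "type": "multi_field",
                "fields": {
                    "name": {"type": "string", "index": "analyzed"},
                    "untouched": {"type": "string", "index": "not_analyzed"},
                },
            }
        }
    }
}

# Search on the analyzed field, but order hits alphabetically by the
# untouched version instead of by score.
search_body = {
    "query": {"term": {"name": "apollo"}},
    "sort": [{"name.untouched": {"order": "asc"}}],
}

print(json.dumps(mapping, indent=2))
print(json.dumps(search_body, indent=2))
```

The trade-off is the one raised earlier in the thread: the field is indexed twice, so it takes more disk space, in exchange for keeping both full-text search and whole-value sorting.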

