ES DataBase Engine

I'm curious: what is the database behind ES?
Is it a custom-made key/value store or an existing NoSQL product?

Thanks

--

Lucene

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

--

Hi Dave,
I am interested in the answer but can't get to your tweet. Can you please
reply on the forum.
Regards,
Janusz

--

Hello!

Just like David said, ElasticSearch is built on top of Apache Lucene (https://lucene.apache.org/), and Lucene is not a database: it is a full-text search library.

--

Regards,

Rafał Kuć

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

--

Hi,

I am still trying to grasp the difference between a search engine (like
Elasticsearch) and a DB (like MongoDB).

Elasticsearch stores the indexed/typed data somewhere. An example:

curl -XPUT 'http://localhost:9200/blog/user/dilbert' -d '{ "name" : "Dilbert Brown" }'

So it does store documents the same way as, for example, MongoDB does.
What, then, is the advantage of storing documents in Elasticsearch as
opposed to MongoDB?

This might be a stupid question, but I am just trying to understand the
whole concept of DBs and search engines.

Regards,

Janusz

--

My definition of the difference between a search engine and a database is
that a search engine will score and order the results.

Databases, leaving aside fuzzy queries and the like, only return results
that are relevant to the query. These results are ordered based on their
position in the table (unless explicitly ordered). Search engines go beyond
that. Not only do they return the results, but a search engine will tell
you how well a result matched the query. If you search for something in
Google, you expect the top results to be the most relevant to your query.
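
To make the scoring point concrete, here is a minimal Python sketch. It is an illustration only: the three sample documents, the function names and the simple TF-IDF formula are assumptions made for this example, and this is not Lucene's actual scoring formula. `match` behaves like an unscored database lookup, while `rank` orders the same matches by how well they fit the query.

```python
import math
from collections import Counter

# Hypothetical sample documents, keyed by id.
DOCS = {
    1: "elasticsearch is a distributed search engine",
    2: "mongodb is a document database",
    3: "lucene is a search library and elasticsearch builds on lucene search",
}


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive analyzer)."""
    return text.lower().split()


def match(query):
    """Database-style: ids of documents containing every query term, unscored."""
    terms = set(tokenize(query))
    return [doc_id for doc_id, text in DOCS.items() if terms <= set(tokenize(text))]


def rank(query):
    """Search-engine-style: (id, score) pairs, best match first."""
    n = len(DOCS)
    counts = {doc_id: Counter(tokenize(text)) for doc_id, text in DOCS.items()}
    scored = []
    for doc_id, term_counts in counts.items():
        total = 0.0
        for term in tokenize(query):
            # Document frequency: in how many documents the term occurs.
            df = sum(1 for c in counts.values() if c[term] > 0)
            if term_counts[term] > 0:
                total += term_counts[term] * math.log(1 + n / df)
        if total > 0:
            scored.append((doc_id, total))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

With these documents, `match("search")` returns ids in storage order, whereas `rank("search")` puts document 3 first because the term occurs in it twice.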

Search engines also have the added capability of analyzing text: stemming,
synonym expansion, n-grams, etc. You can apply these concepts to
standard databases (eg: single row per ngram), but these solutions are
awkward, while search engines handle them natively.
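
A rough Python sketch of what such text analysis can look like. The suffix list and function names are invented for illustration; real analyzers (Lucene's stemmers, for instance) are far more careful than this.

```python
def lowercase_tokens(text):
    """Split on whitespace and lowercase each token."""
    return text.lower().split()


def naive_stem(token):
    """Strip a few common English suffixes. A toy stand-in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def char_ngrams(token, n=3):
    """All character n-grams of a token, in order."""
    return [token[i:i + n] for i in range(len(token) - n + 1)]


def analyze(text):
    """Lowercase, tokenize, then stem every token."""
    return [naive_stem(token) for token in lowercase_tokens(text)]
```

Both the indexed text and the query would go through the same `analyze` step, which is why a search for "indexes" can match a document that only says "index".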

Elasticsearch can be thought of as a distributed Lucene. You can learn the
search engine side of things by reading up on Lucene.

Cheers,

Ivan

--

JD,

I think Ivan points out an important aspect: databases do not traditionally
score results. On the other hand (and to make you more confused... sorry
about that), Lucene has not only full-text queries (which return scored
results) but also filters, which return just the matching documents and do
no scoring (so pretty much the same thing that databases do).

I think that to answer your question you need to reveal more about what you
are really after. What is your use case and what do you need?

IMO, the real difference between databases and Lucene is at the low level,
in the internal format in which data is stored. In the case of Lucene it is
basically an inverted index (for details about Lucene 3.6.1 see
Apache Lucene - Index File Formats). This might be quite
detailed and might not seem relevant now, but in fact it has
consequences for how exactly you can work with your data and how
expensive it is to store and query your data in a Lucene index.
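
For readers new to the term, a minimal in-memory inverted index can be sketched in Python as below. This only illustrates the idea (term -> documents -> positions); it says nothing about Lucene's real on-disk file formats, and the function names are made up for this example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to {doc_id: [positions]} for a dict of id -> text."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(position)
    return index


def lookup(index, term):
    """Sorted ids of the documents that contain the term."""
    return sorted(index.get(term.lower(), {}))


docs = {1: "Dilbert Brown", 2: "Charlie Brown"}
index = build_inverted_index(docs)
```

Looking up a term is a single dictionary access, but changing one document means touching the entry of every term it contains, which is one reason updates cost more here than in a plain key/value store.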

For example, how much do you care about transactions? And distributed
transactions? How important is the "real-time" aspect to you? And how
about security and data privacy? And do you really need your documents to
undergo text analysis if the only thing you need is a distributed
key/value map implementation?

Just my 2 cents,
Lukas

--

Thank you very much for the explanation – it really does make sense to me
now after you pointed it out.

Sorry for another question.

In that case, does it make sense to combine (if possible at all)
Elasticsearch with another database (like MongoDB) if all I care about is a
powerful search engine with scoring, and I don't care about
transactions?

Regards,

Janusz

--

If you already have a datastore (mongo, couchbase, postgres...), keep it.

I recommend having one datastore for basic CRUD operations and Elasticsearch to provide search features.

My 2 cents.

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

--

From a 10,000 ft view:

A database looks like a book. It contains information, you can retrieve
information, you can read and even write books.

A search engine looks like a keyword index for a book. A search index
contains only compact forms of information of the whole book, carefully
analyzed. But you are also able to index a whole book as is. Each keyword
index has references to locations where you can find it in the book. If you
change something in the book, you need to check if the keyword index is
still up to date.

Regards,

Jörg

--

That depends on the database.
In case of Mongo I'd say that ES is simpler to scale (Mongo's sharding is
screwed: http://blog.serverdensity.com/does-everyone-hate-mongodb/#comment-1085),
but ES has some danger of split-brain (although increasing
"discovery.zen.minimum_master_nodes" improves this). In Mongo it is
somewhat easier to add arbitrary fields (no need to fine-tune the mappings)
and the query language is more convenient and there are more GUI tools. But
I wasn't happy with MongoDB reliability and personally I'd prefer
standalone ES to Mongo-ES for now.
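
On the split-brain setting: the usual rule of thumb for "discovery.zen.minimum_master_nodes" is a strict majority (quorum) of the master-eligible nodes. A small Python sketch of that arithmetic, with a helper name invented for this example:

```python
def minimum_master_nodes(master_eligible_nodes):
    """Quorum size: more than half of the master-eligible nodes."""
    if master_eligible_nodes < 1:
        raise ValueError("a cluster needs at least one master-eligible node")
    return master_eligible_nodes // 2 + 1
```

With three master-eligible nodes this gives 2, so a single isolated node can never elect itself master.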

--

FYI, MongoDB just got some progress with full-text search:
https://jira.mongodb.org/browse/SERVER-380

--

Thanks for all suggestions/answers,

The solution with MongoDB supporting full-text search is very tempting, but it
is still in the development stage. Thanks for pointing it out to me, Artem.

But if we decide to go with David's and Jörg's suggestion, I have two more
questions:

  1. If I change anything in a document stored in the DB (let’s say MongoDB),
    do I have to change the same document indexed in Elasticsearch as well? In
    other words, do I have to keep a duplicate copy of the document in
    Elasticsearch (the one that exists in MongoDB), or did I misunderstand
    Jörg's reply?
  2. If I don’t need a duplicate copy of the document in Elasticsearch, how
    do I link the Elasticsearch document with the document stored in MongoDB?

Regards,

Janusz

--

Answers:

  1. Yes. You have to duplicate documents. As I said in another thread, with
    Elasticsearch you are building an index. So when a document changes in your
    datastore, you have to update your ES index. In short, you have to send a copy
    of your document to ES.
  2. You can choose to only index documents in ES (i.e. build the index) without
    storing the source document. Just disable _source in your mapping.
    You will be able to find documents and get back index/type/id. Then, if you need
    to display results to your users, you will have to read all the matching
    documents from Mongo.
    If you don't need to display results to your users, that's a good option.
    If you need to display results, I recommend storing the _source (the default):
    with only one round trip you will have everything you need (instead of 1 in ES + 10 in Mongo)!
    Does that make sense?
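
The round-trip argument can be illustrated with a small Python sketch. Everything here is a stand-in invented for this example: `FakeDatastore` is not a MongoDB client, and the hit dictionaries only imitate the shape of search results (an `_id` plus, optionally, a `_source`).

```python
class FakeDatastore:
    """Hypothetical stand-in for the primary datastore; counts round trips."""

    def __init__(self, documents):
        self.documents = documents
        self.round_trips = 0

    def find_one(self, doc_id):
        self.round_trips += 1
        return self.documents.get(doc_id)


def render_from_source(hits):
    """_source stored: the search response already carries the documents."""
    return [hit["_source"]["name"] for hit in hits]


def render_from_datastore(hits, datastore):
    """_source disabled: one extra lookup per hit in the primary datastore."""
    return [datastore.find_one(hit["_id"])["name"] for hit in hits]


hits = [
    {"_id": "dilbert", "_source": {"name": "Dilbert Brown"}},
    {"_id": "charlie", "_source": {"name": "Charlie Brown"}},
]
store = FakeDatastore({
    "dilbert": {"name": "Dilbert Brown"},
    "charlie": {"name": "Charlie Brown"},
})
```

Both functions produce the same list of names, but the second one costs one datastore call per hit on top of the search itself.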

David.

--
David Pilato
http://www.scrutmydocs.org/
http://dev.david.pilato.fr/
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

--

"If you need to display results, I recommend storing the _source (the default):
with only one round trip you will have everything you need (instead of
1 in ES + 10 in Mongo)!"

In MongoDB you can use the $in operator, so you won't need to make 10
queries. But yes, the power of ES would be better; I am using Elasticsearch
as my primary database.
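
A sketch of the `$in` approach in Python. Only the filter document `{"_id": {"$in": [...]}}` is real MongoDB query syntax; `fake_find` is a hypothetical in-memory stand-in for the query, and the hit dictionaries are made up. Since a `$in` query does not return documents in the order of the id list, the sketch restores the search engine's ranking afterwards.

```python
def build_in_filter(hits):
    """Build one MongoDB filter that fetches all hit ids in a single query."""
    return {"_id": {"$in": [hit["_id"] for hit in hits]}}


def fake_find(collection, query):
    """Hypothetical stand-in for a $in query; returns docs in storage order."""
    wanted = set(query["_id"]["$in"])
    return [doc for doc in collection if doc["_id"] in wanted]


def in_search_order(hits, docs):
    """Re-sort fetched documents to follow the ranking of the search hits."""
    by_id = {doc["_id"]: doc for doc in docs}
    return [by_id[hit["_id"]] for hit in hits if hit["_id"] in by_id]


hits = [{"_id": "b"}, {"_id": "a"}]
collection = [
    {"_id": "a", "name": "A"},
    {"_id": "b", "name": "B"},
    {"_id": "c", "name": "C"},
]
fetched = fake_find(collection, build_in_filter(hits))
ordered = in_search_order(hits, fetched)
```

This brings the cost down to two round trips (one search, one fetch), compared with one when _source is stored.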

--

Yep. That makes a lot of sense. I think I am slowly starting to get a hold
of the Elasticsearch concept. Thanks a lot for your help, and everyone else's,
on this topic.
Regards,
Janusz

--

  1. Yes, the concept of "live indexing" a database includes the idea of
    either pushing or pulling changes from your database to the indexing
    routine. Note, the Elasticsearch community offers so called "rivers" to
    ease the process of pulling changed data for some well known cases. For
    Mongo DB, see
    https://github.com/richardwilly98/elasticsearch-river-mongodb/

In the Elasticsearch index, there is not really a duplicate of a document.
Think more of a concept, of a "document indexing job". Elasticsearch must
know about the document as a whole to compute new index statistics (words,
frequencies, positions, offsets...)
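
A minimal Python sketch of the "push" variant, assuming a hypothetical in-memory `SearchIndex` and a plain dict as the datastore (neither is a real Elasticsearch or MongoDB API). Each change re-runs the whole indexing job for the affected document, so terms of the old version are dropped before the new ones are added.

```python
class SearchIndex:
    """Hypothetical in-memory index: term -> set of document ids."""

    def __init__(self):
        self.postings = {}
        self.terms_by_doc = {}

    def index(self, doc_id, doc):
        """(Re)index a whole document, replacing any earlier version."""
        self.delete(doc_id)
        terms = {t for value in doc.values() for t in str(value).lower().split()}
        for term in terms:
            self.postings.setdefault(term, set()).add(doc_id)
        self.terms_by_doc[doc_id] = terms

    def delete(self, doc_id):
        """Remove every posting that belongs to the document."""
        for term in self.terms_by_doc.pop(doc_id, set()):
            self.postings[term].discard(doc_id)

    def search(self, term):
        return sorted(self.postings.get(term.lower(), set()))


def apply_change(datastore, index, change):
    """Push one (operation, id, document) change to datastore and index."""
    operation, doc_id, doc = change
    if operation == "delete":
        datastore.pop(doc_id, None)
        index.delete(doc_id)
    else:
        datastore[doc_id] = doc
        index.index(doc_id, doc)
```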

Lucene offers two modes of writing data into the search index. It can index
data by analyzing it, and it can store data unprocessed. Elasticsearch uses
the two Lucene modes and offers some additional convenience. Usually, you
would have to tell Elasticsearch beforehand that you also want to "store" a
document by setting the attribute "store" to yes. The Elasticsearch
"_source" field, which is the unmodified content to be processed, is stored
by default. So, indexing and storing the _source is enabled by default,
which comes in very handy. Out of the box, there is a simple way of
storing JSON documents while indexing them, without the need of tweaking
the default settings. But remember, by doing this, Elasticsearch does not
automatically work like a database. It is just a big search engine index,
augmented with a document store.
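
The two modes (index versus store) can be pictured with a toy Python class. `TinyIndex` and its `store_source` flag are inventions for this sketch that loosely mirror the idea of keeping or dropping _source; they are not Elasticsearch or Lucene APIs.

```python
class TinyIndex:
    """Toy index that always analyzes documents and optionally stores them."""

    def __init__(self, store_source=True):
        self.store_source = store_source
        self.postings = {}  # term -> set of document ids (the "indexed" part)
        self.sources = {}   # id -> original document (the "stored" part)

    def index(self, doc_id, doc):
        for value in doc.values():
            for term in str(value).lower().split():
                self.postings.setdefault(term, set()).add(doc_id)
        if self.store_source:
            self.sources[doc_id] = doc

    def search(self, term):
        ids = sorted(self.postings.get(term.lower(), set()))
        return [{"_id": i, "_source": self.sources.get(i)} for i in ids]
```

With `store_source=False` a search still finds the document id, but the hit carries no content, so the caller has to fetch it from somewhere else.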

  2. Linking to MongoDB depends on your interface in Mongo. If you use a
    RESTful API in Mongo, just index the Mongo coordinates, and retrieve them
    by interpreting the Elasticsearch search results. Elasticsearch offers a
    "triple coordinate", that is, index/type/id. This triple could be used
    to address your Mongo interface. Most flexible is just using the document
    id in Elasticsearch for addressing.

Jörg

--

Hi,

Thanks everybody for all the responses I got to my question.

If I can summarize the basic Elasticsearch concepts, would it be true to say
that:

  1. The generic HTTP interface URL is http://host:port/index/type/id
  2. ES stores the original document in its original form, but every field of
    every document under index/type/id also gets parsed by the analyzer (which
    is a set of filters) and is stored in the Elasticsearch database. In the
    very basic scenario it would be an analyzer that applies two filters:
    a. Lower case filter
    b. Stop words removal filter, as defined by
    org.apache.lucene.analysis.StopAnalyzer
  3. For the same index we can have a number of types, and for each type we
    can have a number of ids
  4. For each id we can store just one document

Is this a fair summary?
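
Points 1 and 2 of this summary can be sketched in Python as follows. The stop word set is a small illustrative subset chosen for this example, not the actual list used by org.apache.lucene.analysis.StopAnalyzer, and both function names are made up.

```python
import re

# Illustrative subset only; the real stop word list is longer.
STOP_WORDS = {"a", "an", "and", "the", "is", "of", "to", "in"}


def analyze(text):
    """Lower case filter followed by stop word removal."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [token for token in tokens if token not in STOP_WORDS]


def document_url(host, port, index, doc_type, doc_id):
    """Build the generic index/type/id address of a document."""
    return f"http://{host}:{port}/{index}/{doc_type}/{doc_id}"
```

For example, `document_url("localhost", 9200, "blog", "user", "dilbert")` yields the same address used in the curl example earlier in the thread.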

Regards,

Janusz

--