Bravo Shay and General Conceptual Questions

First off, I just want to thank Shay from the bottom of my heart and
whoever else is behind the ElasticSearch project. It sounds like the
answer to my dreams! I get almost tearful just reading the general
overview! :) Really...

I am trying to learn as much as possible now... reading all the blog
posts, etc. But two general conceptual questions persist in my mind,
and I'm hoping the mailing list can provide some guidance on them.

  1. Latency and the Searching and Indexing Process

I was just wondering generally how the data is indexed. Is it indexed
incrementally whenever a search is requested (à la CouchDB)? Is it
indexed whenever data is PUT in? In other words, how is relatively
low latency ensured when those beautifully simple HTTP search requests
are made?

  2. Integration with other NoSQL Projects

I am currently leaning towards Riak. But, unfortunately, it does not
yet have a searching and indexing feature. ElasticSearch seems to act
as a datastore as well. So what is the impetus to use Riak or other
NoSQL projects if ES can take care of the datastore aspect as well
(especially now, in light of the Amazon EBS and S3 modules)?

Regarding 2), here is the first thing that came to mind for me:

Some of these "nosql" datastores are MapReduce friendly, meaning there
are mechanisms and tools for processing the stored data using MapReduce.
ElasticSearch doesn't have that, and I'm not sure if that's on the
road map.
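For readers new to the term, here is a toy sketch of what processing stored data with MapReduce looks like: a map function emits key/value pairs for each record, the pairs are grouped by key, and a reduce function combines each group. It is plain Python over an in-memory dict; the `tags` field and the function names are made up for illustration and are not the API of Riak, Hadoop, or any other tool.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

Record = Dict[str, str]  # hypothetical record shape: field name -> text


def map_fn(doc_id: str, doc: Record) -> Iterator[Tuple[str, int]]:
    # Map: emit (tag, 1) for every tag on the record.
    for tag in doc.get("tags", "").split():
        yield tag, 1


def reduce_fn(key: str, values: List[int]) -> int:
    # Reduce: combine every value emitted for one key.
    return sum(values)


def map_reduce(store: Dict[str, Record]) -> Dict[str, int]:
    grouped: Dict[str, List[int]] = defaultdict(list)
    for doc_id, doc in store.items():          # map over every stored record
        for key, value in map_fn(doc_id, doc):
            grouped[key].append(value)         # shuffle: group values by key
    return {key: reduce_fn(key, values) for key, values in grouped.items()}


store = {
    "1": {"tags": "nosql search"},
    "2": {"tags": "nosql riak"},
    "3": {"tags": "search lucene"},
}
print(map_reduce(store))  # {'nosql': 2, 'search': 2, 'riak': 1, 'lucene': 1}
```

In a store that is actually MapReduce friendly, the map step would typically run next to the data on each node; here everything runs in one process.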

You could also ask your question 2) the other way around: why use ES
(or any other search solution, for that matter) instead of just using
one of those data stores? I'm answering some interview questions
today, and below is a snippet from my answer to a related question:

"Of course, just as people are learning this, a new breed of
non-relational databases has been gaining in popularity for the last few
years. These databases tend to have better support for full-text
search, though that support is commonly provided by simply integrating
the database with an existing and proven search library like Lucene at
the lower level, so that to the application using this database, the
search functionality appears to come from the database itself."

Concretely, ES and Terrastore seem to be great friends - exactly as
described above, I believe: Google Code Archive - Long-term storage for Google Code Project Hosting.
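A minimal sketch of the layering described in that snippet, assuming a key/value store that forwards every write to a search component, so that callers only ever talk to the store. `IndexedStore` and `SearchIndex` are hypothetical names, and the whitespace-splitting inverted index merely stands in for a real library like Lucene; this is not how Terrastore's ES integration is implemented.

```python
from collections import defaultdict
from typing import Dict, Set


class SearchIndex:
    """Stand-in for a search library: a whitespace-tokenized inverted index."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[str]] = defaultdict(set)

    def add(self, key: str, text: str) -> None:
        for term in text.lower().split():
            self.postings[term].add(key)

    def remove(self, key: str) -> None:
        for keys in self.postings.values():
            keys.discard(key)

    def search(self, term: str) -> Set[str]:
        return set(self.postings.get(term.lower(), set()))


class IndexedStore:
    """Hypothetical key/value store that mirrors every write into its index,
    so search looks like a feature of the store itself."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self._index = SearchIndex()

    def put(self, key: str, value: str) -> None:
        if key in self._data:
            self._index.remove(key)  # drop terms from the previous version
        self._data[key] = value
        self._index.add(key, value)

    def get(self, key: str) -> str:
        return self._data[key]

    def search(self, term: str) -> Set[str]:
        return self._index.search(term)  # delegated to the search component


store = IndexedStore()
store.put("a", "Riak is a NoSQL store")
store.put("b", "Lucene is a search library")
store.put("a", "Terrastore is a NoSQL store")
print(sorted(store.search("nosql")))  # ['a']
print(sorted(store.search("riak")))   # []
```

The application never calls `SearchIndex` directly, which is the sense in which the search functionality "appears to come from the database itself".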

Otis

Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch

On Jul 7, 9:26 am, Ted Karmel ted.kar...@gmail.com wrote:

First off, I just want to thank Shay from the bottom of my heart and
whoever else is behind the Elasticsearch project. It sounds like the
answer to my dreams! I get almost tearful just reading the general
overview! :) Really...

I am trying to learn as much as possible now... reading all the blog
posts, etc. But two general conceptual questions persist in my mind,
and I'm hoping the mailing list can provide some guidance on them.

  1. Latency and the Searching and Indexing Process

I was just wondering generally how the data is indexed. Is it indexed
incrementally whenever a search is requested (à la CouchDB)? Is it
indexed whenever data is PUT in? In other words, how is relatively
low latency ensured when those beautifully simple HTTP search requests
are made?

  2. Integration with other NoSQL Projects

I am currently leaning towards Riak. But, unfortunately, it does not
yet have a searching and indexing feature. Elasticsearch seems to act
as a datastore as well. So what is the impetus to use Riak or other
NoSQL projects if ES can take care of the datastore aspect as well
(especially now, in light of the Amazon EBS and S3 modules)?

Hi, first of all, thanks for the compliments ;).

  1. Data is indexed when you index it; search operations operate on
    already indexed data. As usual, if you run a performance test, give it
    time to warm up ;).

  2. There are two questions here: how elasticsearch works with other NoSQL
    solutions, and whether elasticsearch can be used as your main data store.
    Both were answered in the following thread:
    http://groups.google.com/a/elasticsearch.com/group/users/browse_thread/thread/e19eab5060d1bd55/.
    If you still have questions, I will be happy to answer them.
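To picture the two models Ted asked about, here is a toy comparison, assuming a whitespace tokenizer and an in-memory inverted index. `EagerIndex` pays the indexing cost on each put, matching Shay's description of indexing at index time; `LazyIndex` buffers writes and indexes them when the next search arrives, roughly the query-time catch-up Ted compares to CouchDB. Both classes are hypothetical, ignore updates and deletes, and do not reflect actual ElasticSearch or CouchDB internals.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def tokenize(text: str) -> List[str]:
    return text.lower().split()


class EagerIndex:
    """Indexes each document when it is written; a search is a lookup."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[str]] = defaultdict(set)

    def put(self, doc_id: str, text: str) -> None:
        for term in tokenize(text):  # indexing cost is paid on the write path
            self.postings[term].add(doc_id)

    def search(self, term: str) -> Set[str]:
        return set(self.postings.get(term.lower(), set()))


class LazyIndex:
    """Buffers writes and indexes them on the next search."""

    def __init__(self) -> None:
        self.pending: List[Tuple[str, str]] = []
        self.postings: Dict[str, Set[str]] = defaultdict(set)

    def put(self, doc_id: str, text: str) -> None:
        self.pending.append((doc_id, text))  # cheap write, nothing indexed yet

    def search(self, term: str) -> Set[str]:
        for doc_id, text in self.pending:  # catch-up cost lands on the query
            for t in tokenize(text):
                self.postings[t].add(doc_id)
        self.pending.clear()
        return set(self.postings.get(term.lower(), set()))


eager, lazy = EagerIndex(), LazyIndex()
for doc_id, text in [("1", "fast search"), ("2", "slow batch job")]:
    eager.put(doc_id, text)
    lazy.put(doc_id, text)
print(len(lazy.pending))                          # 2
print(eager.search("fast"), lazy.search("fast"))  # {'1'} {'1'}
print(len(lazy.pending))                          # 0
```

Both return the same hits; what differs is where the work lands. In the eager model a search only reads postings that already exist, which is what keeps query latency low in this toy once the data has been indexed.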

One point regarding Riak Search: it's not out yet, but from what I
understand they chose term partitioning rather than document partitioning
for distributed search, and it will be very interesting to see how that
holds up in the field...
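A toy illustration of the two partitioning schemes, assuming three shards, CRC32-based placement, and AND semantics for multi-term queries (all choices made for this sketch, not taken from Riak or ElasticSearch). With document partitioning each shard indexes a subset of the documents, so every query fans out to all shards and the per-shard hits are merged. With term partitioning each shard owns the complete posting lists for a subset of the terms, so a query only contacts the shards that own its terms, but those posting lists have to be brought together in one place, and a popular term concentrates load on a single shard.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Set, Tuple

NUM_SHARDS = 3  # assumption for the sketch
Index = Dict[str, Set[str]]  # term -> ids of documents containing it


def shard_of(key: str) -> int:
    # crc32 is stable across runs, unlike Python's salted str hash().
    return zlib.crc32(key.encode("utf-8")) % NUM_SHARDS


def build_doc_partitioned(docs: Dict[str, str]) -> List[Index]:
    shards: List[Index] = [defaultdict(set) for _ in range(NUM_SHARDS)]
    for doc_id, text in docs.items():
        shard = shards[shard_of(doc_id)]  # the whole document lands on one shard
        for term in text.split():
            shard[term].add(doc_id)
    return shards


def build_term_partitioned(docs: Dict[str, str]) -> List[Index]:
    shards: List[Index] = [defaultdict(set) for _ in range(NUM_SHARDS)]
    for doc_id, text in docs.items():
        for term in text.split():
            shards[shard_of(term)][term].add(doc_id)  # postings live with the term
    return shards


def search_doc_partitioned(shards: List[Index], terms: List[str]) -> Tuple[Set[str], int]:
    if not terms:
        return set(), 0
    # Every shard is asked; each intersects postings for its own documents.
    hits: Set[str] = set()
    for shard in shards:
        hits |= set.intersection(*(shard.get(t, set()) for t in terms))
    return hits, len(shards)


def search_term_partitioned(shards: List[Index], terms: List[str]) -> Tuple[Set[str], int]:
    if not terms:
        return set(), 0
    # Only the shards owning the query terms are contacted; their full
    # posting lists are then intersected in one place.
    owners = {shard_of(t) for t in terms}
    postings = [shards[shard_of(t)].get(t, set()) for t in terms]
    return set.intersection(*postings), len(owners)


docs = {
    "d1": "riak search uses term partitioning",
    "d2": "elasticsearch uses document partitioning",
    "d3": "lucene search library",
}
doc_shards = build_doc_partitioned(docs)
term_shards = build_term_partitioned(docs)
for name, shards, search in (
    ("document-partitioned", doc_shards, search_doc_partitioned),
    ("term-partitioned", term_shards, search_term_partitioned),
):
    hits, asked = search(shards, ["search"])
    print(name, sorted(hits), "shards contacted:", asked)
```

Both layouts return d1 and d3 for the query `search`; the document-partitioned one contacts all three shards, while the term-partitioned one contacts only the shard that owns `search`. ElasticSearch's own shards follow the document-partitioned layout, with each shard holding a full Lucene index over its share of the documents.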

-shay.banon

On Wed, Jul 7, 2010 at 8:18 PM, Otis otis.gospodnetic@gmail.com wrote:

Concretely, ES and Terrastore seem to be great friends - exactly as
described above, I believe: Google Code Archive - Long-term storage for Google Code Project Hosting.

+1 ... They are indeed :)
We're planning for even tighter Terrastore/ES integration in the
future ... contributions are obviously warmly accepted :)

Cheers,

Sergio B.

--
Sergio Bossa
http://www.linkedin.com/in/sergiob