Questions about real time indexing & search


(Andy-2) #1

Hi,

I'm new to elasticsearch. I've been using Solr, but I'm looking for
something that supports real-time indexing.

  1. When I index a document in ES, do I need to commit that document
    for it to be visible in search? Or does the concept of "commit" even
    exist in ES?

  2. A lot of my indexing is not adding new documents but updating
    existing documents. Does real time search work for updating existing
    documents? For example, if I updated document id:123 and then
    immediately did a search, would I be seeing the updated document, the
    original document, or even both documents?

Thanks


(Shay Banon) #2

On Mon, Jul 18, 2011 at 5:28 AM, Andy selforganized@gmail.com wrote:

> 1. When I index a document in ES, do I need to commit that document
>    for it to be visible in search? Or does the concept of "commit" even
>    exist in ES?

There is no concept of a commit in elasticsearch in the sense you mean. When you
index/delete/update a document, the change is persisted (and replicated).
The document becomes visible to search once the index is "refreshed"
(note: a refresh does not map to a Lucene commit). By default, the index is
refreshed every 1 second.
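To make the refresh semantics concrete, here is a minimal toy model in Python. It is not how Elasticsearch is implemented; the class `ToyNearRealtimeIndex` and its methods are hypothetical and only mimic the visible behavior described above: a write is accepted right away, but search sees it only after the next refresh.

```python
from copy import deepcopy


class ToyNearRealtimeIndex:
    """Hypothetical toy model of refresh visibility (not Elasticsearch code).

    Writes land in `_latest` right away (standing in for "persisted and
    replicated"), but searches only read `_searchable`, the snapshot
    published by the most recent refresh.
    """

    def __init__(self):
        self._latest = {}      # doc id -> source, including unrefreshed writes
        self._searchable = {}  # what search can currently see

    def index(self, doc_id, source):
        self._latest[doc_id] = dict(source)

    def delete(self, doc_id):
        self._latest.pop(doc_id, None)

    def refresh(self):
        # Publish all writes made so far to search in a single step.
        self._searchable = deepcopy(self._latest)

    def search(self, predicate):
        # Return matching ids from the last refreshed snapshot only.
        return sorted(doc_id for doc_id, src in self._searchable.items() if predicate(src))


idx = ToyNearRealtimeIndex()
idx.index("1", {"title": "hello world"})
before = idx.search(lambda src: "hello" in src["title"])
idx.refresh()  # Elasticsearch does this on a timer (every 1s by default)
after = idx.search(lambda src: "hello" in src["title"])
print(before, after)  # [] ['1']
```

The same rule applies to deletes: a deleted document keeps showing up in search until the next refresh.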

> 2. A lot of my indexing is not adding new documents but updating
>    existing documents. Does real time search work for updating existing
>    documents? For example, if I updated document id:123 and then
>    immediately did a search, would I be seeing the updated document, the
>    original document, or even both documents?

An updated document becomes visible to search once the index has been
refreshed. Note that master (the upcoming 0.17) now includes a fully
realtime GET API (get by id).
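The same kind of toy model (again hypothetical, not Elasticsearch internals) can illustrate the update case and the realtime GET. Re-indexing id 123 replaces the old version. A search before the next refresh still returns the original, never both versions, and a get by id returns the latest version immediately.

```python
from copy import deepcopy


class ToyIndexWithGet:
    """Hypothetical toy model of updates plus realtime GET (not Elasticsearch code)."""

    def __init__(self):
        self._latest = {}      # doc id -> newest source, refreshed or not
        self._searchable = {}  # snapshot from the last refresh

    def index(self, doc_id, source):
        # Indexing an existing id replaces the previous version.
        self._latest[doc_id] = dict(source)

    def refresh(self):
        self._searchable = deepcopy(self._latest)

    def search_all(self):
        # Each id appears at most once: the old or the new version, never both.
        return [(doc_id, src["popularity"]) for doc_id, src in sorted(self._searchable.items())]

    def get(self, doc_id):
        # Realtime GET: reads the latest version without waiting for a refresh.
        return self._latest.get(doc_id)


idx = ToyIndexWithGet()
idx.index("123", {"popularity": 10})
idx.refresh()
idx.index("123", {"popularity": 11})  # update the same id

print(idx.search_all())  # [('123', 10)]  search still sees the original
print(idx.get("123"))    # {'popularity': 11}  realtime GET sees the update
idx.refresh()
print(idx.search_all())  # [('123', 11)]
```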


(Andy-2) #3

Does a refresh slow down concurrent searches? In Solr, commits tend to
block and slow everything down.

Many of my updates change the "popularity" field of a document, which
I use to rank search results. User votes change a document's
"popularity".

Since the "popularity" field changes frequently, in Solr I have to
update it in MySQL and then use cron jobs to periodically pull the
data out of MySQL and index it into Solr, so as not to overwhelm
Solr. It's kind of messy. Do I need to do something similar in ES?
Or can I just update the "popularity" field in ES whenever a user
casts a vote, set refresh_interval to, say, 5s, and let ES handle
the periodic index updates (every 5s in this case)?
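For reference, refresh_interval is an index setting that can be changed on a live index through the update-settings endpoint (`PUT /<index>/_settings`). The sketch below only builds the HTTP request with Python's standard library instead of sending it; the host `localhost:9200`, the index name `articles`, and the helper `build_refresh_interval_request` are assumptions for illustration.

```python
import json
import urllib.request

# Hypothetical host and index name; adjust for your own cluster.
ES_URL = "http://localhost:9200"
INDEX = "articles"


def build_refresh_interval_request(interval):
    """Build (but do not send) an update-settings request for refresh_interval."""
    body = json.dumps({"index": {"refresh_interval": interval}}).encode("utf-8")
    return urllib.request.Request(
        f"{ES_URL}/{INDEX}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


req = build_refresh_interval_request("5s")
print(req.get_method(), req.full_url)  # PUT http://localhost:9200/articles/_settings
# urllib.request.urlopen(req) would send it to a running node.
```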


(Clinton Gormley) #4

Hi Andy

> Does a refresh slow down concurrent searches? In Solr, commits tend to
> block and slow everything down.

An ElasticSearch refresh is not the same as a Solr commit. It has some
performance impact, but it is much, much lighter than a commit:

http://engineering.socialcast.com/2011/05/realtime-search-solr-vs-elasticsearch/

> Since the "popularity" field changes frequently, in Solr I have to
> update it in MySQL and then use cron jobs to periodically pull the
> data out of MySQL and index it into Solr, so as not to overwhelm
> Solr. It's kind of messy. Do I need to do something similar in ES?
> Or can I just update the "popularity" field in ES whenever a user
> casts a vote, set refresh_interval to, say, 5s, and let ES handle
> the periodic index updates (every 5s in this case)?

You can just update the popularity field immediately. I wouldn't bother
changing the default refresh interval until you have actual evidence
that it is a problem.
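A minimal sketch of the "update on every vote" approach, assuming a recent Elasticsearch version: the partial Update API used here (`POST /<index>/_update/<id>` with a Painless script) did not exist in the 0.17 era of this thread, where you would re-index the whole document instead. The host, the index name `articles`, and the helper `build_vote_request` are hypothetical, and the request is built but not sent. The `retry_on_conflict` parameter asks Elasticsearch to retry if two votes update the same document concurrently.

```python
import json
import urllib.parse
import urllib.request

ES_URL = "http://localhost:9200"  # hypothetical host
INDEX = "articles"                # hypothetical index name


def build_vote_request(doc_id, delta=1):
    """Build (but do not send) a scripted partial update that adds `delta` to popularity."""
    body = {
        "script": {
            "source": "ctx._source.popularity += params.delta",
            "lang": "painless",
            "params": {"delta": delta},
        }
    }
    doc_path = urllib.parse.quote(str(doc_id), safe="")
    return urllib.request.Request(
        f"{ES_URL}/{INDEX}/_update/{doc_path}?retry_on_conflict=3",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


req = build_vote_request("123")
print(req.get_method(), req.full_url)
```

Each vote then becomes one small request, and the new popularity shows up in search results after the next refresh, with no MySQL staging table or cron job in between.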

clint

