I'm new to Elasticsearch. I've been using Solr but am looking for something
that supports real-time indexing.
When I index a document in ES, do I need to commit that document
for it to be visible in search? Or does the concept of "commit" even
exist in ES?
A lot of my indexing is not adding new documents but updating
existing documents. Does real time search work for updating existing
documents? For example, if I updated document id:123 and then
immediately did a search, would I be seeing the updated document, the
original document, or even both documents?
There is no concept of commit in Elasticsearch in the sense you are thinking
of. When you index, delete, or update a document, the change is persisted
(and replicated). The document becomes visible to search once the index is
"refreshed" (note that a refresh does not map to a Lucene commit). By
default, the index is refreshed every second.
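If a script or test needs a just-indexed document to be searchable without
waiting for the next periodic refresh, a refresh can also be requested
explicitly. Here is a minimal sketch using only the Python standard library.
The host `localhost:9200` and the index name `articles` are assumptions for
illustration, and the request is only built, not sent:

```python
import urllib.request


def build_refresh_request(base_url, index):
    """Build (but do not send) a POST to the index's _refresh endpoint."""
    url = f"{base_url.rstrip('/')}/{index}/_refresh"
    return urllib.request.Request(url, data=b"", method="POST")


# Hypothetical host and index name.
req = build_refresh_request("http://localhost:9200", "articles")
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would actually send it. Refreshing
after every write defeats the purpose of the periodic refresh, so this is
probably best kept for tests and one-off scripts.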
An updated document will be visible for search once the index has been
refreshed. One thing to note is that in master (upcoming 0.17), a full
realtime GET API (get by id) has been implemented.
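To make the visibility rules concrete, here is a toy in-memory model in
Python. It is not how Elasticsearch is implemented, and the class and method
names are made up for illustration. It only mimics the behaviour described
above: search sees the state as of the last refresh, get by id sees the
latest write, and an update replaces the earlier version, so a search never
returns both.

```python
class ToyIndex:
    """Toy model of refresh semantics; names here are hypothetical."""

    def __init__(self):
        self._latest = {}      # every accepted write, keyed by document id
        self._searchable = {}  # snapshot that searches see, replaced on refresh

    def index(self, doc_id, doc):
        # Indexing an existing id replaces the earlier version.
        self._latest[doc_id] = dict(doc)

    def refresh(self):
        # Publish all accepted writes to search.
        self._searchable = dict(self._latest)

    def get(self, doc_id):
        # Realtime get by id: reads the latest write, refreshed or not.
        return self._latest.get(doc_id)

    def search(self, field, value):
        # Search only sees the last refreshed snapshot.
        return [d for d in self._searchable.values() if d.get(field) == value]


idx = ToyIndex()
idx.index("123", {"title": "old", "tag": "es"})
idx.refresh()
idx.index("123", {"title": "new", "tag": "es"})  # update, not yet refreshed

before = idx.search("tag", "es")  # still the original version
realtime = idx.get("123")         # already the updated version
idx.refresh()
after = idx.search("tag", "es")   # only the updated version
```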
Does a refresh slow down concurrent searching? In Solr, commits tend to
be blocking and slow everything down.
A lot of my updates are to the "popularity" field of a document,
which I use to rank search results. User votes change the
"popularity" of a document.
Since the "popularity" field changes frequently, in Solr I'd need to
update the "popularity" field in MySQL, then use cron jobs to pull
data out of MySQL and index it into Solr periodically so as not to
overwhelm Solr. It's kind of messy. Do I need to do something similar
in ES? Or can I just update the "popularity" field in ES whenever a
user casts a vote, then set refresh_interval to, say, 5s and let ES
handle the periodic index updates (every 5s in this case)?
You can just update the popularity field immediately. I wouldn't bother
changing the default refresh value until you have actual evidence
that it is a problem.
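If measurements later show that the refresh rate is a problem, the interval
is a per-index setting that can be changed through the index settings API. A
minimal sketch in Python, again only building the request. The host, the
index name `articles`, and the `5s` value are assumptions taken from the
example in the question:

```python
import json
import urllib.request


def build_refresh_interval_request(base_url, index, interval):
    """Build (but do not send) a PUT that sets index.refresh_interval."""
    body = json.dumps({"index": {"refresh_interval": interval}}).encode("utf-8")
    url = f"{base_url.rstrip('/')}/{index}/_settings"
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical host and index name; "5s" is the value from the question.
settings_req = build_refresh_interval_request(
    "http://localhost:9200", "articles", "5s"
)
print(settings_req.get_method(), settings_req.full_url)
print(settings_req.data.decode("utf-8"))
```

Until then, each vote can simply be written to the index as it happens, with
the default refresh left in place.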