"wait for refresh" command?

Curtis_Caravone · June 9, 2011, 3:40pm

Is there a way to tell when an indexed document has become visible for
searching, short of forcing an index refresh?

Something like a "wait for refresh" option to the indexing commands (server
will not send response message until the doc has been indexed and the next
refresh has happened)
or even just a "wait for next refresh" command?

This could be very useful for enforcing some level of inter-document
consistency.

For example, in a relational parent/child model (child doc -> parent doc),
one might want to ensure that the parent doc is visible before any child doc
becomes visible.

Curtis

kimchy · June 9, 2011, 7:21pm

No, there isn't one, and it's problematic to provide. It means that elasticsearch would need to keep track of the indexed documents, so that you could tell whether a refresh happened after them. Parent/child visibility can be ensured by indexing the parent doc before the child doc, for example.

Karussell1 · June 9, 2011, 7:27pm

I'm not aware of such a feature although it would be really nice to
have such a listener capability.

Maybe you could create an issue for this?

Possible workarounds are to do the refresh yourself (not good!) or to wait
at least as long as the refresh_interval (or twice that).

Regards,
Peter.

Karussell1 · June 9, 2011, 8:22pm

On Jun 9, 9:21 pm, Shay Banon shay.ba...@elasticsearch.com wrote:

No, there isn't one, and it's problematic to provide. It means that elasticsearch would need to keep track of the indexed documents, so that you could tell whether a refresh happened after them. Parent/child visibility can be ensured by indexing the parent doc before the child doc, for example.

E.g. FAST ESP provides listeners for this purpose, and especially with
the NRT feature one now cannot give a guarantee of when a doc gets
indexed.

It means that elasticsearch will need to keep track of the indexed documents

Maybe only for a subset? Or one could mark a single document as a
trigger?

kimchy · June 9, 2011, 8:25pm

Callbacks are problematic, especially over HTTP, so it's not really an option because of its complexity. I think that most of the time you can work around it or use versions, or we can work on other features to enable it. For example, we can easily provide a fully realtime "doc exists" API (with a realtime version response).

Curtis_Caravone · June 21, 2011, 6:12pm

I was thinking of something really simple:

"wait for refresh" call registers a listener
When the next index refresh finishes, notify (and deregister) all
registered listeners -> each listener sends an "OK" response

No need to track documents, just keep a set of listeners (maybe you need one
set per index or per shard?)
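
The listener idea above can be sketched in a few lines of Python. This is only an illustration of the proposal, not Elasticsearch code: `RefreshListeners`, `register`, `on_refresh_finished`, and `wait_for_refresh` are hypothetical names, and the sketch assumes something in the server calls `on_refresh_finished()` each time a refresh completes.

```python
import threading


class RefreshListeners:
    """Hypothetical per-index (or per-shard) set of 'wait for refresh' callers."""

    def __init__(self):
        self._lock = threading.Lock()
        self._listeners = []  # callbacks waiting for the next refresh

    def register(self, callback):
        with self._lock:
            self._listeners.append(callback)

    def on_refresh_finished(self):
        # Swap the list out under the lock so each listener is notified
        # exactly once and is deregistered at the same time.
        with self._lock:
            pending, self._listeners = self._listeners, []
        for callback in pending:
            callback("OK")
        return len(pending)


def wait_for_refresh(listeners, timeout=None):
    """Block until the next refresh finishes; returns False on timeout."""
    done = threading.Event()
    listeners.register(lambda _response: done.set())
    return done.wait(timeout)
```

No documents are tracked here: a caller that registers after its indexing request has returned is simply released by whichever refresh finishes next.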

I thought all the request handling was asynchronous anyway, so the callback
for the HTTP response wouldn't be a problem?

To give a little more detail on my use case:

I'm not really doing simple parent/child relationships, it's more like:

A: Index a set of documents
B: Index a related set of documents into another index

I don't want any of the docs from set B to show up in search results until I
am sure that all the docs from set A are visible to searching as well.

There may not be explicit parent-child relationships to follow at the
document level. All retrieval is done through search queries.

Curtis

Remy_Gendron · June 22, 2011, 2:44am

We are migrating to es from hibernate search and we intend to use es
not only to support full text searching but also as our primary means
of accessing all of our data... Think nosql here. Hence, we are also
indexing the relations between entities so that we can filter on them.

A simple use case made difficult by the NRT nature of es is as follows:

Use es to load a user's tasks
Display the tasks in a list view
Mark one task as completed and update es
Refresh the list view

The expectation is that the completed task will not show up. I am
willing to block until all indexes and shards involved have been
refreshed.

I totally get the difficulty in implementing such a feature. Until we
have full real time, the following would be great:

I don't care about changes made by other users. It's OK to have a 1
sec delay before visibility.
But for actions taken by the user, I need the next page update or
navigation to reflect that.
So, when a use case requires immediate visibility, the caller will
provide a list of document uid/version pairs along with the query. The
query should block until all the specified documents are visible at
the specified versions (or newer). Then the query executes and
returns.

In fact, I could easily implement this in my service layer, but it
would be great if this was natively supported.
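
A minimal sketch of that service-layer approach, in Python and under stated assumptions: `search_after_visible` is a hypothetical helper, `visible_version(uid)` is a caller-supplied function that returns the version of a document currently visible to search (or `None` if it is not visible yet), and `run_query()` is a caller-supplied function that executes the actual search. None of these are Elasticsearch APIs.

```python
import time


def search_after_visible(required, visible_version, run_query,
                         timeout=5.0, poll_interval=0.1):
    """Run the query once every uid in `required` is visible at its version or newer.

    required: dict mapping document uid -> minimum version that must be visible.
    """
    deadline = time.monotonic() + timeout
    pending = dict(required)
    while True:
        # Drop every document that has reached the required version (or newer).
        for uid, version in list(pending.items()):
            seen = visible_version(uid)
            if seen is not None and seen >= version:
                del pending[uid]
        if not pending:
            return run_query()
        if time.monotonic() >= deadline:
            raise TimeoutError(f"not yet visible: {sorted(pending)}")
        time.sleep(poll_interval)
```

Only the documents the current user just changed need to be listed in `required`; changes by other users still become visible with the normal refresh delay.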

Btw, with the latest nrt work on lucene, do you expect we will get
full rt anytime soon?

Thanks for an incredible product!

kimchy · June 23, 2011, 11:35am

A blocking option is possible: one sends a bulk request (for example), and it blocks until the changes made by that request become visible with the next refresh. Open an issue.

Lucene refresh cost will get lower in the next Lucene version, but not to a real time level.

Curtis_Caravone · June 25, 2011, 2:55pm

I opened an issue:

thanks!
