"wait for refresh" command?


(Curtis Caravone) #1

Is there a way to tell when an indexed document has become visible for
searching, short of forcing an index refresh?

Something like a "wait for refresh" option to the indexing commands (server
will not send response message until the doc has been indexed and the next
refresh has happened)
or even just a "wait for next refresh" command?

This could be very useful for enforcing some level of inter-document
consistency.

For example, in a relational parent/child model (child doc -> parent doc),
one might want to ensure that the parent doc is visible before any child doc
becomes visible.

Curtis


(Shay Banon) #2

No, there isn't one, and it's problematic to provide. It means that Elasticsearch will need to keep track of the indexed documents, so that it can tell whether a refresh happened after them. Parent/child visibility can be ensured by indexing the parent doc before the child doc, for example.
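
Shay's parent-first suggestion works because a refresh exposes everything written to a shard before it. The following is a toy Python model of that property. It assumes both documents land on the same shard and that a refresh exposes exactly the writes accepted before it. Real shards are more involved, and `ShardModel` and `parent_first_is_safe` are hypothetical names. The model checks that the child is never searchable while the parent is not.

```python
import random


class ShardModel:
    """Toy model of a single shard: writes are appended to a log, and a
    refresh exposes every write accepted before it (a prefix of the log)."""

    def __init__(self) -> None:
        self.log: list[str] = []
        self.visible_upto = 0  # number of log entries exposed by the last refresh

    def index(self, doc_id: str) -> None:
        self.log.append(doc_id)

    def refresh(self) -> None:
        self.visible_upto = len(self.log)

    def is_visible(self, doc_id: str) -> bool:
        return doc_id in self.log[: self.visible_upto]


def parent_first_is_safe(trials: int = 1000, seed: int = 0) -> bool:
    """Index the parent before the child, with refreshes at random points,
    and check that the child is never searchable while the parent is not."""
    rng = random.Random(seed)
    for _ in range(trials):
        shard = ShardModel()
        for step in ("maybe_refresh", "parent", "maybe_refresh", "child", "maybe_refresh"):
            if step == "maybe_refresh":
                if rng.random() < 0.5:
                    shard.refresh()
            else:
                shard.index(step)
            # Invariant: a visible child implies a visible parent.
            if shard.is_visible("child") and not shard.is_visible("parent"):
                return False
    return True


print(parent_first_is_safe())  # True
```

Reversing the order breaks the guarantee. If you index the child, refresh, and then index the parent, `is_visible("child")` is true while `is_visible("parent")` is false. The model covers a single shard only and says nothing about ordering across shards or indices.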


(Karussell) #3

I'm not aware of such a feature, although it would be really nice to
have such a listener capability.

Maybe you could create an issue for this?

Possible workarounds are to refresh yourself (not good! :)) or to wait
at least as long as the refresh_interval (or twice that).
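
The wait-based workaround can be reasoned about with a tiny model. Assume refreshes fire strictly every `refresh_interval` seconds and complete instantly. Real refreshes take time and drift, and the extra margin absorbs that. Under this assumption, a document becomes searchable at the first refresh after it is indexed, so the delay is never more than one interval. The function names are hypothetical.

```python
import math


def first_refresh_after(index_time: float, interval: float, phase: float = 0.0) -> float:
    """Time of the first refresh strictly after index_time, assuming refreshes
    fire at phase + k * interval (k an integer) and complete instantly."""
    k = math.floor((index_time - phase) / interval) + 1
    return phase + k * interval


def conservative_wait(interval: float, safety_factor: float = 2.0) -> float:
    """How long to sleep for the 'wait one or two refresh intervals' workaround."""
    return interval * safety_factor


interval = 1.0  # e.g. refresh_interval = 1s
for t in (0.0, 0.25, 0.999, 3.5):
    visible = first_refresh_after(t, interval)
    assert 0 < visible - t <= interval  # never more than one interval
    print(f"indexed at {t}s, searchable from {visible}s")
print(conservative_wait(interval))  # 2.0
```

Sleeping is still a guess. If a refresh runs long or the interval is changed, a fixed wait can be too short, which is why the thread asks for an explicit signal instead.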

Regards,
Peter.


(Karussell) #4

On Jun 9, 9:21 pm, Shay Banon shay.ba...@elasticsearch.com wrote:

> No, there isn't one, and it's problematic to provide. It means that Elasticsearch will need to keep track of the indexed documents, so that it can tell whether a refresh happened after them. Parent/child visibility can be ensured by indexing the parent doc before the child doc, for example.

FAST ESP, for instance, provides listeners for this purpose, and especially
with the NRT feature one can no longer guarantee when a doc becomes
searchable.

> It means that Elasticsearch will need to keep track of the indexed documents

Maybe only for a subset? Or one could mark just one document as a
trigger?


(Shay Banon) #5

Callbacks are problematic, especially over HTTP, so it's not really an option because of the complexity. I think that most of the time you can work around it, use versions, or we can work on other features to enable it. For example, we can easily provide a fully realtime "doc exists" API (with a realtime version in the response).


(Curtis Caravone) #6

I was thinking of something really simple:

  • "wait for refresh" call registers a listener
  • When the next index refresh finishes, notify (and deregister) all
    registered listeners -> each listener sends an "OK" response

No need to track documents, just keep a set of listeners (maybe you need one
set per index or per shard?)
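
The listener-set proposal maps onto a small per-shard data structure. This is a minimal Python sketch with hypothetical class and method names; it is not an Elasticsearch API. `register` plays the role of the "wait for refresh" call, and `refresh` wraps whatever actually reopens the index. The sketch assumes the client only calls "wait for refresh" after its indexing request has been acknowledged.

```python
import threading
from typing import Callable


class RefreshListeners:
    """One-shot 'wait for refresh' listeners for a single shard (hypothetical).

    A request registers a callback; the next refresh that starts after the
    registration fires the callback and removes it from the set.
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._pending: list[Callable[[], None]] = []

    def register(self, callback: Callable[[], None]) -> None:
        with self._lock:
            self._pending.append(callback)

    def refresh(self, do_refresh: Callable[[], None]) -> None:
        # Take the listeners registered so far *before* refreshing: a listener
        # added while do_refresh() runs may belong to a document written after
        # the refresh began, so it must wait for the next refresh.
        with self._lock:
            covered, self._pending = self._pending, []
        do_refresh()
        # Notify outside the lock so a slow callback (e.g. writing an HTTP
        # response) cannot block new registrations.
        for callback in covered:
            callback()


listeners = RefreshListeners()
responses: list[str] = []
listeners.register(lambda: responses.append("OK #1"))
listeners.register(lambda: responses.append("OK #2"))
listeners.refresh(lambda: None)  # stand-in for the actual index reopen
listeners.refresh(lambda: None)  # listeners were deregistered: no duplicates
print(responses)  # ['OK #1', 'OK #2']

late: list[str] = []
# A listener registered *during* a refresh is not fired by that refresh.
listeners.refresh(lambda: listeners.register(lambda: late.append("OK late")))
print(late)  # []
listeners.refresh(lambda: None)
print(late)  # ['OK late']
```

As the proposal says, no per-document tracking is needed; the one subtlety is the mid-refresh registration shown at the end.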

I thought all the request handling was asynchronous anyway, so the callback
for the HTTP response wouldn't be a problem?

To give a little more detail on my use case:

I'm not really doing simple parent/child relationships, it's more like:

A: Index a set of documents
B: Index a related set of documents into another index

I don't want any of the docs from set B to show up in search results until I
am sure that all the docs from set A are visible to searching as well.

There may not be explicit parent-child relationships to follow at the
document level. All retrieval is done through search queries.
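
With a "wait for next refresh" primitive, the client could enforce the A-then-B requirement on its own. It indexes all of set A, waits until index A has completed a refresh after those writes, and only then indexes set B. This is a single-process Python model with hypothetical names (`NrtIndex`, `wait_for_next_refresh`, `periodic_refresh`). The background thread stands in for the periodic refresh_interval.

```python
import threading


class NrtIndex:
    """Toy near-real-time index: writes stay invisible until refresh()."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._buffer: list[str] = []
        self._visible: list[str] = []
        self._generation = 0  # number of completed refreshes

    def index(self, doc: str) -> None:
        with self._cond:
            self._buffer.append(doc)

    def refresh(self) -> None:
        with self._cond:
            self._visible.extend(self._buffer)
            self._buffer.clear()
            self._generation += 1
            self._cond.notify_all()

    def search(self) -> list[str]:
        with self._cond:
            return list(self._visible)

    def wait_for_next_refresh(self, timeout: float = 5.0) -> bool:
        """Block until a refresh completes after this call; False on timeout."""
        with self._cond:
            start = self._generation
            return self._cond.wait_for(lambda: self._generation > start, timeout)


def periodic_refresh(indexes: list[NrtIndex], interval: float,
                     stop: threading.Event) -> None:
    # Stand-in for refresh_interval: refresh every index until stopped.
    while not stop.wait(interval):
        for idx in indexes:
            idx.refresh()


index_a, index_b = NrtIndex(), NrtIndex()
stop = threading.Event()
refresher = threading.Thread(
    target=periodic_refresh, args=([index_a, index_b], 0.05, stop), daemon=True
)
refresher.start()

for doc in ("a1", "a2", "a3"):
    index_a.index(doc)
if not index_a.wait_for_next_refresh():  # every doc of set A is now searchable
    raise TimeoutError("set A did not become visible")
# Only now start on set B, so no B doc can be visible before all of set A.
for doc in ("b1", "b2"):
    index_b.index(doc)
if not index_b.wait_for_next_refresh():
    raise TimeoutError("set B did not become visible")

stop.set()
refresher.join()
print(index_a.search(), index_b.search())  # ['a1', 'a2', 'a3'] ['b1', 'b2']
```

The cost is latency: set B is held back by up to one refresh of index A.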

Curtis


(Remy Gendron) #7

We are migrating to ES from Hibernate Search, and we intend to use ES
not only to support full-text searching but also as our primary means
of accessing all of our data... Think NoSQL here. Hence, we are also
indexing the relations between entities so that we can filter on them.

A simple use case made difficult by the NRT nature of ES is as follows:

  • Use ES to load a user's tasks
  • Display the tasks in a list view
  • Mark one task as completed and update ES
  • Refresh the list view

The expectation is that the completed task will not show up. I am
willing to block until all indexes and shards involved have been
refreshed.

I totally get the difficulty in implementing such a feature. Until we
have full real time, the following would be great:

  • I don't care about changes made by other users. It's OK to have a 1
    sec delay before visibility.
  • But for actions taken by the user, I need the next page update or
    navigation to reflect that.
  • So, when a use case requires immediate visibility, the caller will
    provide a list of document uid/version pairs along with the query. The
    query should block until all the specified documents are visible at
    the specified versions (or newer). Then the query executes and
    returns.
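
The uid/version proposal can be sketched as a service-layer helper. It polls until every (uid, version) pair is searchable at that version or newer, then runs the query. `query_after_visible` and `visible_version` are hypothetical. `visible_version` stands for however the service layer learns which version of a document search currently sees; it is not an Elasticsearch API.

```python
import time
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


def query_after_visible(
    pairs: Iterable[tuple[str, int]],
    visible_version: Callable[[str], Optional[int]],
    run_query: Callable[[], T],
    timeout: float = 5.0,
    poll_interval: float = 0.05,
) -> T:
    """Wait until every (uid, version) pair is searchable at `version` or
    newer, then run the query. Raises TimeoutError if that never happens.

    `visible_version(uid)` is a hypothetical hook returning the version that
    search currently sees for `uid` (None if the document is not visible yet).
    """
    remaining = dict(pairs)
    deadline = time.monotonic() + timeout
    while True:
        remaining = {
            uid: wanted
            for uid, wanted in remaining.items()
            if not _caught_up(visible_version(uid), wanted)
        }
        if not remaining:
            return run_query()
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still not visible: {sorted(remaining)}")
        time.sleep(poll_interval)


def _caught_up(seen: Optional[int], wanted: int) -> bool:
    # "At the specified version (or newer)".
    return seen is not None and seen >= wanted


# Fake search side: the user just completed task-42 (now version 3), but
# search keeps returning version 2 until a simulated refresh on the 3rd poll.
polls = {"count": 0}


def fake_visible_version(uid: str) -> Optional[int]:
    polls["count"] += 1
    return 3 if polls["count"] >= 3 else 2


result = query_after_visible(
    [("task-42", 3)], fake_visible_version, lambda: "task list reloaded",
    poll_interval=0.01,
)
print(result, polls["count"])  # task list reloaded 3
```

Polling keeps the sketch simple. It adds up to one poll_interval of extra latency and one lookup per pending document per poll. Only the current user's own writes are passed in, matching the requirement that other users' changes may lag.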

In fact, I could easily implement this in my service layer, but it
would be great if this was natively supported.

Btw, with the latest NRT work on Lucene, do you expect we will get
full RT anytime soon?

Thanks for an incredible product!


(Shay Banon) #8

A blocking option is possible: one sends a bulk request (for example), and it blocks until the changes made in that bulk request have been made visible by the next refresh. Open an issue.
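
Elasticsearch 5.0 and later expose this idea as the `refresh=wait_for` parameter on index, update, delete and bulk requests. The Python below does not use that parameter. It is only an in-process model of the behaviour described above, with hypothetical names (`Shard`, `bulk`, `wait_for_refresh`). The request records where its own writes end, so it can only return once a refresh has covered them.

```python
import threading
from typing import Sequence


class Shard:
    """Toy shard: accepted writes are kept in order, and a refresh records
    how many of them it made searchable."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._docs: list[str] = []  # every accepted write, in order
        self._refreshed_upto = 0     # docs[:_refreshed_upto] are searchable

    def bulk(self, docs: Sequence[str], wait_for_refresh: bool = False,
             timeout: float = 5.0) -> bool:
        """Apply a bulk request. With wait_for_refresh=True, block until a
        refresh has covered every write in this request (False on timeout)."""
        with self._cond:
            self._docs.extend(docs)
            last = len(self._docs)  # position just after this request's writes
            if not wait_for_refresh:
                return True
            return self._cond.wait_for(lambda: self._refreshed_upto >= last, timeout)

    def refresh(self) -> None:
        with self._cond:
            self._refreshed_upto = len(self._docs)
            self._cond.notify_all()

    def search(self) -> list[str]:
        with self._cond:
            return self._docs[: self._refreshed_upto]


shard = Shard()
stop = threading.Event()


def refresher() -> None:
    while not stop.wait(0.05):  # stand-in for a short refresh_interval
        shard.refresh()


thread = threading.Thread(target=refresher, daemon=True)
thread.start()
ok = shard.bulk(["task-7 completed", "task-9 completed"], wait_for_refresh=True)
stop.set()
thread.join()
print(ok, shard.search())  # True ['task-7 completed', 'task-9 completed']
```

Blocking is opt-in per request, so callers who don't need to read their own writes immediately keep the normal indexing latency.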

Lucene refresh cost will get lower in the next Lucene version, but not to a real time level.


(Curtis Caravone) #9

I opened an issue:

thanks!
