Guaranteed upper bound for near real time search

vishrut_goyal1 · January 2, 2015, 8:17am

Hello,

Although real-time search is not possible in Elasticsearch, "near real
time" search is possible by setting "refresh_interval" to "1s" (1 second).
The problem is that even with the refresh interval set to 1 second, it is
not "guaranteed" that an indexed document will be available for search
after 1 second. My load tests indicate that a document is sometimes still
not available for search 3 seconds after it was indexed. As per the
Elasticsearch documentation, "refresh_interval" controls "how often the
refresh operation will be executed". It does not provide an upper bound on
the delay between indexing a document and that document being available for
search.

Is there any other setting in Elasticsearch that can guarantee such a bound?

Thanks,
Vishrut

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/347d1f45-c243-4c87-b4ae-e02eee039b13%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
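
For reference, the "refresh_interval" setting discussed in this post is changed per index through the update-settings endpoint. The following is only a minimal sketch that builds the request without sending it; the node address http://localhost:9200, the index name "myindex", and the helper name build_refresh_interval_request are assumptions made for illustration.

```python
import json
import urllib.request


def build_refresh_interval_request(base_url, index, interval):
    """Build (but do not send) a PUT /<index>/_settings request."""
    body = json.dumps({"index": {"refresh_interval": interval}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical node address and index name, chosen only for this example.
request = build_refresh_interval_request("http://localhost:9200", "myindex", "1s")

# To apply the setting against a running node, send the request:
# urllib.request.urlopen(request)
```

As the question points out, this controls how often a refresh is scheduled; it does not by itself bound the time until a document becomes searchable.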

dadoonet · January 2, 2015, 8:45am

Do you compute this 3s delay between when you send the document and the search request? Or between the index response from elasticsearch and the search request?

--
David
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


jprante · January 2, 2015, 8:51am

First, it seems you confuse "being available for search" with "near real
time". These are two different things:

- search after indexing is expected to take a long time because of the
  unpredictable overhead in worst case scenarios (e.g. creating index,
  creating mapping, creating document, creating segments, creating replica on
  other nodes, segment merge etc.)

- near real time: getting a doc ID after indexing is very fast because the
  document is immediately available in a special Lucene segment kept in RAM
  (in the millisecond range)

You can experiment by reducing the refresh interval to 50ms and exercising
an Elasticsearch (term) query. On RAM-only clusters you will get the
best results, but that has nothing to do with the (near) real time feature
of the Elasticsearch get operation.

Also note the complexity of distributed systems. As long as there is no
information about the workload distribution and no priority index queues
are used, no upper time bound (deadlines) can be set in distributed
indexing.

If you want to find out about a faster real time switch between index write
and read, you may be interested in

http://lucene.apache.org/core/4_10_3/core/org/apache/lucene/search/ControlledRealTimeReopenThread.html

but you have to use your own custom code, because Elasticsearch does not
make use of ControlledRealTimeReopenThread.

Jörg


vishrut_goyal1 · January 2, 2015, 9:01am

The 3 second delay is between the index response and the search request. I
block on the indexing operation until I get the response, and then
schedule the search request 3 seconds after that.

Thanks,
Vishrut
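
The measurement described above (start the clock at the index response, then check when a search first finds the document) can be sketched as a polling loop instead of a single fixed 3 second wait. This is only an illustrative sketch: measure_visibility_delay, index_doc, is_searchable, and FakeBackend are hypothetical names, and the backend is an in-memory stand-in rather than a real cluster, so that the example runs on its own.

```python
import time


def measure_visibility_delay(index_doc, is_searchable, timeout=10.0, poll_every=0.05):
    """Return seconds between the index response and the first successful search.

    index_doc:     callable that blocks until the index response arrives
                   and returns the document ID.
    is_searchable: callable taking the document ID and returning True once
                   a search finds the document.
    Returns None if the document is still not searchable after `timeout`.
    """
    doc_id = index_doc()
    acknowledged = time.monotonic()  # the clock starts at the index response
    while True:
        if is_searchable(doc_id):
            return time.monotonic() - acknowledged
        if time.monotonic() - acknowledged >= timeout:
            return None
        time.sleep(poll_every)


class FakeBackend:
    """Stand-in for a cluster: documents become searchable after `lag` seconds."""

    def __init__(self, lag):
        self.lag = lag
        self.indexed_at = {}

    def index_doc(self):
        doc_id = str(len(self.indexed_at) + 1)
        self.indexed_at[doc_id] = time.monotonic()
        return doc_id

    def is_searchable(self, doc_id):
        return time.monotonic() - self.indexed_at[doc_id] >= self.lag


backend = FakeBackend(lag=0.2)
delay = measure_visibility_delay(backend.index_doc, backend.is_searchable)
print(f"visible after about {delay:.2f}s")
```

Against a real cluster, the two callables would wrap the actual index and search requests. Polling gives the distribution of delays rather than a single pass/fail result at the 3 second mark.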


vishrut_goyal1 · January 2, 2015, 9:09am

As per the Elasticsearch documentation, the "Get" operation is completely
real time (unless explicitly disabled). We can get the document by its
doc ID immediately after indexing it.
I am talking about the Search operation here, which can be made "near
realtime" by controlling the value of "refresh_interval".

Thanks,
Vishrut
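
The difference between a real-time "Get" and a near-real-time "Search" can be illustrated with a toy in-memory model. This is not how Elasticsearch is implemented; ToyShard and its methods are hypothetical names used only to show that a get by ID can see a write that search will not see until the next refresh.

```python
class ToyShard:
    """Toy model: get reads every acknowledged write, search only refreshed ones."""

    def __init__(self):
        self.pending = {}     # indexed but not yet refreshed
        self.searchable = {}  # visible to search after a refresh

    def index(self, doc_id, doc):
        self.pending[doc_id] = doc

    def get(self, doc_id):
        # Real-time get: consult unrefreshed writes first.
        if doc_id in self.pending:
            return self.pending[doc_id]
        return self.searchable.get(doc_id)

    def search(self, field, value):
        # Search sees only what the last refresh exposed.
        return sorted(
            doc_id
            for doc_id, doc in self.searchable.items()
            if doc.get(field) == value
        )

    def refresh(self):
        # Make all pending writes visible to search.
        self.searchable.update(self.pending)
        self.pending.clear()


shard = ToyShard()
shard.index("1", {"user": "vishrut"})
before = (shard.get("1"), shard.search("user", "vishrut"))
shard.refresh()
after = shard.search("user", "vishrut")
```

Here `before` holds the document from get together with an empty search result, while `after` contains the document ID once the refresh has run.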


mikemccand · January 2, 2015, 9:55am

The 1s refresh_interval means that ES will open (takes some time) and warm
(takes some more time) a new NRT reader, and after that reader is done
opening, 1s later it will open again.

So it's possible in your case it takes 2s to open + warm a new NRT reader
(check the node's logs). But 2s is quite a long time for the reopen unless
the index has changed a lot (which is unlikely with 1s refresh_interval).

Mike McCandless

http://blog.mikemccandless.com
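
One way to read the description above is as a simple timing model. The sketch below is only that: it assumes a reopen picks up just the documents indexed before it began, and that the next reopen starts one refresh interval after the previous one finished. The function name visibility_delay_bounds and both assumptions are illustrative, not taken from Elasticsearch itself.

```python
def visibility_delay_bounds(refresh_interval, reopen_time):
    """Return (best, worst) seconds from indexing to searchability under the model.

    refresh_interval: pause between the end of one reopen and the start of the next.
    reopen_time:      time to open and warm a new NRT reader.
    """
    # Best case: the document arrives just before a reopen starts and only
    # waits for that reopen to finish.
    best = reopen_time
    # Worst case: the document arrives just after a reopen started, so it
    # waits for that reopen, the interval, and then the next reopen.
    worst = reopen_time + refresh_interval + reopen_time
    return best, worst


print(visibility_delay_bounds(refresh_interval=1.0, reopen_time=1.0))
print(visibility_delay_bounds(refresh_interval=1.0, reopen_time=2.0))
```

Under these assumptions the delay depends on the reopen time as well as on the refresh interval, which is why checking the node's logs for the actual reopen and warm times is the useful next step.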


Diego_de_Freitas · November 21, 2016, 10:40am

Do you know if it is possible to monitor this overhead beyond the one-second refresh interval?

Thanks
