Determining when an index operation is complete


(Lucas Ward) #1

I apologize in advance if this is an already answered question. I
couldn't find a reference to it, but you can never tell.

I am testing ElasticSearch from a Grails application. It's just a
simple way to test my indexing and searching using the Java API.
Occasionally I also use the GXContentBuilder. I am trying to keep a
steady state in my tests, meaning I blow out any existing indices and
reindex before running my tests. Not before each individual test,
mind you, but before running the 'search integration tests'.

My problem is this: I can't find a reliable way to determine when the
indexes are ready. At first I thought a cluster health
'waitForYellow' check might work, but it doesn't: the search that
follows shows no results, and an IndicesStatusRequest doesn't show
any documents in the index either. I also tried a waitForGreen, which
seemed to work until I realized that it was just timing out. Since
it's only one node, it can't really 'go green'. So what I ultimately
determined was that doing a Thread.currentThread().sleep(1000) would
do the trick. But that's the cardinal sin of integration testing. I
have an awesome workstation with an SSD and a new 'Sandy Bridge'
processor, so what seems to work on my machine will inevitably
break down when it's run on a CI server that's also running other
builds, etc. But only every once in a while.

So, what I would really like is some type of blocking call I can make
that will tell me when the index is ready. Or at a minimum, some type
of argument to my IndexRequest that will make it block until the
document is ready to be searched. Of course, that isn't something I
would do in production, as having the call return a future is exactly
what I want on a save/update operation. But it would help a lot in
making some reliable integration tests.

Thanks in advance,

Lucas


(Stephane Maldini) #2

Are you using the ElasticSearch plugin?


(Lucas Ward) #3

I am, but I'm not using it for indexing, and I may remove it
altogether soon. I need more control than I can get from the plugin
over how my indices are created. The application is multi-tenant and
I am creating multiple indices depending upon some factors that can
only be determined at run-time. I've forked the plugin, and even made
some pull requests that have been accepted, and the plugin could
support pluggable strategies for index names, rather than the simple
package name one that exists now. However, it couldn't support it
without a fairly major rewrite of how mappings are applied. Given my
time constraints, it didn't make sense, especially without knowing if
that kind of change would be accepted. I'm also a bit concerned with
some other areas of the plugin, such as the synchronous blocks in the
queue used to write out changes. All of this could be addressed, and
the plugin makes it clear that it's not ready for production yet, but
my timetables are just too aggressive and I think the straight Elastic
Search API is actually quite good. Even if I were using it, though, I
would still likely be looking for a way to bypass the integration
testing phase of Grails. It takes too long, so it's not very useful
for developers writing tests. I can run just my test of search in 6
or 7 seconds. Grails takes that long just to resolve Ivy dependencies.
(Something they're addressing in 1.4, I believe.)

Lucas


(James Cook) #4

You can issue a refresh API call. When it returns, the index operation
(and any other requests made prior to the refresh call) is complete.
http://www.elasticsearch.org/guide/reference/api/admin-indices-refresh.html

It is also possible to pass "refresh=true" as a query parameter
when indexing the document via a REST call. I'm not sure if this is
possible using a Java API call.
http://www.elasticsearch.org/guide/reference/api/index_.html


(Shay Banon) #5

First, indexing a document is complete once the index API call returns. When it becomes visible to search is another question.

By default, an asynchronous refresh runs periodically to make changes visible to search. The refresh interval defaults to 1 second. You can force a refresh by calling the refresh API. You can also force a refresh by setting the refresh flag to true on the index request (but don't use that in production!).
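
To make the distinction between "indexed" and "visible to search" concrete, here is a toy model in plain Java. It is not ElasticSearch code and does not use the ElasticSearch API: the class NearRealTimeIndex and its index, refresh, and search methods are hypothetical names invented for this sketch, and the periodic background refresh is left out on purpose so the behavior is deterministic.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model of near-real-time search visibility.
 * Hypothetical illustration only; this is NOT how ElasticSearch is implemented.
 */
final class NearRealTimeIndex {
    // Documents that have been accepted but are not yet searchable.
    private final List<String> buffer = new ArrayList<>();
    // Documents that a search can see.
    private final List<String> searchable = new ArrayList<>();

    /**
     * Accepts a document. When this returns, the document is stored,
     * but it is only searchable if refresh was requested.
     */
    public synchronized void index(String doc, boolean refresh) {
        buffer.add(doc);
        if (refresh) {
            refresh();
        }
    }

    /** Makes every buffered document visible to search. */
    public synchronized void refresh() {
        searchable.addAll(buffer);
        buffer.clear();
    }

    /** Returns the searchable documents containing the given term. */
    public synchronized List<String> search(String term) {
        List<String> hits = new ArrayList<>();
        for (String doc : searchable) {
            if (doc.contains(term)) {
                hits.add(doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        NearRealTimeIndex idx = new NearRealTimeIndex();

        idx.index("quick brown fox", false);
        System.out.println(idx.search("fox").size()); // prints 0: indexed, not yet refreshed

        idx.refresh();
        System.out.println(idx.search("fox").size()); // prints 1: visible after explicit refresh

        idx.index("lazy dog", true);
        System.out.println(idx.search("dog").size()); // prints 1: refresh flag set on the index call
    }
}
```

In a test, the explicit refresh() after loading data plays the role that a fixed sleep was being used for, but without depending on timing.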

-shay.banon


(Lucas Ward) #6

So, after a little back and forth, I was able to figure out the issue:

  1. Issuing a RefreshRequest on an index does what I needed, and
    works beautifully. I can create a local node, recreate all my
    indices, load all my test data, and perform a number of searches all
    in under 10 seconds. It makes for fast testing.
  2. While I was having issues with indexing being complete but the
    document not yet being available for search, I was also getting some
    odd behavior that I thought was related but wasn't. I was creating
    and populating my index without doing a 'put mapping'. It worked
    fine, but only on every other test, and I don't know why. The search
    would return a correct number of totalHits, but with no actual hits,
    and a ShardFailure. Once I added the explicit put mapping, the issue
    went away.

Lucas

