Atomic Query Dump

Hi All,

A little background first: I have a system with a set of rolling
indexes.

For example:
Index 22
Index 23, etc.

There is an alias set up that points to the most recent, complete index.

current -> 22.

After the importer finishes importing the new version of the index, it
updates the alias and deletes the older versions.
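
For illustration, that cutover step might look something like the
following sketch, written in Python against the REST _aliases endpoint
(the host, index names, and use of the requests library are my
assumptions, not the poster's actual code; the point is that the
actions list is applied as a single atomic operation):

    import requests

    ES = "http://localhost:9200"  # assumed host

    def publish(new_index, old_index):
        # One _aliases call applies the whole actions list atomically,
        # so "current" never points at zero indexes or at two at once.
        resp = requests.post(ES + "/_aliases", json={
            "actions": [
                {"remove": {"index": old_index, "alias": "current"}},
                {"add": {"index": new_index, "alias": "current"}},
            ]
        })
        resp.raise_for_status()
        # Drop the old index only after the alias has moved.
        requests.delete(ES + "/" + old_index).raise_for_status()

    # e.g. publish("index_23", "index_22")

Because both alias actions land in one call, readers querying the
alias see either the old index or the new one, never neither.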

When doing queries against Elasticsearch, I work against the current
alias, and if it 404s, I simply repeat the operation to cover the alias
delete/create race.
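
That retry is simple enough to sketch (again Python with requests; the
query body, endpoint, and retry cap here are placeholders):

    import requests

    ES = "http://localhost:9200"  # assumed host

    def search_current(body, retries=5):
        for _ in range(retries):
            resp = requests.get(ES + "/current/_search", json=body)
            if resp.status_code == 404:
                # We hit the alias delete/create window; just try again.
                continue
            resp.raise_for_status()
            return resp.json()
        raise RuntimeError("alias 'current' still 404s after %d tries" % retries)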

This is all working ok for us so far.

Now, I want to do one additional thing. I want to take the paged search
queries issued by users and use them to produce a complete, atomic dump
of all documents matching the query: basically the same thing as running
the query with size=Infinity. I want to do this so I don't miss or
duplicate any entries in the result set, and I'm OK if it takes a long
time. This does not seem to work, though. If I do the search with
size=250000000 and search_type=query_and_fetch, it just sits there doing
nothing as far as I can tell. Is there a similar mode I could use that
would work?

The only other option I can see is the scroll API. I'm not sure it is
safe to use with the index rolling and alias swapping I am doing. Is it
safe? If you have an open scroll_id, what happens when you delete the
underlying index? Does the scroll_id become invalid? Does it point to
the new data instead of the old, resulting in badness?

Thanks,
Kevin

--

On Friday, November 9, 2012 5:22:11 PM UTC-8, Igor Motov wrote:

If you delete an index, the next scroll request will fail with a
SearchContextMissingException for each shard. This is as "dangerous" as
calling scroll with an expired scroll_id. Basically, a scroll query
keeps a special data structure (a SearchContext) in memory, representing
the current state of the search; these SearchContexts are identified by
scroll ids. When an index is deleted, all of its SearchContexts are
removed as well, and a scroll request against the deleted index fails
because it cannot find its SearchContexts. In other words, it's
perfectly safe. You just need to add retry logic for
SearchContextMissingException that starts the scroll operation over from
the beginning.
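
A rough sketch of that scroll-plus-retry loop against the REST API of
the time (Python with requests; the host, page size, scan search_type,
and the blanket status-code check are assumptions, and a real version
would want to cap the number of restarts):

    import requests

    ES = "http://localhost:9200"  # assumed host

    def dump_all(body, page_size=500):
        # Outer loop: if the scroll dies because the index was deleted
        # under us, throw everything away and start over.
        while True:
            resp = requests.get(ES + "/current/_search",
                                params={"search_type": "scan",
                                        "scroll": "5m", "size": page_size},
                                json=body)
            if resp.status_code >= 400:
                continue  # e.g. 404 from an alias mid-swap; restart
            scroll_id = resp.json()["_scroll_id"]
            docs = []
            while True:
                page = requests.get(ES + "/_search/scroll",
                                    params={"scroll": "5m"}, data=scroll_id)
                if page.status_code >= 400:
                    # SearchContextMissingException surfaces here once the
                    # old index is gone; redo the whole dump from scratch.
                    break
                payload = page.json()
                hits = payload["hits"]["hits"]
                if not hits:
                    return docs  # an empty page means the scroll is done
                docs.extend(hits)
                scroll_id = payload["_scroll_id"]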


--

Ok. I think I can make it work then. Thank you for your help.

Kevin


--