Atomic Query Dump

Hi All,

A little background first: I have a system with a set of rolling
indexes.

For example:
Index 22
Index 23, etc.

There is an alias set up that points to the most recent, complete index.

current -> 22.

After the importer finishes importing the new version of the index, it
updates the alias and deletes the older versions.
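
For illustration, that cutover step might look something like the
following sketch, written in Python against the REST _aliases endpoint
(the host, index names, and use of the requests library are my
assumptions, not the poster's actual code; the point is that the
actions list is applied as a single atomic operation):

    import requests

    ES = "http://localhost:9200"  # assumed host

    def publish(new_index, old_index):
        # One _aliases call applies the whole actions list atomically,
        # so "current" never points at zero indexes or at two at once.
        resp = requests.post(ES + "/_aliases", json={
            "actions": [
                {"remove": {"index": old_index, "alias": "current"}},
                {"add": {"index": new_index, "alias": "current"}},
            ]
        })
        resp.raise_for_status()
        # Drop the old index only after the alias has moved.
        requests.delete(ES + "/" + old_index).raise_for_status()

    # e.g. publish("index_23", "index_22")

Because both alias actions land in one call, readers querying the
alias see either the old index or the new one, never neither.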

When doing queries against Elasticsearch, I work against the current
alias, and if it 404s, I simply repeat the operation to cover the alias
delete/create race.
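
That retry is simple enough to sketch (again Python with requests; the
query body, endpoint, and retry cap here are placeholders):

    import requests

    ES = "http://localhost:9200"  # assumed host

    def search_current(body, retries=5):
        for _ in range(retries):
            resp = requests.get(ES + "/current/_search", json=body)
            if resp.status_code == 404:
                # We hit the alias delete/create window; just try again.
                continue
            resp.raise_for_status()
            return resp.json()
        raise RuntimeError("alias 'current' still 404s after %d tries" % retries)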

This is all working ok for us so far.

Now, I want to do one additional thing. I want to take the paged search
queries issued by users and use them to produce a complete, atomic dump
of all documents matching the query: basically the same thing as running
the query with size=Infinity. I want to do this so I don't miss or
duplicate any entries in the result set, and I'm OK if it takes a long
time. This does not seem to work, though. If I do the search with
size=250000000 and search_type=query_and_fetch, it just sits there doing
nothing as far as I can tell. Is there a similar mode I could use that
would work?

The only other option I can see is the scroll API. I'm not sure it is
safe to use with the index rolling and alias swapping I am doing. Is it
safe? If you have an open scroll_id, what happens when you delete the
underlying index? Does the scroll_id become invalid? Does it point to
the new data instead of the old, resulting in badness?

Thanks,
Kevin

--

On Friday, November 9, 2012 5:22:11 PM UTC-8, Igor Motov wrote:

If you delete an index, the next scroll request will fail with a
SearchContextMissingException for each shard. This is as "dangerous" as
calling scroll with an expired scroll_id. Basically, a scroll query
keeps a special data structure (a SearchContext) in memory, representing
the current state of the search; these SearchContexts are identified by
scroll ids. When an index is deleted, all of its SearchContexts are
removed as well, and a scroll request against the deleted index fails
because it cannot find its SearchContexts. In other words, it's
perfectly safe. You just need to add retry logic for
SearchContextMissingException that starts the scroll operation over from
the beginning.
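
A rough sketch of that scroll-plus-retry loop against the REST API of
the time (Python with requests; the host, page size, scan search_type,
and the blanket status-code check are assumptions, and a real version
would want to cap the number of restarts):

    import requests

    ES = "http://localhost:9200"  # assumed host

    def dump_all(body, page_size=500):
        # Outer loop: if the scroll dies because the index was deleted
        # under us, throw everything away and start over.
        while True:
            resp = requests.get(ES + "/current/_search",
                                params={"search_type": "scan",
                                        "scroll": "5m", "size": page_size},
                                json=body)
            if resp.status_code >= 400:
                continue  # e.g. 404 from an alias mid-swap; restart
            scroll_id = resp.json()["_scroll_id"]
            docs = []
            while True:
                page = requests.get(ES + "/_search/scroll",
                                    params={"scroll": "5m"}, data=scroll_id)
                if page.status_code >= 400:
                    # SearchContextMissingException surfaces here once the
                    # old index is gone; redo the whole dump from scratch.
                    break
                payload = page.json()
                hits = payload["hits"]["hits"]
                if not hits:
                    return docs  # an empty page means the scroll is done
                docs.extend(hits)
                scroll_id = payload["_scroll_id"]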


--

Ok. I think I can make it work then. Thank you for your help.

Kevin


--