Can I trust a PIT id once it has been replaced by a new PIT id?

Snav · April 23, 2021, 8:47am

Hello there,

I was wondering how Point In Time (PIT) works internally and whether it can be trusted for parallelization.

Context

Given this implementation:

I have a search request to retrieve a list of offers. It may return hundreds of thousands of documents.
One 'Dispatcher' will paginate X elements at a time and loop over the results using the PIT and search_after parameters.
The Dispatcher will create chunks of ids from the response and will produce RabbitMQ messages with the current PIT id and the document ids.
RabbitMQ consumers will consume each message and call ES with their PIT id and the document ids to retrieve the data they need from the view.
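
A minimal sketch of the Dispatcher loop described above, not the actual implementation. It assumes `search` is a caller-supplied function that sends a request body to the `_search` endpoint and returns the parsed JSON response. `dispatch_chunks`, `page_size` and the shape of the chunk messages are hypothetical names chosen for illustration, and publishing to RabbitMQ is left out.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

# Assumption: a function that POSTs the body to _search and returns parsed JSON.
SearchFn = Callable[[Dict[str, Any]], Dict[str, Any]]


def dispatch_chunks(
    search: SearchFn,
    pit_id: str,
    page_size: int = 1000,
    keep_alive: str = "1m",
) -> Iterator[Dict[str, Any]]:
    """Page through a PIT with search_after and yield one message per page."""
    search_after: Optional[List[Any]] = None
    while True:
        body: Dict[str, Any] = {
            "size": page_size,
            "query": {"match_all": {}},  # placeholder for the real offers query
            "pit": {"id": pit_id, "keep_alive": keep_alive},
            "sort": [{"_shard_doc": "asc"}],
        }
        if search_after is not None:
            body["search_after"] = search_after

        response = search(body)
        hits = response["hits"]["hits"]
        if not hits:
            break

        # Always carry the most recently received PIT id forward.
        pit_id = response.get("pit_id", pit_id)
        # The message keeps the PIT id that was current when the page was read.
        yield {"pit_id": pit_id, "ids": [hit["_id"] for hit in hits]}
        # Resume after the sort values of the last hit on this page.
        search_after = hits[-1]["sort"]
```

Each yielded dictionary corresponds to one queue message; a consumer may pick it up after the Dispatcher has already moved on to a newer PIT id.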

When the Dispatcher loops, it can get a new PIT id from ES while the consumers may still be consuming messages with an "outdated" PIT id.

As stated by the documentation: "The open point in time request and each subsequent search request can return different id; thus always use the most recently received id for the next search request."

The question is, what happens to the previous view based on the PIT id that has been replaced?

Can my consumers still search using their PIT id, for a short time, to get the results from the view or is it unsafe / not consistent / unpredictable?

I guess not, and I will have a look at sliced scrolls, but if someone has more insights, they would be welcome.

Thank you

mayya · April 23, 2021, 7:30pm

If, during a search request, some shard used in the original PIT is not available (e.g. it went offline), Elasticsearch will attempt to use a different copy of this shard, provided one is available with the same commit history. Currently, even if we retry with different shards, we still return the same PIT id, but we may change this in the future.

Answering your question – yes you can trust a new PIT id or shard replacement, because we make sure to use a shard for replacement that has the same commit history.

Snav · April 24, 2021, 1:20pm

Hey Mayya,

Thanks for the quick response and the details but I am not sure it answers my problem, or I may have missed something.

If I may add an example:

My first search returns the PIT Id AAAAA
I send an immutable message to my queue with this PIT Id for later use (job parallelization)
My second search with the PIT Id AAAAA returns a new PIT Id ZZZZZ, for the reason you mentioned
20s later, the consumer of my queue runs a search on ES with the PIT Id AAAAA

So my question is can I still rely on the result of the query with PIT Id AAAAA despite the PIT Id having been superseded by ZZZZZ ~20 seconds ago?

mayya · April 26, 2021, 12:28pm

Thanks for more details.

The answer to your question is yes. That is the guarantee of a PIT: it represents a certain point in time in the index, regardless of which shard copies are being used internally. If we can't find this point in time any more, a "no search context found" error will be returned.

But as I said in the previous answer, currently we always return the same PIT id even if we end up using different shard copies.
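
On the consumer side, the error case above could be handled roughly like this. This is a sketch under assumptions: `search` is a caller-supplied function that sends the body to `_search` and returns parsed JSON, and `PitExpired` is a hypothetical exception that this function is assumed to raise when Elasticsearch reports that the search context is gone. `fetch_chunk` and the message shape are likewise illustrative names.

```python
from typing import Any, Callable, Dict, List, Optional


class PitExpired(Exception):
    """Hypothetical: raised by `search` when ES reports no search context found."""


def fetch_chunk(
    search: Callable[[Dict[str, Any]], Dict[str, Any]],
    message: Dict[str, Any],
) -> Optional[List[Dict[str, Any]]]:
    """Fetch the documents of one queue message from the PIT stored in it."""
    body = {
        "size": len(message["ids"]),
        "query": {"ids": {"values": message["ids"]}},
        # Use the PIT id carried by the message, even if a newer one exists.
        "pit": {"id": message["pit_id"]},
    }
    try:
        response = search(body)
    except PitExpired:
        # The point in time is gone; signal the caller to re-dispatch or skip.
        return None
    return [hit["_source"] for hit in response["hits"]["hits"]]
```

Returning `None` is only one possible design choice; a consumer could just as well requeue the message or fail loudly so that the chunk is dispatched again against a fresh PIT.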

Snav · April 27, 2021, 3:39pm

Okay, if I understand correctly, the PIT id points to the context of a specific PIT. If that context no longer exists, an error is returned. Until then, I can still trust both snapshots to hold the same data.

Thanks.

system · May 25, 2021, 3:39pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
