Are objects searchable after updates?

Imagine you have an object with a property X.
The initial value of the property is 13.
The object is searchable by a predicate "X > 10".
Someone modifies the property value to 42.
Is the object guaranteed to stay searchable by the same predicate after the update is sent to Elasticsearch? The predicate matches both the old and the new value.

As far as I know, updates are internally transformed into delete/insert pairs.
Is it possible that a query with the predicate "X > 10" returns nothing because the old state of the object has already been deleted and the new one isn't indexed yet?

Yes, it is possible.

You may want to look at the ?refresh page in the Elasticsearch Reference [5.5].
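For reference, here is a minimal sketch of how an update that asks for `refresh=wait_for` might be put together. It only builds the request and sends nothing. The host, the index name `objects`, the type name `object`, the document id `1` and the helper name `build_update_request` are all assumptions made up for illustration; the URL shape follows the 5.x update API (`/{index}/{type}/{id}/_update`).

```python
import json
from urllib.parse import quote, urlencode


def build_update_request(base_url, index, doc_type, doc_id, fields, refresh="wait_for"):
    """Return (method, url, body) for a partial-document update.

    Hypothetical helper: nothing is sent, the tuple can be handed to any HTTP client.
    """
    # The refresh parameter accepts "true", "false" or "wait_for".
    if refresh not in ("true", "false", "wait_for"):
        raise ValueError("refresh must be 'true', 'false' or 'wait_for'")
    # Escape each path segment so unusual ids do not break the URL.
    path = "/".join(quote(str(part), safe="") for part in (index, doc_type, doc_id))
    url = "{}/{}/_update?{}".format(base_url.rstrip("/"), path, urlencode({"refresh": refresh}))
    # A partial update sends only the changed fields under "doc".
    body = json.dumps({"doc": fields})
    return "POST", url, body


# Assumed names: index "objects", type "object", document id "1".
method, url, body = build_update_request(
    "http://localhost:9200", "objects", "object", "1", {"X": 42}
)
print(method, url)
print(body)
```

With `wait_for`, the call returns only once a refresh has made the change visible to searches. It affects the client that sends the update and nobody else.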

Thank you Mark.
But how could refresh help?
I was talking about a query made in parallel by some other client.
That client can get nothing just because the object is being modified, even if both values of X (the old one and the new one) are greater than 10, correct? Can this be solved somehow without a global lock?

A refresh makes sure the document is searchable.

Wait for refresh is not a global lock. It is the client waiting for that refresh on the shard it knows the data will exist on, to make sure it gets a response.

OK, the writer will wait until the object is searchable.
But the reader knows nothing about that and will read in parallel.
So I need to use a global lock to synchronize the readers and the writer (which also does the refresh), right?
Or maybe there is a better solution?

The writer will wait for its changes to be available for searches (by specifying the refresh option).
The reader should wait for the writer and shouldn't query Elasticsearch until the writer completes its work (including waiting until the changes are available for searches). Otherwise it is possible that the request returns nothing, even if the predicate is true for both the old and the new value. Please correct me if I'm wrong.

The reader and the writer can be on different machines. So this synchronization requires some kind of distributed lock, right?
It would be great if there were a better alternative.

It's not a global lock, there is no such thing in Elasticsearch.

Refreshes happen, by default, every second.

I mean that I need to use an explicit global distributed lock in the client code to synchronize the reader and the writer, to avoid objects disappearing on the reader side. Without synchronization, if the reader periodically queries for the object by that predicate, some of the requests may return nothing, and on the next request the object appears again.

It would be best if Elasticsearch returned the old state of the object until the refresh happens and the new state is indexed.

A few more questions:

If some other property Y is changed, will the object disappear from searches for some time?

Let's assume you established a parent/child relationship and a new child is added. Can that hide the parent and/or the other children from readers for some time?

Guys, could you please answer? This is important for our project.

If you'd like SLA-based response times, we will happily put you in touch with someone from our sales team. Otherwise please have patience and we will answer you when we can.

As for me, it's Friday night, so hopefully someone else will pop in in the meantime :)

No, it won't.

That's great, thank you Mark!
Do you know if there are plans to change how Elasticsearch behaves when an object is changed, so that it returns the old state until indexing of the new one completes? I believe this is the only step left to make it usable as a real NoSQL store (not only as a great search tool).

That's the only thing it can do.

What do you mean? The object won't be hidden from searches in the middle of an update to its properties (even if it is searched by the property being modified)?

The only time that can happen is if you make a request to a shard that is applying the change at the exact same time.

The chances of that, while not zero, aren't high.

We have about a million documents in the storage. About 10% of them are updated during the day at arbitrary times.
There are also several systems that search for objects by some predicates (they use scroll to fetch all the objects for which the predicate is true, then produce reports etc.). They can run in parallel with updates; the updates and searches are not synchronized.
The question is whether some of the objects can be missing from a report just because they are being updated at that moment (even if the predicate is true for both the old and the new state of the object).
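To make the setup concrete, here is a rough sketch of the requests such a report job might issue: an initial scroll search with a range query for "X > 10", then follow-up scroll calls. It only builds the requests and sends nothing. The index name `objects`, the page size, the keep-alive value and the helper names are assumptions for illustration only.

```python
import json


def build_scroll_start(index, field, lower_bound, page_size=1000, keep_alive="1m"):
    """Return (method, path, body) for the first page of a scroll search.

    Hypothetical helper: the range query expresses the predicate `field > lower_bound`.
    """
    path = "/{}/_search?scroll={}".format(index, keep_alive)
    body = {
        "size": page_size,
        "query": {"range": {field: {"gt": lower_bound}}},
    }
    return "POST", path, json.dumps(body)


def build_scroll_next(scroll_id, keep_alive="1m"):
    """Return (method, path, body) for fetching the next page of an open scroll."""
    body = {"scroll": keep_alive, "scroll_id": scroll_id}
    return "POST", "/_search/scroll", json.dumps(body)


# Assumed index name "objects"; the scroll id below is a placeholder, the real
# one comes from the "_scroll_id" field of the previous response.
first = build_scroll_start("objects", "X", 10)
following = build_scroll_next("<scroll id from previous response>")
print(first)
print(following)
```

The sketch shows only the shape of the reader's requests. It does nothing to coordinate with writers.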

It's a possibility.
