Imagine you have an object with a property X.
The initial value of the property is 13.
The object is searchable by a predicate "X > 10".
Someone modifies the property value to 42.
Do we have any guarantee that the object will still be searchable by the same predicate after the update is sent to Elasticsearch? The predicate matches both the old and the new value.
As far as I know, updates are internally transformed into delete/insert pairs.
Is it possible that a query with the predicate "X > 10" returns nothing because the old state of the object has already been deleted and the new one isn't indexed yet?
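To make the scenario concrete, here is a minimal sketch of the two requests involved. It only builds the paths and JSON bodies and sends nothing. The index name `objects` and the document id `1` are made up for illustration, and the `/<index>/_update/<id>` path assumes a recent Elasticsearch version.

```python
import json

INDEX = "objects"  # hypothetical index name
DOC_ID = "1"       # hypothetical document id


def range_query(field, lower_bound):
    """Search body for the predicate `field > lower_bound`."""
    return {"query": {"range": {field: {"gt": lower_bound}}}}


def partial_update(field, value):
    """Body of a partial update that changes a single field."""
    return {"doc": {field: value}}


search_path = f"/{INDEX}/_search"
update_path = f"/{INDEX}/_update/{DOC_ID}"
search_body = range_query("X", 10)     # the reader's predicate "X > 10"
update_body = partial_update("X", 42)  # the writer's change from 13 to 42

print("POST", search_path, json.dumps(search_body))
print("POST", update_path, json.dumps(update_body))
```

The question is what the first request can return while the second one is in flight.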
Thank you Mark.
But how could a refresh help?
I was talking about a query made in parallel by some other client.
That client can get nothing just because the object is being modified, even if both values of X (the old and the new one) are greater than 10, correct? Can this be solved somehow without a global lock?
Wait for refresh is not a global lock. It is the client waiting for the refresh on the shard where it knows the data will exist, to make sure it gets a response.
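As a rough illustration of what "wait for refresh" looks like on the wire, the writer adds the `refresh=wait_for` query parameter to its write request. The sketch below only builds the URL. The helper name `update_url`, the host, the index and the id are assumptions made for the example.

```python
from urllib.parse import urlencode


def update_url(base_url, index, doc_id, wait_for_refresh=True):
    """Build the URL of an update request.

    With wait_for_refresh=True the request carries refresh=wait_for, so the
    call returns only after a refresh has made the change visible to search.
    """
    path = f"{base_url}/{index}/_update/{doc_id}"
    if wait_for_refresh:
        return path + "?" + urlencode({"refresh": "wait_for"})
    return path


# Hypothetical host, index and document id.
url = update_url("http://localhost:9200", "objects", "1")
print(url)
```

Only the writer's own request waits here. Other clients searching in parallel are not blocked by it.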
OK, the writer will wait until the object is searchable.
But the reader knows nothing about that and will read in parallel.
So I need to use a global lock to synchronize the readers and the writer (which also does the refresh), right?
Or maybe there is a better solution?
The writer will wait for its changes to be available for searches (by specifying the refresh option).
The reader should wait for the writer and shouldn't query Elasticsearch until the writer completes its work (including waiting until the changes are available for searches). Otherwise it is possible that the request returns nothing, even if the predicate is true for both values (the old and the new one). Please correct me if I'm wrong.
The reader and the writer can be on different machines. So this synchronization requires some kind of distributed lock, right?
It would be great if there were a better alternative.
I mean that I need to use an explicit global distributed lock in the client code to synchronize the reader and the writer, to avoid objects disappearing on the reader side. Without synchronization, if the reader periodically searches for the object by that predicate, some of the requests may return nothing, and on the next request the object appears again.
If some other property Y is changed, will the object disappear from search results for some time?
Let's assume you have established a parent/child relationship. A new child is added. Can it hide the parent and/or the other children from readers for some time?
If you'd like SLA based response times we will happily put you in touch with someone from our sales team. Otherwise please have patience and we will answer you when we can.
As for me, it's Friday night, so hopefully someone else will pop in in the meantime.
That's great, thank you Mark!
Do you know if there are plans to change how Elasticsearch behaves when the same object is changed, so that it returns the old state until indexing of the new one completes? I believe this is the only step left to make it usable as a real NoSQL store (not only as a great search tool).
What do you mean? The object won't be hidden from searches in the middle of updating its properties (even if it is searched by the property being modified)?
We have about a million documents in the storage. About 10% of them are updated during the day at arbitrary times.
There are also several systems that search for objects by some predicates (they use scroll to fetch all the objects for which the predicate is true, then produce reports, etc.). They can do this in parallel with updates; the updates and searches are not synchronized.
The question is whether some of the objects can disappear from a report just because they are currently being updated (even if the predicate is true for both the old and the new state of the object).
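For reference, the reporting flow described above boils down to two kinds of requests: an initial search that opens a scroll, and follow-up calls that fetch the next pages. The sketch below only builds the paths and bodies. The helper names, the index `objects`, the page size and the `1m` keep-alive are illustrative assumptions. As far as I understand the scroll documentation, a scroll returns results that reflect the state of the index at the time of the initial search request.

```python
from urllib.parse import urlencode


def initial_scroll_request(index, query, keep_alive="1m", page_size=1000):
    """Path and body of the search that opens a scroll context."""
    path = f"/{index}/_search?" + urlencode({"scroll": keep_alive})
    body = {"size": page_size, "query": query}
    return path, body


def next_scroll_request(scroll_id, keep_alive="1m"):
    """Path and body of a follow-up call that fetches the next page."""
    return "/_search/scroll", {"scroll": keep_alive, "scroll_id": scroll_id}


# Hypothetical index; the predicate is the "X > 10" example from this thread.
predicate = {"range": {"X": {"gt": 10}}}
first_path, first_body = initial_scroll_request("objects", predicate)

# "abc123" stands in for the _scroll_id returned by the previous response.
next_path, next_body = next_scroll_request("abc123")

print("POST", first_path, first_body)
print("POST", next_path, next_body)
```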