How to avoid the "phantom read" problem with a nested field

Hi,

I have an RDBMS table design with a one-to-many relationship.
For various reasons, we decided to move that relationship into an Elasticsearch index.
I created an index for the parent table, with a nested field for the child table.
Everything worked fine until we ran into the tricky "phantom read" problem described below.

Our requirement is that one field in the nested object must be unique across all documents,
so I need to check the value of that field before adding a new nested object.

Because the check and the insert are separate Elasticsearch API calls, a concurrent insert request from another client can slip in between them and cause a "phantom read".
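To make the interleaving concrete, here is a toy, single-process simulation of two clients each doing "check, then insert". It does not call Elasticsearch: the dict standing in for the index, the helper functions, the document IDs and the `code` field are all hypothetical.

```python
# Hypothetical in-memory stand-in for the parent index with nested children.
index = {
    "parent-1": {"children": [{"code": "A1"}]},
    "parent-2": {"children": []},
}

def code_exists(code):
    """The 'check' step: scan the nested objects of every document."""
    return any(
        child["code"] == code
        for doc in index.values()
        for child in doc["children"]
    )

def add_child(doc_id, code):
    """The 'insert' step: append a nested object to a single document."""
    index[doc_id]["children"].append({"code": code})

# Both clients check before either one inserts.
a_found = code_exists("B7")  # client A: not found
b_found = code_exists("B7")  # client B: not found
if not a_found:
    add_child("parent-1", "B7")  # client A inserts into parent-1
if not b_found:
    add_child("parent-2", "B7")  # client B inserts into parent-2

count = sum(c["code"] == "B7" for d in index.values() for c in d["children"])
print(count)  # 2: the "unique" value now appears twice
```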

Following optimistic concurrency control, we use if_seq_no and if_primary_term to try to avoid the problem.
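A minimal sketch of what a conditional write does, as a toy in-memory model rather than Elasticsearch client code: the `ToyShard` and `VersionConflict` names are hypothetical, and a single counter stands in for the shard's sequence numbers. A write carrying `if_seq_no`/`if_primary_term` succeeds only if the document has not changed since those values were read; otherwise it is rejected, which Elasticsearch reports as an HTTP 409 version conflict.

```python
class VersionConflict(Exception):
    """Stands in for Elasticsearch's 409 version conflict response."""

class ToyShard:
    """Hypothetical model: every write gets the next seq_no; a conditional
    write succeeds only if the document's current values still match."""

    def __init__(self):
        self.primary_term = 1
        self.next_seq_no = 0
        self.docs = {}  # doc_id -> (source, seq_no, primary_term)

    def get(self, doc_id):
        return self.docs[doc_id]

    def index(self, doc_id, source, if_seq_no=None, if_primary_term=None):
        if if_seq_no is not None or if_primary_term is not None:
            current = self.docs.get(doc_id)
            if current is None or (current[1], current[2]) != (if_seq_no, if_primary_term):
                raise VersionConflict(doc_id)
        seq_no = self.next_seq_no
        self.next_seq_no += 1
        self.docs[doc_id] = (source, seq_no, self.primary_term)
        return seq_no

shard = ToyShard()
shard.index("parent-1", {"children": []})

# Two clients read the same version of parent-1.
_, seq_a, term_a = shard.get("parent-1")
_, seq_b, term_b = shard.get("parent-1")

# Client A writes first, conditioned on what it read: accepted.
shard.index("parent-1", {"children": [{"code": "B7"}]},
            if_seq_no=seq_a, if_primary_term=term_a)

# Client B's condition is now stale, so its write is rejected.
try:
    shard.index("parent-1", {"children": [{"code": "B7"}]},
                if_seq_no=seq_b, if_primary_term=term_b)
    b_rejected = False
except VersionConflict:
    b_rejected = True
print(b_rejected)  # True
```

The protection only applies because both clients wrote to the same document; the check is tied to that one document's seq_no and primary_term.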

  1. But the field in the nested object must be unique across all documents, while every document has its own seq_no and primary_term. How can I make sure a "phantom read" won't happen when I only add a nested object to one document?

  2. I also noticed the document's "version" metadata field (document versioning). Can I use the version for concurrency control as well?

  3. Or is it that seq_no/primary_term concurrency control works across a distributed cluster of Elasticsearch nodes, while the version field only works on a single node?

Hey,

Always go with seq_no and primary_term, regardless of the number of nodes. One question remains for me: how do you handle refreshes? It may take up to a second until a written document is available for search (see the refresh interval setting for more information), so a search may not return a very recently added document. Are you accounting for this?
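As a rough illustration of the refresh point, here is a toy model of near-real-time search. It is not Elasticsearch code: `ToyNrtIndex` and its methods are hypothetical. In Elasticsearch the periodic refresh is controlled by the `index.refresh_interval` setting, and `refresh=wait_for` makes a write request return only once a refresh has made it visible. Note that this makes your own write searchable; it does not stop another client's write from landing between your check and your insert.

```python
class ToyNrtIndex:
    """Hypothetical model of near-real-time search: new documents sit in a
    buffer and only become searchable after a refresh."""

    def __init__(self):
        self.buffer = []      # written, not yet visible to search
        self.searchable = []  # visible to search

    def index(self, doc, refresh=None):
        self.buffer.append(doc)
        if refresh == "wait_for":
            # Simplification: real refresh=wait_for blocks until the next
            # scheduled refresh; this toy model just refreshes right away.
            self.refresh()

    def refresh(self):
        self.searchable.extend(self.buffer)
        self.buffer.clear()

    def search(self, code):
        return [d for d in self.searchable if d["code"] == code]

idx = ToyNrtIndex()
idx.index({"code": "B7"})
before = len(idx.search("B7"))   # 0: written but not refreshed yet
idx.refresh()                    # what the periodic refresh would do
after = len(idx.search("B7"))    # 1

idx.index({"code": "C3"}, refresh="wait_for")
visible = len(idx.search("C3"))  # 1: the write returned only after a refresh
print(before, after, visible)    # 0 1 1
```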

--Alex

Hi spinscale,
I use refresh=wait_for.

Let me explain more clearly.
I want to make sure that a certain field of the nested objects is unique across all documents, so before inserting a new object I need to check the value of this field in all existing nested objects.
But another request may insert a nested object between my check request and my insert request, so the newly added nested object is not included in what I checked. That is what I mean by a phantom read.

With an RDBMS, I can start a transaction and set the isolation level to serializable to avoid phantom reads, but I don't know what the appropriate solution is for Elasticsearch.

Thank you!

Optimistic concurrency control only works on a per-document basis, not across all documents, as you already mentioned. What you would need is transaction isolation across all the documents being changed, and Elasticsearch does not support that.
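The per-document limitation can be shown with a toy in-memory model (not Elasticsearch client code; `ToyIndex`, `VersionConflict`, the document IDs and the `code` field are hypothetical). Each client reads and conditionally updates only the document it is changing, so neither write invalidates the other's seq_no/primary_term, and both succeed even though they insert the same value.

```python
class VersionConflict(Exception):
    """Stands in for Elasticsearch's 409 version conflict response."""

class ToyIndex:
    """Hypothetical model: each document carries its own (seq_no,
    primary_term), and a conditional update checks only that document."""

    def __init__(self, docs):
        self.primary_term = 1
        self.next_seq_no = 0
        self.docs = {}
        for doc_id, source in docs.items():
            self._write(doc_id, source)

    def _write(self, doc_id, source):
        self.docs[doc_id] = (source, self.next_seq_no, self.primary_term)
        self.next_seq_no += 1

    def get(self, doc_id):
        return self.docs[doc_id]

    def update(self, doc_id, source, if_seq_no, if_primary_term):
        _, seq_no, term = self.docs[doc_id]
        if (seq_no, term) != (if_seq_no, if_primary_term):
            raise VersionConflict(doc_id)
        self._write(doc_id, source)

    def has_code(self, code):
        return any(c["code"] == code
                   for src, _, _ in self.docs.values() for c in src["children"])

idx = ToyIndex({"parent-1": {"children": []}, "parent-2": {"children": []}})
print(idx.has_code("B7"))  # False: both clients' uniqueness checks pass

# Each client reads only the document it is about to modify...
src1, seq1, term1 = idx.get("parent-1")
src2, seq2, term2 = idx.get("parent-2")

# ...so both conditional writes succeed; neither touched the other's document.
idx.update("parent-1", {"children": src1["children"] + [{"code": "B7"}]}, seq1, term1)
idx.update("parent-2", {"children": src2["children"] + [{"code": "B7"}]}, seq2, term2)

duplicates = sum(c["code"] == "B7"
                 for src, _, _ in idx.docs.values() for c in src["children"])
print(duplicates)  # 2
```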

