Mismatch Between Query Result and Index Immediately After Re-Indexing

Hello community,

I have an index that we re-index in a post-indexing step. After that, the pipeline (written in Java) uses an aggregation query to fetch product attributes (for example, categories and their counts) and caches these values. I noticed afterwards that the values in the index and in the cache are not the same.

I believe the problem is that we try to fetch the categories immediately after re-indexing, when the index may not be ready to be read yet (meaning the re-indexing may not have completely finished). Some of the answers I found suggest setting a refresh policy in Java:

IndexRequest indexRequest = new IndexRequest(indexAlias);
indexRequest.setRefreshPolicy(WriteRequest.RefreshPolicy.WAIT_UNTIL);

but my operation is not actually an IndexRequest; it is a SearchRequest, since we are running an aggregation query against the index.

How can I know that my index is fully ready, so that I read its most up-to-date data?

In that case, you will need to force a refresh I'm afraid.

Does a forced refresh have side effects? Will I be able to read the most current data this way?

Yes.

Yes. If you do that very frequently, it will increase the number of segments generated and, in turn, the disk I/O.

More details on that in "Near real-time search" and "Refresh API" in the Elasticsearch Guide [8.14]. Highlighting some content:

Refreshes are resource-intensive. To ensure good cluster performance, we recommend waiting for Elasticsearch’s periodic refresh rather than performing an explicit refresh when possible.

If your application workflow indexes documents and then runs a search to retrieve the indexed document, we recommend using the index API's refresh=wait_for query parameter option. This option ensures the indexing operation waits for a periodic refresh before running the search.

My re-indexing pipeline is a scheduled job that runs once a day. So, for each index, I am going to refresh only once per day, and only so that an aggregation query can cache the categories of the documents. Would refreshing this way cause any problems for my users' searches afterwards?

By the way, I couldn't find the refresh=wait_for option in Python. Do you know whether I can use it with es.indices.refresh(index=index)?

Thanks for your patience and answers!

No that looks good to me.

No sorry. I don't know. But some other colleagues here might be able to answer. :wink:

If you need a refresh after indexing or updating a document, it would look like this:

    es.index(index=index_name, body=doc, refresh="wait_for")

If you need to force the refresh it would be like this:

    es.indices.refresh(index=index_name)
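
For the once-a-day job described above, the two steps (force a refresh, then run the aggregation) could be combined roughly as in the sketch below. It is only an illustration under some assumptions: `es` is taken to be an `elasticsearch.Elasticsearch` client from the 8.x Python package, the function name `refresh_then_fetch_categories`, the keyword field `category`, and the bucket limit of 1000 are all hypothetical, and the re-indexing itself is assumed to have finished before the function is called.

```python
from typing import Any, Dict


def refresh_then_fetch_categories(
    es: Any, index_name: str, field: str = "category"
) -> Dict[str, int]:
    """Force a refresh, then return {category: doc_count} from a terms aggregation.

    `es` is assumed to be an elasticsearch.Elasticsearch client (8.x);
    `field` is a hypothetical keyword field holding the category name.
    """
    # Make everything written so far visible to search.
    es.indices.refresh(index=index_name)

    # size=0 skips the hits; only the aggregation buckets are needed.
    resp = es.search(
        index=index_name,
        size=0,
        aggs={"categories": {"terms": {"field": field, "size": 1000}}},
    )

    buckets = resp["aggregations"]["categories"]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}
```

Calling `refresh_then_fetch_categories(es, index_name)` once at the end of the daily job, and caching the returned dictionary, keeps the explicit refresh to a single call per index per run.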