Mismatch Between Query Result and Index Immediately After Re-Indexing

safakkbilici · January 19, 2024, 8:06am

Hello community,

I have an index, and as a post-indexing step we re-index it. Afterwards, the pipeline (written in Java) runs an aggregation query to fetch product attributes (for example, categories and their counts) and caches these values. I noticed that the cached values do not match what is in the index.

I believe the problem is that we fetch the categories immediately after re-indexing, when the index may not yet be ready to be read (that is, the re-indexing may not have completely finished). Some of the answers I found involve the Refresh API in Java:

IndexRequest indexRequest = new IndexRequest(indexAlias);
indexRequest.setRefreshPolicy(WriteRequest.RefreshPolicy.WAIT_UNTIL);

but my operation is not actually an IndexRequest; it is a SearchRequest, since we are passing an aggregation query to the index.

How can I know that my index is fully ready, so that I read its most up-to-date data?

dadoonet · January 19, 2024, 10:29am

In that case, you will need to force a refresh I'm afraid.

github.com/dadoonet/elasticsearch-java-client-demo/blob/main/src/test/java/fr/pilato/test/elasticsearch/hlclient/EsClientIT.java#L277

    @Test
    void searchData() throws IOException {
        // Start from a clean index; ignore the error if it does not exist yet.
        try {
            client.indices().delete(dir -> dir.index("search-data"));
        } catch (ElasticsearchException ignored) { }
        client.index(ir -> ir.index("search-data").id("1").withJson(new StringReader("{\"foo\":\"bar\"}")));
        // Force a refresh so the document indexed above is visible to the searches below.
        client.indices().refresh(rr -> rr.index("search-data"));
        SearchResponse<Void> response = client.search(sr -> sr
                        .index("search-data")
                        .query(q -> q.match(mq -> mq.field("foo").query("bar"))),
                Void.class);
        logger.info("response.hits.total.value = {}", response.hits().total().value());
        response = client.search(sr -> sr
                        .index("search-data")
                        .query(q -> q.term(tq -> tq.field("foo").value("bar"))),
                Void.class);
        logger.info("response.hits.total.value = {}", response.hits().total().value());
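
For the aggregation case in the question, the same refresh-then-search pattern could look roughly like the following in Python. This is a minimal sketch, assuming the 8.x `elasticsearch` Python client (where `search` accepts `size` and `aggs` as keyword arguments). The function name `fetch_category_counts`, the `category` keyword field, and the bucket limit are hypothetical and not taken from the thread.

```python
def fetch_category_counts(es, index, field="category", max_buckets=1000):
    """Refresh `index`, then return {category: doc_count} from a terms aggregation.

    `es` is expected to behave like an elasticsearch.Elasticsearch client (8.x).
    `field` is a hypothetical keyword field holding the category name.
    """
    # Make every document written so far visible to search before aggregating.
    es.indices.refresh(index=index)

    # size=0 skips the hits themselves; only the aggregation buckets are needed.
    resp = es.search(
        index=index,
        size=0,
        aggs={"categories": {"terms": {"field": field, "size": max_buckets}}},
    )

    buckets = resp["aggregations"]["categories"]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}
```

The returned dictionary is what would be written to the cache. Because the refresh happens inside the same function, the aggregation cannot run against a state older than the last completed write.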

safakkbilici · January 19, 2024, 10:55am

Does force refresh have side effects? Will I be able to read the most current data this way?

dadoonet · January 19, 2024, 11:39am

Yes.

Yes. If you do that very frequently, it will have an impact on the number of segments generated and, in turn, on disk I/O.

More details on that are in the "Near real-time search" and "Refresh API" pages of the Elasticsearch Guide [8.14]. Highlighting some content:

Refreshes are resource-intensive. To ensure good cluster performance, we recommend waiting for Elasticsearch’s periodic refresh rather than performing an explicit refresh when possible.

If your application workflow indexes documents and then runs a search to retrieve the indexed document, we recommend using the index API's refresh=wait_for query parameter option. This option ensures the indexing operation waits for a periodic refresh before running the search.

safakkbilici · January 19, 2024, 1:03pm

My re-indexing pipeline is a scheduled job and executes once a day. So, for each index, I am going to refresh only once per day, and it is just for an aggregation query to cache categories of documents. Does refreshing this way cause some problems for my users' searches after that?

Btw, I couldn't find the refresh=wait_for option in Python. Do you know whether I can use it with .indices.refresh(index=index)?

Thanks for your patience and answers!

dadoonet · January 19, 2024, 3:05pm

No, that looks good to me.

No, sorry, I don't know. But some other colleagues here might be able to answer.

RabBit_BR · January 19, 2024, 7:49pm

If you need a refresh after updating a doc, it would be like this:

    es.index(index=index_name, body=doc, refresh="wait_for")

If you need to force the refresh it would be like this:

    es.indices.refresh(index=index_name)
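
If the re-indexing step itself goes through the Reindex API, another option is to let that call wait for completion and refresh the destination index before it returns. The following is a minimal sketch, assuming the 8.x `elasticsearch` Python client and assuming the pipeline uses `_reindex` at all (the thread does not say how the re-index is performed). The function name and the index names are hypothetical.

```python
def reindex_and_refresh(es, source_index, dest_index):
    """Copy documents with the Reindex API and return how many were processed.

    `es` is expected to behave like an elasticsearch.Elasticsearch client (8.x).
    """
    resp = es.reindex(
        source={"index": source_index},
        dest={"index": dest_index},
        wait_for_completion=True,  # block until the whole reindex has finished
        refresh=True,              # refresh affected shards so results are searchable
    )

    # A completed reindex reports per-document failures in the response body.
    if resp["failures"]:
        raise RuntimeError(f"reindex reported failures: {resp['failures']}")

    return resp["total"]
```

Once this function returns without raising, a search or aggregation against `dest_index` should see the re-indexed documents without a separate `es.indices.refresh` call.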
