Different responses for search with terms query and get document by id

jprucia · February 19, 2026, 12:20pm

Here is the scenario: the app is running an integration test that spins up a Docker container using the official Docker image, version 8.7.0. When the container is up and running, the test does its setup (it uses the Spring Data Elasticsearch library, but I've enabled request tracing to see exactly what gets sent to ES). It creates an index with custom mappings (all fields used later in the search are defined as keywords), then goes on with a test case, which involves creating two documents, updating them and making sure they are actually updated. Both creating and updating the documents go as expected, meaning the app issues /index/_bulk?refresh=false requests which produce 200 OK responses. When creating the index we set index.refresh_interval to 1s, so it's expected that right after a bulk request the data won't be visible until a refresh happens. So we poll with Awaitility, getting the document by its id and checking its fields to see if it was updated.
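
For reference, the index setup described above might look roughly like the following as a raw request, built with the JDK's java.net.http classes rather than Spring Data Elasticsearch. It is only a sketch: it builds the request without sending it (that needs the running container), the base URL, the index name my-index and the helper names are hypothetical, and only the two keyword fields used in the search query in this post are mapped.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper: builds (does not send) a create-index request with a
// configurable refresh interval and keyword mappings for the query's fields.
class CreateIndexRequest {

    static String body(String refreshInterval) {
        return "{"
            + "\"settings\":{\"index\":{\"refresh_interval\":\"" + refreshInterval + "\"}},"
            + "\"mappings\":{\"properties\":{"
            + "\"idField\":{\"type\":\"keyword\"},"
            + "\"unchangedField\":{\"type\":\"keyword\"}"
            + "}}"
            + "}";
    }

    static HttpRequest build(String baseUrl, String index, String refreshInterval) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + index))
            .header("Content-Type", "application/json")
            .PUT(HttpRequest.BodyPublishers.ofString(body(refreshInterval)))
            .build();
    }

    public static void main(String[] args) {
        HttpRequest req = build("http://localhost:9200", "my-index", "1s");
        System.out.println(req.method() + " " + req.uri());
        System.out.println(body("1s"));
    }
}
```

Note that refresh_interval only schedules background refreshes; it says nothing about when a particular write becomes searchable.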

The problem is that when getting the document by id (GET /$index/_doc/$id), at some point it is evident that a refresh took place, since we see all the expected changes within the document. When that happens we do a search (POST /$index/_search?search_type=query_then_fetch&typed_keys=true) with a query that looks like this:

```json
{
  "aggregations": {
    "totalSum": { "sum": { "field": "anountField" } }
  },
  "from": 0,
  "query": {
    "bool": {
      "must": [
        { "terms": { "idField": ["2", "1"] } },
        { "term": { "unchangedField": { "value": "false" } } }
      ]
    }
  },
  "size": 10,
  "sort": [
    { "sortField1": { "mode": "max", "order": "desc" } },
    { "sortField2": { "mode": "max", "order": "desc" } }
  ],
  "track_scores": false,
  "track_total_hits": 2147483647,
  "version": true
}
```

The search request returns the expected documents (since we asked for them by id); surprisingly, however, their contents are slightly different from the values we see when getting them individually by id. To expand on “slightly”: the update changes two fields, giving each a new value. When we get the document by id, both fields have the expected values. When we search, the document has field1 with the expected value and field2 with an unexpected (namely, the previous) value.

When we take the crudest approach and simply call Thread.sleep(1000) after issuing those bulk requests, everything works just fine. I still feel this is not an optimal approach. I'm aware of refresh=wait_for, but it's not really useful in my case. So the question is: why does this difference happen? Is it expected? Or perhaps something is wrong with my query?

RainTown · February 19, 2026, 1:08pm

Welcome to the forum @jprucia !!

For a first time poster, you’ve written an excellent post. Very clear and precise.

I think you have simply misunderstood what “refresh=” means here. refresh=false pretty much means the bulk call should return once the docs are indexed, but you do NOT need those specific documents to be returned by any search yet. If you do need those updates to be returned by a search right away, then don’t use refresh=false, and be prepared to wait a bit longer.
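
To make the distinction concrete, here is a sketch of where the refresh parameter sits on a bulk request, again using the JDK HTTP classes rather than Spring Data Elasticsearch. The index name and the field1/field2 update payload are hypothetical. With refresh=wait_for the call only returns once a refresh has made the changes searchable, which is the "wait a bit longer" trade-off.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper showing where the refresh parameter goes on a bulk call.
// "false" (default): no refresh-related action, nothing promised about search.
// "wait_for": reply only after a refresh has made these changes searchable.
// "true": refresh the affected shards immediately (costly if done often).
class BulkRequest {

    static URI uri(String baseUrl, String index, String refresh) {
        return URI.create(baseUrl + "/" + index + "/_bulk?refresh=" + refresh);
    }

    static HttpRequest build(String baseUrl, String index, String ndjson, String refresh) {
        // The bulk API expects newline-delimited JSON ending with a newline.
        if (!ndjson.endsWith("\n")) {
            throw new IllegalArgumentException("bulk body must end with a newline");
        }
        return HttpRequest.newBuilder(uri(baseUrl, index, refresh))
            .header("Content-Type", "application/x-ndjson")
            .POST(HttpRequest.BodyPublishers.ofString(ndjson))
            .build();
    }

    public static void main(String[] args) {
        // One partial update per document: action line, then the partial doc.
        String ndjson =
              "{\"update\":{\"_id\":\"1\"}}\n"
            + "{\"doc\":{\"field1\":\"new\",\"field2\":\"new\"}}\n"
            + "{\"update\":{\"_id\":\"2\"}}\n"
            + "{\"doc\":{\"field1\":\"new\",\"field2\":\"new\"}}\n";
        HttpRequest req = build("http://localhost:9200", "my-index", ndjson, "wait_for");
        System.out.println(req.method() + " " + req.uri());
    }
}
```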

You will always get the most recent version of a document by simply looking up its _id.

But a search is not a lookup.
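
A toy model of that difference, deliberately ignoring segments, the translog and replicas, so it is NOT how Elasticsearch is implemented internally: a GET by id (realtime by default) sees the latest write, while a search only sees what the most recent refresh made visible. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model, NOT Elasticsearch internals: lookup vs search visibility.
class ToyShard {
    private final Map<String, Map<String, String>> latest = new HashMap<>();
    private Map<String, Map<String, String>> searchable = new HashMap<>();

    void index(String id, Map<String, String> source) {
        latest.put(id, Map.copyOf(source));
    }

    // Like GET /index/_doc/id with the default realtime=true: latest write wins.
    Map<String, String> get(String id) {
        return latest.get(id);
    }

    // Like a search: only sees the snapshot published by the last refresh.
    Map<String, String> search(String id) {
        return searchable.get(id);
    }

    // Publishes everything indexed so far to search.
    void refresh() {
        searchable = new HashMap<>(latest);
    }

    public static void main(String[] args) {
        ToyShard shard = new ToyShard();
        shard.index("1", Map.of("field1", "old", "field2", "old"));
        shard.refresh();
        shard.index("1", Map.of("field1", "new", "field2", "new"));
        System.out.println(shard.get("1").get("field2"));    // new
        System.out.println(shard.search("1").get("field2")); // old
        shard.refresh();
        System.out.println(shard.search("1").get("field2")); // new
    }
}
```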

The index refresh_interval is related but not the same, and it's not a guarantee. Put simply, a refresh_interval of 1s does NOT mean an index will always refresh every second. Closer would be “please try to refresh active shards every second, on a best-effort basis”.
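
If a test needs search results to reflect the last write deterministically, one option (not something proposed in this thread) is to call the refresh API on the test index after the bulk request, so the test stops depending on when the background refresh runs. A minimal sketch with the JDK HTTP client; the base URL and index name are placeholders. Forcing refreshes is reasonable in a small integration test, but refreshes are resource-intensive, so doing it after every write in production is generally discouraged.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical test helper: explicitly refresh one index instead of relying
// on the background refresh_interval.
class ExplicitRefresh {

    static HttpRequest build(String baseUrl, String index) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + index + "/_refresh"))
            .POST(HttpRequest.BodyPublishers.noBody())
            .build();
    }

    // Sends the request and returns the HTTP status; needs a reachable cluster,
    // so main() below only prints the request instead of sending it.
    static int send(HttpClient client, String baseUrl, String index)
            throws IOException, InterruptedException {
        HttpResponse<String> resp = client.send(build(baseUrl, index),
                HttpResponse.BodyHandlers.ofString());
        return resp.statusCode();
    }

    public static void main(String[] args) {
        HttpRequest req = build("http://localhost:9200", "my-index");
        System.out.println(req.method() + " " + req.uri());
    }
}
```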

You have effectively described the expected behaviour.

Regarding “at some point it is evident, that refresh took place”: this is erroneous. You can’t know a refresh took place by looking the document up by id.

jprucia · February 23, 2026, 2:42pm

Thank you for your reply.

I think my understanding of the refresh parameter was correct. My understanding of the refresh interval was most probably correct too (I kind of assumed that in some cases ES might be a little late).

What I did not understand correctly was the fact that “lookup is not search”.

After learning this I ditched the lookup approach and replaced it with a simple search: basically a single term query fetching the document with a given id, with no aggregations, no sorting and no other fancy stuff. Done that way, and supplemented with Awaitility waiting at least 5 seconds, it all works as expected. The assumption here is that if I index 2 documents, and then I wait until one of those documents has proper data when searching, then surely the index has been refreshed. I really hope this expectation is OK, at least for simple integration tests.
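
The polling part of that approach can be sketched in plain Java. This is not Awaitility's API, just a minimal stand-in with the same idea (re-check a condition until it passes or a timeout expires); the helper name and its parameters are hypothetical. What matters is what goes into the condition: a search, not a lookup by id.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Plain-Java stand-in for an Awaitility-style wait: re-check a condition
// until it holds or the timeout expires.
class Poll {

    static boolean until(BooleanSupplier condition, Duration timeout, Duration interval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // In a real test the condition would run a search (term query on the id)
        // and check the updated fields; here it simply passes on the 3rd check.
        boolean ok = Poll.until(() -> ++calls[0] >= 3,
                Duration.ofSeconds(5), Duration.ofMillis(100));
        System.out.println(ok + " after " + calls[0] + " checks");
    }
}
```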

Thank you again, as I think you clarified a very important point that wasn't clear to me.

RainTown · February 24, 2026, 1:11am

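If I’ve parsed this correctly (“if I index 2 documents, and then I wait until one of those documents has proper data when searching, then surely the index has been refreshed”):

then the assumption does not seem 100% safe. Background refresh is done at the shard level, not the index level.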

So if you index doc1 then doc2, and doc2 shows that recent update in a subsequent search, then the specific shard containing doc2 has been refreshed. But that guarantees you nothing about doc1, as doc1 may be on a different shard.
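
A toy model of this shard-level point, again not Elasticsearch internals: each shard keeps its own searchable snapshot, and only the shard being refreshed publishes its pending writes. The routing function (String.hashCode modulo the shard count) is a hypothetical simplification; Elasticsearch routes by a hash of the routing value, which defaults to the document _id. With two shards, seeing doc2's update says nothing about doc1; with one shard the same observation happens to cover both, which is exactly the implementation detail a single-shard test setup relies on.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model, NOT Elasticsearch internals: refresh happens per shard.
class ToyIndex {

    static final class Shard {
        final Map<String, String> latest = new HashMap<>();
        Map<String, String> searchable = new HashMap<>();

        void refresh() {
            searchable = new HashMap<>(latest);
        }
    }

    private final List<Shard> shards = new ArrayList<>();

    ToyIndex(int numberOfShards) {
        for (int i = 0; i < numberOfShards; i++) {
            shards.add(new Shard());
        }
    }

    // Hypothetical routing: "1" hashes to 49 and "2" to 50, so with 2 shards they differ.
    private Shard shardFor(String id) {
        return shards.get(Math.floorMod(id.hashCode(), shards.size()));
    }

    void index(String id, String value) {
        shardFor(id).latest.put(id, value);
    }

    String search(String id) {
        return shardFor(id).searchable.get(id);
    }

    // Simulates a background refresh reaching only the shard that holds this id.
    void refreshShardOf(String id) {
        shardFor(id).refresh();
    }

    static String scenario(int numberOfShards) {
        ToyIndex index = new ToyIndex(numberOfShards);
        index.index("1", "v1");
        index.index("2", "v1");
        index.refreshShardOf("1");
        index.refreshShardOf("2");
        index.index("1", "v2");    // doc1 updated first
        index.index("2", "v2");    // then doc2
        index.refreshShardOf("2"); // only doc2's shard refreshes
        return "doc1=" + index.search("1") + " doc2=" + index.search("2");
    }

    public static void main(String[] args) {
        System.out.println("2 shards: " + scenario(2)); // doc1=v1 doc2=v2
        System.out.println("1 shard:  " + scenario(1)); // doc1=v2 doc2=v2
    }
}
```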

In passing, “proper data” is a curious construction that I’d personally suggest avoiding due to its ambiguity. From context, I presume you mean “most recent data”.

jprucia · March 17, 2026, 9:28am

My assumption is probably 100% safe, because of the environment I’m running in. As I mentioned at the beginning, we are talking about a single Docker container running the official ES image. The only code talking to this particular node is an integration test, which is a single thread doing a bunch of requests sequentially.

I’m aware that this is a special setup, and I would not expect such behaviour from a live deployment with multiple shards and so on.

That being said I was still able to fix my problem. Many thanks for your answers!

RainTown · March 17, 2026, 12:42pm

The current approach depends on an implementation detail (single shard) rather than a guarantee. IMHO that makes it fragile by definition.

It also creates a knowledge trap: there’s nothing in the system itself preventing someone from changing the shard count and introducing subtle, hard-to-diagnose bugs.
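
One way to avoid that trap, sketched here as an assumption rather than something proposed in the thread, is to make the wait condition cover every document the test wrote instead of a single one, so it stays correct if the shard count ever changes. The search function is passed in; in a real test it would be hypothetical glue around a search request, while here it is just a Map lookup.

```java
import java.util.Map;
import java.util.function.Function;

// Hypothetical wait condition: true only when every document the test wrote is
// returned by search with the value the test expects. Because each document is
// checked, it does not rely on all documents living on the same shard.
class Visibility {

    static boolean allSearchable(Map<String, String> expectedById,
                                 Function<String, String> searchValueById) {
        for (Map.Entry<String, String> e : expectedById.entrySet()) {
            String seen = searchValueById.apply(e.getKey()); // null if not found
            if (!e.getValue().equals(seen)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> expected = Map.of("1", "v2", "2", "v2");
        Map<String, String> partlyRefreshed = Map.of("1", "v1", "2", "v2");
        Map<String, String> fullyRefreshed = Map.of("1", "v2", "2", "v2");
        System.out.println(allSearchable(expected, partlyRefreshed::get)); // false
        System.out.println(allSearchable(expected, fullyRefreshed::get));  // true
    }
}
```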

Also, “probably 100% safe” is a contradiction. Either something is guaranteed by design, or it’s relying on assumptions. In this case, it’s clearly the latter.

My car’s brakes work decently, well, unless I drive faster than 60. Above that I know they’re a bit dodgy, but I’m 100% safe because I’ll never drive faster than 60!?
