I've been having a problem where refreshing an index at the wrong point
seems to prevent results being returned. I've managed to cut down my
problem to the following sequence:
The sequence should produce one result: I index a document (garment), add a
child document (verdict), and then query for child documents that have a parent.
If I remove the first call to refresh then the search does return the
expected document and everything behaves normally. If I leave that refresh
in place then I get no results until I do something like
You can also stay safe by creating all the mappings first and then creating the docs.
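As a rough sketch of that ordering, the setup requests could be planned so that both mappings exist before any document is indexed. The `garment` and `verdict` types come from the question; the index name `wardrobe`, the document IDs, and the bodies are made up for illustration. The paths assume the REST conventions of Elasticsearch versions that support `_parent` mappings, and nothing is sent over the network.

```python
import json

def plan_setup(index="wardrobe"):
    """Return (method, path, body) tuples with all mappings before any documents.

    The index name, IDs and bodies are hypothetical placeholders.
    """
    create = [("PUT", f"/{index}", None)]
    # Both mappings first; verdict declares garment as its parent type.
    mappings = [
        ("PUT", f"/{index}/garment/_mapping", {"garment": {"properties": {}}}),
        ("PUT", f"/{index}/verdict/_mapping",
         {"verdict": {"_parent": {"type": "garment"}}}),
    ]
    # Only then the parent document and its child (routed via ?parent=).
    documents = [
        ("PUT", f"/{index}/garment/1", {"name": "coat"}),
        ("PUT", f"/{index}/verdict/1?parent=1", {"rating": "good"}),
    ]
    refresh = [("POST", f"/{index}/_refresh", None)]
    return create + mappings + documents + refresh

if __name__ == "__main__":
    for method, path, body in plan_setup():
        print(method, path, json.dumps(body) if body is not None else "")
```

The point of returning a plain list is that the order is explicit and easy to check in a unit test before any request is issued.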
OK. What I'd really like to understand is what is causing this, so I can decide whether it only affects the setup of my unit tests (in which case changing the order in which those steps happen is fine). If this could happen in production, then I need a finer handle on what I need to avoid, because as it stands this feels like a bug that I could accidentally trigger.
The semantics of "_refresh" are to execute a Lucene "maybeRefresh" with a
force attribute. This might also affect access to the ES ID cache, which is
not part of the Lucene refresh operation.
There is a separate clear ID cache API besides the refresh API, so the ES ID
cache can be cleared with its own API request. For convenience, it might be
more intuitive to include a clear ID cache operation in each refresh
API request. On the other hand, invalidating caches can be expensive, so
there are two API calls for good reason. What can be expected from a refresh
API call is therefore a matter of interpretation. I would call it a glitch,
and it seems specific to parent/child.
My rule of thumb: if you run a parent/child query on docs that were created
between separate forced refresh calls, call the clear ID cache API first, so
that a valid ID cache gets populated again for the parent/child query.
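A minimal sketch of that rule of thumb, assuming the clear cache endpoint `/_cache/clear` accepts an `id_cache=true` parameter (as I understand it did in Elasticsearch versions of that era; check the docs for your version). The index name `wardrobe` is hypothetical, and the `has_parent` query body is only a guess at what the original search looked like. The function only builds the request sequence and sends nothing.

```python
def refresh_then_query(index="wardrobe", uses_parent_child=True):
    """Return (method, path, body) steps: refresh, optional ID cache clear, search.

    Index name and query body are illustrative assumptions, not the
    original poster's actual requests.
    """
    steps = [("POST", f"/{index}/_refresh", None)]
    if uses_parent_child:
        # Assumption: id_cache=true limits the clear to the ID cache, so
        # other caches are left alone and the cost stays low.
        steps.append(("POST", f"/{index}/_cache/clear?id_cache=true", None))
    # Hypothetical search: child (verdict) documents that have a garment parent.
    query = {"query": {"has_parent": {"parent_type": "garment",
                                      "query": {"match_all": {}}}}}
    steps.append(("POST", f"/{index}/verdict/_search", query))
    return steps

if __name__ == "__main__":
    for method, path, _body in refresh_then_query():
        print(method, path)
```

Keeping the cache clear conditional reflects the cost argument above: only parent/child queries need it, so other searches can skip the invalidation.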
Maybe you should open an issue to get this tracked by the ES team, because
the current behavior does not follow the "least surprise" principle.