I had a problem in production that seems related to a 206 partial response being returned when I restarted one of my nodes for maintenance.
I'm trying to reproduce this but have had no luck so far. Is there a procedure I can follow to reproduce a partial shard failure on an index with only 1 primary and 1 replica?
I don't think Elasticsearch ever returns the HTTP status code 206 Partial Content, so I suspect this code came from something else in your environment. "Partial shard fail" isn't a thing; a shard can only fail completely.
But I'm sure that ES responded with a subset of the documents, since the process making the request runs a diff between a PostgreSQL database and ES, and it detected that documents were missing in ES.
This option is not set to false on my cluster settings or on queries so I assume that ES could in extreme circumstances respond with a partial answer.
Am I right? And if I am, how could I reproduce this?
I've tried restarting data nodes multiple times but wasn't able to reproduce this. Even if I restart 2 of the 3 nodes at the same time, I get a "hard" failure rather than a partial response (which is expected).
Yes, that makes more sense. It'd still be a 200 OK but may not contain results from all shards.
I imagine a well-timed node shutdown or network partition would do it, but you might need more nodes and/or zero replicas to increase your chances of success.
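Since a partial result still comes back as a 200 OK, the client has to look at the response body to notice it. Here is a minimal sketch, assuming the body has already been parsed into a dict. It relies on the `_shards` and `timed_out` fields that search responses carry. The helper name `is_partial` and the two sample bodies are hypothetical and hand-written for illustration, not captured from a real cluster.

```python
from typing import Any, Mapping


def is_partial(response: Mapping[str, Any]) -> bool:
    """Return True if a parsed search response may be missing results."""
    shards = response.get("_shards", {})
    total = shards.get("total", 0)
    successful = shards.get("successful", 0)
    failed = shards.get("failed", 0)
    # Treat a timeout, any failed shard, or fewer successful shards
    # than the total as a sign that some hits may be missing.
    return bool(response.get("timed_out", False)) or failed > 0 or successful < total


# Hypothetical response bodies (illustrative only).
complete = {
    "timed_out": False,
    "_shards": {"total": 1, "successful": 1, "skipped": 0, "failed": 0},
}
partial = {
    "timed_out": False,
    "_shards": {"total": 3, "successful": 2, "skipped": 0, "failed": 1},
}

print(is_partial(complete))
print(is_partial(partial))
```

A diff job like the one described could run this check on every response and retry or abort instead of treating the missing documents as deleted. If the option mentioned earlier is `allow_partial_search_results` (an assumption on my part, since the thread doesn't name it), setting it to false on the request should make the search fail outright rather than return a subset.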