Because shadow replicas do not index the document on replica shards, it’s
possible for the replica’s known mapping to be behind the index’s known mapping
if the latest cluster state has not yet been processed on the node containing
the replica. Because of this, it is highly recommended to use pre-defined
mappings when using shadow replicas.
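
To illustrate the "pre-defined mappings" recommendation, here is a minimal sketch that builds (but does not send) an index-creation request declaring every field up front. The index name, type name `doc`, field names, data path, and the helper `build_create_index_request` are made up for illustration; `index.shadow_replicas` and `index.data_path` are the settings described in the shadow replicas documentation. Setting `"dynamic": "strict"` is an assumption on my part: it is one way to make sure indexing can never trigger a mapping update in the first place.

```python
import json
import urllib.request


def build_create_index_request(base_url, index_name, data_path):
    """Build a PUT request that creates an index with shadow replicas
    and a fully pre-defined mapping (hypothetical fields)."""
    body = {
        "settings": {
            "index": {
                "number_of_shards": 1,
                "number_of_replicas": 1,
                "data_path": data_path,      # must live on the shared filesystem
                "shadow_replicas": True,
            }
        },
        "mappings": {
            "doc": {                         # hypothetical type name
                # Assumption: reject unmapped fields instead of
                # dynamically adding them to the mapping.
                "dynamic": "strict",
                "properties": {
                    "title": {"type": "string"},
                    "created_at": {"type": "date"},
                },
            }
        },
    }
    return urllib.request.Request(
        url=f"{base_url}/{index_name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it to a running node. With all fields declared at creation time, no later cluster-state update is needed for the replica to know about them.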
I've got two questions:
What is the maximum amount of time a replica can be behind the latest mapping (1 refresh interval, 60s, 1 day?)
What is the implication of a replica mapping being behind the latest mapping? Will queries including that field work as if it wasn't there, will a fatal error be thrown, etc?
It's usually one cluster-state update behind, which should be a matter of seconds. This problem is less relevant, or not relevant at all, in 2.0 (2.0.0-beta1 was just released), since we now wait on the primary until the cluster state has been published to the replicas.
The point of shadow replicas is that you don't necessarily use them for real-time search. Today you need a shared filesystem to use them at all, and changes only become visible once you _flush your data to disk. That said, mapping updates should have reached the replica by the time it is searched. If you refresh and flush all the time, shadow replicas are not the right tool for you. If the field is not there, I think the query will just return fewer docs.
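
For reference, the flush mentioned above is a plain REST call. This is only a sketch: it builds the request without sending it, and the host, index name, and helper `build_flush_request` are placeholders.

```python
import urllib.request


def build_flush_request(base_url, index_name):
    """Build a POST request against the index's _flush endpoint."""
    return urllib.request.Request(
        url=f"{base_url}/{index_name}/_flush",
        method="POST",
    )
```

Calling `urllib.request.urlopen(build_flush_request("http://localhost:9200", "my_index"))` against a running node would trigger the flush, after which the shadow replicas can see the changes.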