FYI, having left Elastic I’m not working on the PR any more. It was a potentially big change and there was reluctance because it might represent the start of a very slippery slope of adding all kinds of validation (arrays, enums, acceptable numeric value ranges, etc.).
In many businesses, Elasticsearch gets fired at with all kinds of data streams that evolve constantly but still need to be captured. To cope with that, leniency is valued more than strictness.
The lesson I take from proposing this change is that there’s a policy line drawn around how much validation Elasticsearch mappings should attempt declaratively vs. what is left to application-specific code.
While I agree that mappings shouldn’t validate everything, I think the vast majority of content stored in Elasticsearch consists of single-valued fields, and there should be a simple option to validate that on the way in. (It would have to be opt-in, because the default has always supported multiple values and too many things would break if Elasticsearch changed that.)
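To make that concrete, here is a rough sketch of what the opt-in could look like. The `single_value` flag is invented for illustration (nothing like it exists in Elasticsearch today); the rest is an ordinary mapping:

```python
# Hypothetical mapping: "single_value" is an invented, opt-in flag that
# would make Elasticsearch reject arrays for this field at index time.
mapping = {
    "properties": {
        "status": {
            "type": "keyword",
            "single_value": True,  # invented parameter, illustration only
        },
        # Default behaviour stays unchanged: scalar or array both accepted.
        "tags": {"type": "keyword"},
    }
}
```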
Yes, to me there is a big gap between validating the data content and validating the data type itself.
If I may compare with any programming language: an array is a data type and must be explicitly declared as such. All programmers know that. And this distinction should be part of the data schema (here, the mapping).
Data content validation is another story.
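To illustrate the gap: nothing in an Elasticsearch mapping declares whether a field holds one value or many, so the same `keyword` mapping accepts both shapes. A minimal sketch, assuming a local cluster and the 8.x Python client:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed local cluster

# One mapping; there is no way to declare "tag is (not) an array".
es.indices.create(
    index="demo",
    mappings={"properties": {"tag": {"type": "keyword"}}},
)

# Both documents are accepted under the same mapping.
es.index(index="demo", id="1", document={"tag": "a"})
es.index(index="demo", id="2", document={"tag": ["a", "b"]})
```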
Mixing arrays and scalar values is fine in most Elasticsearch usage (search, aggregations, ...), but it fails when using Apache Spark, which is a bug for us because we use the official Elastic Spark "driver" (Spark expects a fixed schema, so a field that arrives sometimes as a string and sometimes as an array of strings => kaboom).
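For what it’s worth, the workaround we know of sits on the Spark side rather than in the mapping. A sketch, assuming PySpark with the elasticsearch-hadoop connector on the classpath, and reusing the `demo`/`tag` names from the sketch above: the `es.read.field.as.array.include` setting tells the connector to always treat the listed fields as arrays, so the schema no longer depends on which value Spark happens to see first.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Force "tag" to be read as array<string> even when a document holds a
# single scalar value, avoiding the string-vs-array schema clash.
df = (
    spark.read.format("org.elasticsearch.spark.sql")
    .option("es.nodes", "localhost:9200")
    .option("es.read.field.as.array.include", "tag")
    .load("demo")
)
df.printSchema()
```

That masks the symptom for fields you know about, but it does nothing for fields you did not anticipate, which is why validation at index time would still help.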
I forget exactly where, but there are already parts of Elasticsearch (data streams? transforms?) where content is grouped or routed by a chosen field that absolutely has to be single-valued and not an array.