Hi ES Community,
I recently found some confusing behavior with a mapping and multi-fields, and it would be nice if Elasticsearch would prevent you from shooting yourself in the foot (like I did).
The issue arises when position_increment_gap is set inconsistently within a multi-field. Apparently, the position_increment_gap on an outer field is not automatically applied to its inner fields. So, in the example mapping below, the inner unstemmed field ends up with a different position_increment_gap (the default of 100) than the outer text field (1000). When you index multi-valued docs into this multi-field, the term positions in the term vectors for the two fields are inconsistent, and span_near queries will not work across the fields.
"properties": {
  "text": {
    "analyzer": "main_analyzer",
    "type": "text",
    "position_increment_gap": 1000,
    "fields": {
      "unstemmed": {
        "analyzer": "unstemmed_analyzer",
        "type": "text"
      }
    }
  }
}
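To make the inconsistency concrete, here is a rough Python sketch of how positions end up differing for the same multi-valued input under two gaps. It is a simplified model (whitespace tokens, each token advancing the position by one, the gap added between values), not Elasticsearch's actual code. The function name assign_positions and the sample values are made up for illustration; 1000 is the gap from the mapping above and 100 is the default the inner field falls back to.

```python
def assign_positions(values, position_increment_gap):
    """Simplified model of term positions for a multi-valued text field."""
    positions = []
    position = -1  # the first token lands on position 0
    for index, value in enumerate(values):
        if index > 0:
            # The gap is added between consecutive values of the same field.
            position += position_increment_gap
        for token in value.split():
            position += 1
            positions.append((token, position))
    return positions


values = ["quick brown", "lazy dog"]  # hypothetical multi-valued document field

outer = assign_positions(values, 1000)  # "text" with the explicit gap
inner = assign_positions(values, 100)   # "text.unstemmed" with the assumed default

print(outer)  # [('quick', 0), ('brown', 1), ('lazy', 1002), ('dog', 1003)]
print(inner)  # [('quick', 0), ('brown', 1), ('lazy', 102), ('dog', 103)]
```

In this model the second value starts at position 1002 in one field and 102 in the other, so any query that relies on positions lining up between the two fields sees different distances.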
This type of mistake could be avoided if Elasticsearch did one of the following:
- Apply the position_increment_gap of a field to any inner multi-fields (note: this may break existing mappings).
- Require all multi-fields that use a position_increment_gap to explicitly set it on each field.
- Emit some sort of warning somewhere.
- At least acknowledge this gotcha in the docs on position_increment_gap and/or multi-fields.
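Until something like that exists, a check along these lines could be run against a mapping before creating the index. This is only a sketch of the "warning" idea: find_gap_mismatches is a hypothetical helper (not part of Elasticsearch or any client library), it only compares text sub-fields against a text parent, and it assumes the default gap is 100.

```python
DEFAULT_GAP = 100  # assumed default for text fields; adjust if your version differs


def find_gap_mismatches(properties, prefix=""):
    """Return (sub_field_path, outer_gap, inner_gap) for inconsistent text multi-fields."""
    mismatches = []
    for name, field in properties.items():
        path = prefix + name
        if field.get("type") == "text":
            outer_gap = field.get("position_increment_gap", DEFAULT_GAP)
            for sub_name, sub_field in field.get("fields", {}).items():
                if sub_field.get("type") != "text":
                    continue  # this sketch only compares text sub-fields
                inner_gap = sub_field.get("position_increment_gap", DEFAULT_GAP)
                if inner_gap != outer_gap:
                    mismatches.append((f"{path}.{sub_name}", outer_gap, inner_gap))
        # Recurse into object fields that nest their own properties.
        if "properties" in field:
            mismatches.extend(find_gap_mismatches(field["properties"], path + "."))
    return mismatches


# The mapping from this post, as a Python dict.
broken = {
    "text": {
        "type": "text",
        "analyzer": "main_analyzer",
        "position_increment_gap": 1000,
        "fields": {
            "unstemmed": {"type": "text", "analyzer": "unstemmed_analyzer"}
        },
    }
}

# Same mapping with the gap set explicitly on the inner field as well.
fixed = {
    "text": {
        "type": "text",
        "analyzer": "main_analyzer",
        "position_increment_gap": 1000,
        "fields": {
            "unstemmed": {
                "type": "text",
                "analyzer": "unstemmed_analyzer",
                "position_increment_gap": 1000,
            }
        },
    }
}

print(find_gap_mismatches(broken))  # [('text.unstemmed', 1000, 100)]
print(find_gap_mismatches(fixed))   # []
```

Setting position_increment_gap explicitly on the inner field, as in the second dict, is the workaround on the mapping side.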
Thanks for listening!