Inconsistency with queries on unknown fields

Hello,

A search request with a geo_bounding_box query fails with a QueryShardException and the message "failed to find geo field [my-field]" when the index has no mapping for my-field. Other queries, such as terms, do not fail when run against a field that is not defined in the mapping; they simply return no results.

What should I do to make the geo_bounding_box query behave like the terms query when the index has no mapping for the field?

This behaviour happens in Elasticsearch 8.8.

Hi @yfful ,

Can you list the structure of your index?

Thanks

Hello @Alex_Salgado-Elastic ,

This can be reproduced with a bare-bones index that uses default mappings. For example, create an index with just a single document, so that a plain search returns:

GET test/_search
{
    "_shards": {
        "failed": 0,
        "skipped": 0,
        "successful": 1,
        "total": 1
    },
    "hits": {
        "hits": [
            {
                "_id": "GeG_34gBVnz3g-FH0pXq",
                "_index": "test",
                "_score": 1.0,
                "_source": {
                    "f1": "aaa"
                }
            }
        ],
        "max_score": 1.0,
        "total": {
            "relation": "eq",
            "value": 1
        }
    },
    "timed_out": false,
    "took": 66
}

A geo_bounding_box query fails if the field isn't mapped, but a terms query on the same field doesn't:

GET test/_search
{
  "query": {
    "terms": {
      "f2": [
        "aaa"
      ]
    }
  }
}

GET test/_search
{
  "query": {
    "geo_bounding_box": {
      "f2": {
        "top_left": {
          "lat": 41.722535,
          "lon": -0.0109849999999998
        },
        "bottom_right": {
          "lat": 41.195595000000004,
          "lon": 25.169675
        }
      }
    }
  }
}

Have you tried using the exists query? It matches only documents that have an indexed value for the given field, so you can use it to guard the geo query. Here is an example of how you can adapt your query:

GET test/_search
{
  "query": {
    "bool": {
      "filter": {
        "exists": {
          "field": "f2"
        }
      },
      "must": {
        "geo_bounding_box": {
          "f2": {
            "top_left": {
              "lat": 41.722535,
              "lon": -0.0109849999999998
            },
            "bottom_right": {
              "lat": 41.195595000000004,
              "lon": 25.169675
            }
          }
        }
      }
    }
  }
}

It works, but I think having to add the exists query is cumbersome. Other queries may have the same issue with missing fields, and then we'd have to pair every one of them with an exists query.
This can become very complex in bool queries with a should clause, for example. Consider the following "natural" expression:

{
  "query": {
    "bool": {
      "should": [
        {
          "geo_bounding_box": {
            "f2": {
              "top_left": {
                "lat": 41.722535,
                "lon": -0.0109849999999998
              },
              "bottom_right": {
                "lat": 41.195595000000004,
                "lon": 25.169675
              }
            }
          }
        },
        {
          "geo_bounding_box": {
            "f3": {
              "top_left": {
                "lat": 41.722535,
                "lon": -0.0109849999999998
              },
              "bottom_right": {
                "lat": 41.195595000000004,
                "lon": 25.169675
              }
            }
          }
        }
      ]
    }
  }
}

With the exists query trick, this must be written as:

{
  "query": {
    "bool": {
      "should": [
        {
          "bool": {
            "filter": [
              {
                "geo_bounding_box": {
                  "f3": {
                    "top_left": {
                      "lat": 41.722535,
                      "lon": -0.0109849999999998
                    },
                    "bottom_right": {
                      "lat": 41.195595000000004,
                      "lon": 25.169675
                    }
                  }
                }
              },
              {
                "exists": {
                  "field": "f3"
                }
              }
            ]
          }
        },
        {
          "bool": {
            "filter": [
              {
                "geo_bounding_box": {
                  "f2": {
                    "top_left": {
                      "lat": 41.722535,
                      "lon": -0.0109849999999998
                    },
                    "bottom_right": {
                      "lat": 41.195595000000004,
                      "lon": 25.169675
                    }
                  }
                }
              },
              {
                "exists": {
                  "field": "f2"
                }
              }
            ]
          }
        }
      ]
    }
  }
}

I think the "natural" expression of the should clause is better. The exists query fix works for simple cases, but it doesn't seem to scale to more complex ones.
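
One thing that may be worth trying instead of the exists guard: as far as I know, geo_bounding_box accepts an ignore_unmapped option. According to the query's documentation, setting it to true makes the query match no documents when the field is unmapped, instead of throwing an exception (the default is false). Below is a minimal sketch of the should example using it, with the same f2 and f3 fields. I have not verified this on 8.8 specifically, so please treat it as something to check against your version:

```json
{
  "query": {
    "bool": {
      "should": [
        {
          "geo_bounding_box": {
            "ignore_unmapped": true,
            "f2": {
              "top_left": {
                "lat": 41.722535,
                "lon": -0.0109849999999998
              },
              "bottom_right": {
                "lat": 41.195595000000004,
                "lon": 25.169675
              }
            }
          }
        },
        {
          "geo_bounding_box": {
            "ignore_unmapped": true,
            "f3": {
              "top_left": {
                "lat": 41.722535,
                "lon": -0.0109849999999998
              },
              "bottom_right": {
                "lat": 41.195595000000004,
                "lon": 25.169675
              }
            }
          }
        }
      ]
    }
  }
}
```

If this behaves as documented, each clause stays a plain geo_bounding_box and no exists pairing is needed, so the "natural" shape of the should clause is preserved. I believe other geo queries such as geo_distance and geo_shape document a similar option.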