Inconsistency with queries on unknown fields

Hello,

If you run a search request with a geo_bounding_box query on a field that the index has no mapping for (say my-field), it fails with a QueryShardException and the message "failed to find geo field [my-field]". Other queries, such as terms, do not fail when run against an unmapped field; they simply return no results.

What should I do to make the geo_bounding_box query behave like the terms query when the index has no mapping for the field?

This behaviour happens in Elasticsearch 8.8.

Hi @yfful ,

Can you share the mapping of your index?

Thanks

Hello @Alex_Salgado-Elastic ,

This can be reproduced with a bare-bones index using the default mappings. For example, take an index test containing a single document:

GET test/_search
{
    "_shards": {
        "failed": 0,
        "skipped": 0,
        "successful": 1,
        "total": 1
    },
    "hits": {
        "hits": [
            {
                "_id": "GeG_34gBVnz3g-FH0pXq",
                "_index": "test",
                "_score": 1.0,
                "_source": {
                    "f1": "aaa"
                }
            }
        ],
        "max_score": 1.0,
        "total": {
            "relation": "eq",
            "value": 1
        }
    },
    "timed_out": false,
    "took": 66
}

On this index, a terms query on the unmapped field f2 simply returns no hits, but a geo_bounding_box query on the same field fails:

GET test/_search
{
  "query": {
    "terms": {
      "f2": [
        "aaa"
      ]
    }
  }
}

GET test/_search
{
  "query": {
    "geo_bounding_box": {
      "f2": {
        "top_left": {
          "lat": 41.722535,
          "lon": -0.0109849999999998
        },
        "bottom_right": {
          "lat": 41.195595000000004,
          "lon": 25.169675
        }
      }
    }
  }
}

Have you tried using the exists query? It matches only documents that have an indexed value for the given field, so you can combine it with your geo_bounding_box query in a bool query. Here is how you could adapt your query:

GET test/_search
{
  "query": {
    "bool": {
      "filter": {
        "exists": {
          "field": "f2"
        }
      },
      "must": {
        "geo_bounding_box": {
          "f2": {
            "top_left": {
              "lat": 41.722535,
              "lon": -0.0109849999999998
            },
            "bottom_right": {
              "lat": 41.195595000000004,
              "lon": 25.169675
            }
          }
        }
      }
    }
  }
}

It works, but I think it's cumbersome to have to add the exists query. Other queries may have the same issue with unmapped fields, in which case every one of them would have to be paired with an exists query.
This gets complicated quickly in bool queries with a should clause. Consider the following "natural" expression:

{
  "query": {
    "bool": {
      "should": [
        {
          "geo_bounding_box": {
            "f2": {
              "top_left": {
                "lat": 41.722535,
                "lon": -0.0109849999999998
              },
              "bottom_right": {
                "lat": 41.195595000000004,
                "lon": 25.169675
              }
            }
          }
        },
        {
          "geo_bounding_box": {
            "f3": {
              "top_left": {
                "lat": 41.722535,
                "lon": -0.0109849999999998
              },
              "bottom_right": {
                "lat": 41.195595000000004,
                "lon": 25.169675
              }
            }
          }
        }
      ]
    }
  }
}

With the exists query trick, this must be written as:

{
  "query": {
    "bool": {
      "should": [
        {
          "bool": {
            "filter": [
              {
                "geo_bounding_box": {
                  "f3": {
                    "top_left": {
                      "lat": 41.722535,
                      "lon": -0.0109849999999998
                    },
                    "bottom_right": {
                      "lat": 41.195595000000004,
                      "lon": 25.169675
                    }
                  }
                }
              },
              {
                "exists": {
                  "field": "f3"
                }
              }
            ]
          }
        },
        {
          "bool": {
            "filter": [
              {
                "geo_bounding_box": {
                  "f2": {
                    "top_left": {
                      "lat": 41.722535,
                      "lon": -0.0109849999999998
                    },
                    "bottom_right": {
                      "lat": 41.195595000000004,
                      "lon": 25.169675
                    }
                  }
                }
              },
              {
                "exists": {
                  "field": "f2"
                }
              }
            ]
          }
        }
      ]
    }
  }
}
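The rewrite above is mechanical, so if the request bodies are assembled in code rather than typed by hand, the exists guard can be added automatically. A minimal sketch, assuming the bodies are built as Python dicts before being sent; the names guard_with_exists, should_any, bbox and OPTION_KEYS are hypothetical helpers, not part of any Elasticsearch client, and OPTION_KEYS is an assumed, non-exhaustive list of non-field keys.

```python
import copy

# Hypothetical, non-exhaustive set of keys a leaf query may carry next to
# the field name; anything else is assumed to be the field being queried.
OPTION_KEYS = {"boost", "_name", "ignore_unmapped", "validation_method", "type"}


def guard_with_exists(clause):
    """Wrap a single-field leaf query, e.g. {"geo_bounding_box": {"f2": {...}}},
    in a bool filter that also requires the field to exist."""
    if len(clause) != 1:
        raise ValueError("expected a clause with exactly one query type")
    query_type, body = next(iter(clause.items()))
    fields = [key for key in body if key not in OPTION_KEYS]
    if len(fields) != 1:
        raise ValueError(f"expected exactly one field in {query_type!r}, got {fields}")
    return {
        "bool": {
            "filter": [
                copy.deepcopy(clause),  # the original query, left unchanged
                {"exists": {"field": fields[0]}},  # guard on the same field
            ]
        }
    }


def should_any(*clauses):
    """Build a top-level bool/should body in which every clause is guarded."""
    return {"query": {"bool": {"should": [guard_with_exists(c) for c in clauses]}}}


def bbox(field):
    # Same bounding box as the example queries in this thread.
    return {
        "geo_bounding_box": {
            field: {
                "top_left": {"lat": 41.722535, "lon": -0.0109849999999998},
                "bottom_right": {"lat": 41.195595000000004, "lon": 25.169675},
            }
        }
    }


body = should_any(bbox("f2"), bbox("f3"))
```

Each should clause gets its own exists guard, so the result has the same shape as the hand-written query, only with f2 listed first (order does not matter in a should clause).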

I think the "natural" expression of the should clause is better. The exists query fix works for simple cases, but it doesn't seem to scale to more complex ones.
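A possibly simpler route: the Elasticsearch reference documents an ignore_unmapped option for the geo_bounding_box query (default false). When it is set to true, a clause on an unmapped field matches no documents instead of throwing an exception, which is the terms-like behaviour asked about at the start. This has not been tested against 8.8 in this thread, so treat the following as a sketch. It builds the "natural" should query as a Python dict; geo_bbox, TOP_LEFT and BOTTOM_RIGHT are hypothetical names used only for illustration.

```python
# Same corners as the example queries in this thread.
TOP_LEFT = {"lat": 41.722535, "lon": -0.0109849999999998}
BOTTOM_RIGHT = {"lat": 41.195595000000004, "lon": 25.169675}


def geo_bbox(field, ignore_unmapped=True):
    """Hypothetical helper: a geo_bounding_box clause on `field`."""
    return {
        "geo_bounding_box": {
            field: {"top_left": dict(TOP_LEFT), "bottom_right": dict(BOTTOM_RIGHT)},
            # ignore_unmapped sits next to the field name, not inside the box.
            "ignore_unmapped": ignore_unmapped,
        }
    }


# The "natural" should clause, with no exists guards.
body = {"query": {"bool": {"should": [geo_bbox("f2"), geo_bbox("f3")]}}}
```

Serialized with json.dumps, this is the same request body that could be pasted after GET test/_search in the console.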
