Vector tiles search not returning hits on geo_shape fields

The index mapping being used is equivalent to this one (simplified):

PUT locations
{
  "mappings": {
    "properties": {
      "geoshape": {
        "type": "geo_shape"
      }
    }
  }
}

There is a specific document in this index:

POST locations/_bulk?refresh
{ "index": { "_id": "1" } }
{ "geoshape": { "type": "multilinestring", "coordinates": [[[24.7412109375, 59.45624336447568],[23.70849609375,56.64414704199467]], [[12.41455078125,51.358061573190916],[8.76708984375,53.12040528310657]]] } }

If I use the search API with a geo_bounding_box query, I see the result as expected:

POST /locations/_search?_source=false
{
  "query": {
    "geo_bounding_box": {
      "geoshape": {
        "bottom_right": {
          "lat": 35.029996,
          "lon": 36.079102
        },
        "top_left": {
          "lat": 62.714462,
          "lon": -12.436523
        }
      }
    }
  },
  "fields": ["geoshape"]
}

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1.0,
    "hits": [
      {
        "_index": "locations",
        "_type": "_doc",
        "_id": "1",
        "_score": 1.0,
        "fields": {
          "geoshape": [
            {
              "coordinates": [
                [
                  [
                    24.7412109375,
                    59.45624336447568
                  ],
                  [
                    23.70849609375,
                    56.64414704199467
                  ]
                ],
                [
                  [
                    12.41455078125,
                    51.358061573190916
                  ],
                  [
                    8.76708984375,
                    53.12040528310657
                  ]
                ]
              ],
              "type": "MultiLineString"
            }
          ]
        }
      }
    ]
  }
}

When I try and get the same result from the vector tiles search API, it is also returned correctly on the hits layer.
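For reference, the vector tile search endpoint has the form GET <index>/_mvt/<field>/<zoom>/<x>/<y>. Below is a minimal sketch for working out which tile contains a given coordinate, assuming the standard Web Mercator (slippy map) tiling scheme. The helper names tile_for and mvt_path are made up for illustration and are not part of any Elasticsearch client.

```python
import math


def tile_for(lon, lat, zoom):
    """Return the (x, y) tile containing a lon/lat point at the given zoom."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that points on the edge of the projection stay in range.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def mvt_path(index, field, zoom, lon, lat):
    """Build the request path of the tile that contains the given point."""
    x, y = tile_for(lon, lat, zoom)
    return f"{index}/_mvt/{field}/{zoom}/{x}/{y}"


# First coordinate of the test document, at zoom level 3.
print(mvt_path("locations", "geoshape", 3, 24.7412109375, 59.45624336447568))
```

The printed path can be used with GET in the console to request the tile that should contain the first line of the test document.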

The issue I'm having is that this is working fine for an index created fresh for the testing (on version 7160299), but for an index that was created on version 7150199, the vector tiles API returns no hits, and the geo_shape mapped field is also not returned as part of fields on the search API.
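The version numbers here are the internal ids reported under index.version.created in the index settings. A small sketch for turning them into the familiar dotted form, assuming the 7.x encoding of major * 1000000 + minor * 10000 + revision * 100 + build (the function name decode_version_id is hypothetical):

```python
def decode_version_id(version_id):
    """Split an internal version id into (major, minor, revision, build)."""
    # The settings API returns the id as a string, so accept both forms.
    major, rest = divmod(int(version_id), 1_000_000)
    minor, rest = divmod(rest, 10_000)
    revision, build = divmod(rest, 100)
    return major, minor, revision, build


for raw in ("7160299", "7150199"):
    major, minor, revision, _build = decode_version_id(raw)
    print(raw, "->", f"{major}.{minor}.{revision}")
```

Under that assumption, 7160299 corresponds to 7.16.2 and 7150199 to 7.15.1.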

In essence, on the older index I'm seeing this behaviour:

POST /locations/_search?_source=false
{
  "query": {
    "geo_bounding_box": {
      "geoshape": {
        "bottom_right": {
          "lat": 35.029996,
          "lon": 36.079102
        },
        "top_left": {
          "lat": 62.714462,
          "lon": -12.436523
        }
      }
    }
  },
  "fields": ["geoshape"]
}

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1.0,
    "hits": [
      {
        "_index": "locations",
        "_type": "_doc",
        "_id": "1",
        "_score": 1.0
      }
    ]
  }
}

The field not being returned as part of the search API is sort of expected, based on this comment in the docs:

Due to the complex input structure and index representation of shapes, it is not currently possible to sort shapes or retrieve their fields directly. The geo_shape value is only retrievable through the _source field

However, is it then related that this document does not show up on the vector tiles either? If so, is there a way to remedy this and make the old index behave like the new one, or is the behaviour of the new one not expected and the old one is actually behaving correctly?

I'm hoping to find an answer to this, as I would really like to make use of the vector tiles, but if the features just don't show up, it's not going to be possible.

Hey,

I am not sure I understand everything. I just spun up a cluster on 7.15 and created an index as you described. Then I did a rolling upgrade to 7.16 and everything works as expected, so I am not sure which step differs from yours.

How did you upgrade from 7.15 to 7.16?

Due to the complex input structure and index representation of shapes, it is not currently possible to sort shapes or retrieve their fields directly. The geo_shape value is only retrievable through the _source field

This only applies to doc values. The fields API retrieves shapes from _source, so that is not related. What happens if you query the failing index with _source enabled?

We run Elastic via ECK, and did the upgrade by simply updating the version number in the cluster definition YAML file, i.e. following these guidelines.

With source, I get the following responses from the search API.

On the new index:

POST /locations/_search
{
  "query": {
    "geo_bounding_box": {
      "geoshape": {
        "bottom_right": {
          "lat": 35.029996,
          "lon": 36.079102
        },
        "top_left": {
          "lat": 62.714462,
          "lon": -12.436523
        }
      }
    }
  },
  "fields": ["geoshape"],
  "_source": ["geoshape"]
}

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1.0,
    "hits": [
      {
        "_index": "locations",
        "_type": "_doc",
        "_id": "1",
        "_score": 1.0,
        "_source": {
          "geoshape": {
            "coordinates": [
                [
                  [
                    24.7412109375,
                    59.45624336447568
                  ],
                  [
                    23.70849609375,
                    56.64414704199467
                  ]
                ],
                [
                  [
                    12.41455078125,
                    51.358061573190916
                  ],
                  [
                    8.76708984375,
                    53.12040528310657
                  ]
                ]
              ],
              "type": "multilinestring"
          }
        },
        "fields": {
          "geoshape": [
            {
              "coordinates": [
                [
                  [
                    24.7412109375,
                    59.45624336447568
                  ],
                  [
                    23.70849609375,
                    56.64414704199467
                  ]
                ],
                [
                  [
                    12.41455078125,
                    51.358061573190916
                  ],
                  [
                    8.76708984375,
                    53.12040528310657
                  ]
                ]
              ],
              "type": "MultiLineString"
            }
          ]
        }
      }
    ]
  }
}

On the old one, with the same request, the response does not contain the fields key in each hit, but does contain _source with the correct value in it.

I do have to admit that I'm omitting some other fields etc. that the old index has, as I can't share the structure/source contents, that's why the request contains only the specific values in fields and _source keys rather than using wildcards or just true for source.

To me it seems like for some reason the old index is able to query on the field (i.e. searching by bounding box works fine), but the field cannot be returned as a field, which is more than strange. I did dig around a bit in the Elasticsearch code as well, and that would explain why the new index surfaces the hits on the vector tile too, as this logic depends on the field being present in the response (please let me know if I'm barking up the wrong tree here): elasticsearch/RestVectorTileAction.java at 0699c9351f1439e246d408fd6538deafde4087b6 · elastic/elasticsearch · GitHub

Your assessment is correct, we use the fields API to read geometries and add them to the vector tiles. If the fields API cannot retrieve the geometry, then it will not be added which is what you are describing.

You said there are more fields in the document, could you try to retrieve other fields using the fields API? I want to know if it is a problem with the geo field or is generic to the whole document.
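A quick way to run that check over a whole search response is to list the hits that matched the query but came back without the requested entry under fields. This is a minimal sketch that only assumes the response shape shown earlier in this thread; the function name hits_missing_field is hypothetical.

```python
def hits_missing_field(response, field):
    """Return the _id of every hit whose "fields" section lacks `field`."""
    hits = response.get("hits", {}).get("hits", [])
    # A hit with no "fields" key at all counts as missing the field.
    return [hit.get("_id") for hit in hits if field not in hit.get("fields", {})]


# Trimmed-down responses modelled on the new and old index behaviour.
new_index_response = {
    "hits": {"hits": [{"_id": "1", "fields": {"geoshape": [{"type": "MultiLineString"}]}}]}
}
old_index_response = {"hits": {"hits": [{"_id": "1", "_score": 1.0}]}}

print(hits_missing_field(new_index_response, "geoshape"))
print(hits_missing_field(old_index_response, "geoshape"))
```

Running it once per field name shows whether the problem is limited to the geo field or affects the whole document.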

I actually had the same hunch, so I did that test: other fields seem to work correctly. I don't have a second geo_shape field to test on the old index, but other types seem to work (date, keyword, text, nested). I also have a geo_point field, which worked as well.

Thanks for the quick response. So there must be something slightly different in the geometry of the old index that makes parsing of the geometry fail.

Would it be possible to send the output of the query above (with "_source": ["geoshape"]) in the old index?

Here it is, do note that on the old index the field name is different (but that should not matter, right?) - stops_geoshape. I added a fake document (as the coordinates I've been using are fake ones), and put together a search query to only locate that one. Here's the full sequence that I took.

POST /_bulk
{ "index": { "_index": "<old-index-name>", "_id": "fake-id" } }
{ "stops_geoshape": {"coordinates":[[[24.7412109375,59.45624336447568],[23.70849609375,56.64414704199467]],[[12.41455078125,51.358061573190916],[8.76708984375,53.12040528310657]]],"type":"multilinestring" } }

POST /<old-index-name>/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "geo_bounding_box": {
            "stops_geoshape": {
              "bottom_right": {
                "lat": 35.029996,
                "lon": 36.079102
              },
              "top_left": {
                "lat": 62.714462,
                "lon": -12.436523
              }
            }
          }
        },
        {
          "terms": {
            "_id": [
              "fake-id"
            ]
          }
        }
      ]
    }
  },
  "_source": [
    "stops_geoshape"
  ],
  "fields": [
    "stops_geoshape"
  ]
}

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 10,
    "successful": 10,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 0.0,
    "hits": [
      {
        "_index": "<old-index-name>",
        "_type": "_doc",
        "_id": "fake-id",
        "_score": 0.0,
        "_source": {
          "stops_geoshape": {
            "coordinates": [
              [
                [
                  24.7412109375,
                  59.45624336447568
                ],
                [
                  23.70849609375,
                  56.64414704199467
                ]
              ],
              [
                [
                  12.41455078125,
                  51.358061573190916
                ],
                [
                  8.76708984375,
                  53.12040528310657
                ]
              ]
            ],
            "type": "multilinestring"
          }
        }
      }
    ]
  }
}

// Cleanup of the fake document from the index
POST /_bulk
{ "delete": { "_index": "<old-index-name>", "_id": "fake-id" } }

I have to redact the index name, but hope that's OK.

I am very puzzled, I have tried to reproduce it but so far no luck. Here is what I have done, maybe you can spot some differences.

I have my 7.16.2 cluster:

GET /

{
  "name" : "REDACTED",
  "cluster_name" : "REDACTED",
  "cluster_uuid" : "REDACTED",
  "version" : {
    "number" : "7.16.2",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "2b937c44140b6559905130a8650c64dbd0879cfb",
    "build_date" : "2021-12-18T19:42:46.604893745Z",
    "build_snapshot" : false,
    "lucene_version" : "8.10.1",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}

And I have created an Index in a previous version:

GET locations/_settings

{
  "locations" : {
    "settings" : {
      "index" : {
        "routing" : {
          "allocation" : {
            "include" : {
              "_tier_preference" : "data_content"
            }
          }
        },
        "number_of_shards" : "1",
        "blocks" : {
          "read_only_allow_delete" : "false"
        },
        "provided_name" : "locations",
        "creation_date" : "1642756291638",
        "number_of_replicas" : "1",
        "uuid" : "iKYBBRe_RgGhFLVOkyDpBw",
        "version" : {
          "created" : "7150199"
        }
      }
    }
  }
}

And the mapping looks like:

GET locations/_mapping

{
  "locations" : {
    "mappings" : {
      "properties" : {
        "stop_geoshape" : {
          "type" : "geo_shape"
        }
      }
    }
  }
}

And everything works as expected:

POST locations/_bulk?refresh
{ "index": { "_id": "1" } }
{ "stop_geoshape": { "type": "multilinestring", "coordinates": [[[24.7412109375, 59.45624336447568],[23.70849609375,56.64414704199467]], [[12.41455078125,51.358061573190916],[8.76708984375,53.12040528310657]]] } }


POST /locations/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "geo_bounding_box": {
            "stop_geoshape": {
              "bottom_right": {
                "lat": 35.029996,
                "lon": 36.079102
              },
              "top_left": {
                "lat": 62.714462,
                "lon": -12.436523
              }
            }
          }
        },
        {
          "terms": {
            "_id": [
              "1"
            ]
          }
        }
      ]
    }
  },
  "_source": [
    "stop_geoshape"
  ],
  "fields": [
    "stop_geoshape"
  ]
}

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.0,
    "hits" : [
      {
        "_index" : "locations",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.0,
        "_source" : {
          "stop_geoshape" : {
            "coordinates" : [
              [
                [
                  24.7412109375,
                  59.45624336447568
                ],
                [
                  23.70849609375,
                  56.64414704199467
                ]
              ],
              [
                [
                  12.41455078125,
                  51.358061573190916
                ],
                [
                  8.76708984375,
                  53.12040528310657
                ]
              ]
            ],
            "type" : "multilinestring"
          }
        },
        "fields" : {
          "stop_geoshape" : [
            {
              "coordinates" : [
                [
                  [
                    24.7412109375,
                    59.45624336447568
                  ],
                  [
                    23.70849609375,
                    56.64414704199467
                  ]
                ],
                [
                  [
                    12.41455078125,
                    51.358061573190916
                  ],
                  [
                    8.76708984375,
                    53.12040528310657
                  ]
                ]
              ],
              "type" : "MultiLineString"
            }
          ]
        }
      }
    ]
  }
}

Maybe you have something particular in your mapping?

OK, it seems I have managed to re-create the issue on a brand new index. It happens when the mapping also contains a nested field whose name is a prefix of the geo_shape field's name.

PUT /locations
{
  "mappings": {
    "properties": {
      "stops": {
        "type": "nested",
        "properties": {
          "stop_id": {
            "type": "keyword"
          }
        }
      },
      "stops_geoshape": {
        "type": "geo_shape"
      }
    }
  },
  "settings": {
    "index": {
      "refresh_interval": "10s",
      "number_of_shards": "10",
      "number_of_replicas": "2"
    }
  }
}

POST /locations/_bulk
{ "index": { "_id": "1" } }
{ "stops_geoshape": {"coordinates":[[[24.7412109375,59.45624336447568],[23.70849609375,56.64414704199467]],[[12.41455078125,51.358061573190916],[8.76708984375,53.12040528310657]]],"type":"multilinestring" } }

POST /locations/_search
{
  "query": {
    "geo_bounding_box": {
      "stops_geoshape": {
        "bottom_right": {
          "lat": 35.029996,
          "lon": 36.079102
        },
        "top_left": {
          "lat": 62.714462,
          "lon": -12.436523
        }
      }
    }
  },
  "fields": ["stops_geoshape"],
  "_source": ["stops_geoshape"]
}

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1.0,
    "hits": [
      {
        "_index": "locations",
        "_type": "_doc",
        "_id": "1",
        "_score": 1.0,
        "_source": {
          "stops_geoshape": {
            "coordinates": [
              [
                [
                  24.7412109375,
                  59.45624336447568
                ],
                [
                  23.70849609375,
                  56.64414704199467
                ]
              ],
              [
                [
                  12.41455078125,
                  51.358061573190916
                ],
                [
                  8.76708984375,
                  53.12040528310657
                ]
              ]
            ],
            "type": "multilinestring"
          }
        }
      }
    ]
  }
}

Notice how the response contains no fields key. If you then rename the nested field when creating the index, it works fine:

PUT /locations
{
  "mappings": {
    "properties": {
      "stopss": {
        "type": "nested",
        "properties": {
          "stop_id": {
            "type": "keyword"
          }
        }
      },
      "stops_geoshape": {
        "type": "geo_shape"
      }
    }
  }
}

POST /locations/_bulk
{ "index": { "_id": "1" } }
{ "stops_geoshape": {"coordinates":[[[24.7412109375,59.45624336447568],[23.70849609375,56.64414704199467]],[[12.41455078125,51.358061573190916],[8.76708984375,53.12040528310657]]],"type":"multilinestring" } }

POST /locations/_search
{
  "query": {
    "geo_bounding_box": {
      "stops_geoshape": {
        "bottom_right": {
          "lat": 35.029996,
          "lon": 36.079102
        },
        "top_left": {
          "lat": 62.714462,
          "lon": -12.436523
        }
      }
    }
  },
  "fields": ["stops_geoshape"],
  "_source": ["stops_geoshape"]
}

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1.0,
    "hits": [
      {
        "_index": "locations",
        "_type": "_doc",
        "_id": "1",
        "_score": 1.0,
        "_source": {
          "stops_geoshape": {
            "coordinates": [
              [
                [
                  24.7412109375,
                  59.45624336447568
                ],
                [
                  23.70849609375,
                  56.64414704199467
                ]
              ],
              [
                [
                  12.41455078125,
                  51.358061573190916
                ],
                [
                  8.76708984375,
                  53.12040528310657
                ]
              ]
            ],
            "type": "multilinestring"
          }
        },
        "fields": {
          "stops_geoshape": [
            {
              "coordinates": [
                [
                  [
                    24.7412109375,
                    59.45624336447568
                  ],
                  [
                    23.70849609375,
                    56.64414704199467
                  ]
                ],
                [
                  [
                    12.41455078125,
                    51.358061573190916
                  ],
                  [
                    8.76708984375,
                    53.12040528310657
                  ]
                ]
              ],
              "type": "MultiLineString"
            }
          ]
        }
      }
    ]
  }
}

Is this documented somewhere or is this a bug? Not sure what to do as a workaround, create a new field with a different name for the geo_shape?
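To check whether other fields in a mapping follow the same naming pattern, the properties section of GET <index>/_mapping can be scanned for top-level fields whose name starts with the name of a nested field. This is a rough sketch: it only looks at top-level names and plain string prefixes, which matches the reproduction above, and the exact condition that triggers the behaviour inside Elasticsearch is not confirmed here. The function name nested_prefix_collisions is hypothetical.

```python
def nested_prefix_collisions(properties):
    """Return (nested_field, other_field) pairs where other_field starts with nested_field."""
    nested = [name for name, spec in properties.items() if spec.get("type") == "nested"]
    collisions = []
    for name in properties:
        for prefix in nested:
            # A field never collides with itself.
            if name != prefix and name.startswith(prefix):
                collisions.append((prefix, name))
    return collisions


failing = {
    "stops": {"type": "nested", "properties": {"stop_id": {"type": "keyword"}}},
    "stops_geoshape": {"type": "geo_shape"},
}
working = {
    "stopss": {"type": "nested", "properties": {"stop_id": {"type": "keyword"}}},
    "stops_geoshape": {"type": "geo_shape"},
}

print(nested_prefix_collisions(failing))
print(nested_prefix_collisions(working))
```

The first mapping reports the stops / stops_geoshape pair, while the second, where the nested field is named stopss, reports nothing.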

Thank you so much for the reproduction! That is a very sneaky bug. Would you mind opening an issue on the Elasticsearch GitHub repo?

As a workaround, changing the name will work. If you don't want to reindex your data, what might work is to add an alias field like:

PUT locations/_mapping
{
  "properties": {
    "stops": {
      "type": "nested",
      "properties": {
        "stop_id": {
          "type": "keyword"
        }
      }
    },
    "stops_geoshape": {
      "type": "geo_shape"
    },
    "geoshape": {
      "type": "alias",
      "path": "stops_geoshape"
    }
  }
}

Then you can query like:

POST /locations/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "geo_bounding_box": {
            "stops_geoshape": {
              "bottom_right": {
                "lat": 35.029996,
                "lon": 36.079102
              },
              "top_left": {
                "lat": 62.714462,
                "lon": -12.436523
              }
            }
          }
        },
        {
          "terms": {
            "_id": [
              "1"
            ]
          }
        }
      ]
    }
  },
  "_source": [
    "stops_geoshape"
  ],
  "fields": [
    "geoshape"
  ]
}

I opened the issue and am marking this as resolved: Field not returned if it has a same prefix as a nested field · Issue #82905 · elastic/elasticsearch · GitHub

Thank you!
