Issue after migrating geo_shape from prefix-tree-based mapping to BKD-tree-based mapping

Hi

Problem Description
We want to migrate from Elasticsearch 7.x to 8. We have migrated to 7.17.5 and resolved the issues that came up.
One of them was the location field with a geo_shape mapping, for which we used strategy: recursive. To resolve it, we took a backup (backup-of-index-surat, which also has an explicit mapping with geo_shape and strategy: recursive) and reindexed into an index with an explicit mapping (dynamic: false) in which location is a plain geo_shape without any strategy parameters. All documents were reindexed (the counts matched).
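For readers following along, here is a minimal sketch of the two request bodies this migration step implies, built as Python dicts and printed as JSON. The target index name `index-surat` is hypothetical (the post does not name the new index), and the mapping shows only the location field; the other fields used in the queries would need their own explicit mappings because of dynamic: false.

```python
import json

OLD_INDEX = "backup-of-index-surat"  # backup index named in the post
NEW_INDEX = "index-surat"            # hypothetical name, not given in the post

# Body for PUT /<NEW_INDEX>: geo_shape with no strategy/tree/precision
# parameters, so the field is not indexed with the legacy prefix tree.
# Only "location" is shown; id, observationDateTime, etc. are omitted here.
new_index_body = {
    "mappings": {
        "dynamic": False,
        "properties": {
            "location": {"type": "geo_shape"},
        },
    }
}

# Body for POST /_reindex: copy every document from the backup index.
reindex_body = {
    "source": {"index": OLD_INDEX},
    "dest": {"index": NEW_INDEX},
}

print(f"PUT /{NEW_INDEX}")
print(json.dumps(new_index_body, indent=2))
print("POST /_reindex")
print(json.dumps(reindex_body, indent=2))
```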

The number of documents returned by geo_shape queries (circle, bbox, polygon) differs between the older and the newer mapping for the same data. Here are two example queries:

  1. Linestring query: on our data this returns around 40K docs with the older mapping and zero docs with the newer one.
GET backup-of-index-surat/_count
{
  "query": {
    "bool": {
      "filter": [
        {
          "terms": {
            "id": [
              "iisc.ac.in/89a36273d77dac4cf38114fca1bbe64392547f86/rs.iudx.io/surat-itms-realtime-information/surat-itms-live-eta"
            ],
            "boost": 1
          }
        },
        {
          "geo_shape": {
            "location": {
              "shape": {
                "type": "linestring",
                "coordinates": [
                  [
                    72.842,
                    21.2
                  ],
                  [
                    72.923,
                    20.8
                  ]
                ]
              },
              "relation": "intersects"
            }
          }
        },
        {
          "range": {
            "observationDateTime": {
              "from": "2020-10-12T00:00:00.000Z",
              "to": "2020-10-22T00:00:00Z",
              "include_lower": true,
              "include_upper": true,
              "boost": 1
            }
          }
        }
      ],
      "adjust_pure_negative": true,
      "boost": 1
    }
  }
}
  2. geo_shape circle query: this returns around 4K docs with the older mapping and only 2K with the newer one.
GET backup-of-index-surat/_count
{
  "query": {
    "bool": {
      "filter": [
        {
          "terms": {
            "id": [
              "iisc.ac.in/89a36273d77dac4cf38114fca1bbe64392547f86/rs.iudx.io/surat-itms-realtime-information/surat-itms-live-eta"
            ],
            "boost": 1
          }
        },
        {
          "geo_shape": {
            "location": {
              "shape": {
                "radius": "10.0m",
                "type": "Circle",
                "coordinates": [
                  72.834,
                  21.178
                ]
              },
              "relation": "within"
            }
          }
        },
        {
          "range": {
            "observationDateTime": {
              "from": "2020-10-12T00:00:00.000Z",
              "to": "2020-10-22T00:00:00Z",
              "include_lower": true,
              "include_upper": true,
              "boost": 1
            }
          }
        }
      ],
      "adjust_pure_negative": true,
      "boost": 1
    }
  }
}

What we expect / questions

  • Is reindexing from the prefix-tree-based geo_shape to the BKD-based geo_shape the right way to make the index compatible with version 8?

  • Why does the same geo_shape query return such drastically different numbers of docs for the same data, when the only change is how the data is stored in Elasticsearch? The precision guaranteed by the BKD tree (7 decimal places) is higher than that of the stored data values (6 decimal places). Do we need to change the queries to work with the newer mapping?

  • We expect geo_shape queries to return the same number of docs after migrating to the BKD-tree-based representation.

Some relevant details:
Elasticsearch version: 7.17.5, running as a Docker container
Data: All docs in the indexed dataset above have Point-type geo_shape location data. The latitude and longitude values have a precision of 6 decimal places.

Please let me know if anything else is needed.

Hi,

I would recommend reading the following blog, in particular the section on why we needed a new approach. After that, if you have more questions I will be happy to try to answer them:

Thanks!

Thank you @Ignacio_Vera. I went through the post. It answers my first question, and we do want to migrate to get the better performance and smaller index size.

So, coming back to the core of the issue we faced: we migrated the existing index, whose location data used the prefix-tree-based geo_shape, to a BKD-tree-based geo_shape mapping using the reindex API. As a result, geo_shape queries on the new index return fewer docs (or zero docs, in the case of the linestring query) than on the index with the older mapping. This looks like a bug, and we are not able to understand why geo_shape queries on the two indexes return different counts.

The first post gives examples of two geo_shape queries.

The latitude and longitude values have 1e-6 precision, and BKD can store them more precisely, so the mismatch in the documents returned should not be caused by precision.

All your data are just points? I would recommend to use geo_point field type instead of geo_shape in that case as it is faster and it will use less space.

This is expected, as the precision of the old strategy depends on the precision of the grid and sometimes even on the area of the geometry. The new indexing system has a fixed error due to the encoding of latitudes and longitudes.

Having said that, querying points with line strings is tricky, what is the use case? you will normally face precision errors just because of floating point arithmetic.
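To give a feel for the size of that fixed encoding error, here is a small sketch. It assumes each coordinate is stored as a 32-bit integer spanning the full latitude or longitude range (modeled on Lucene's GeoEncodingUtils, which is my assumption about the underlying encoding), and it uses a rough 111,320 m per degree only to express the step in metric units.

```python
import math

# Assumption: coordinates are quantized to 2**32 steps over their full range.
LAT_STEP = 180.0 / 2**32       # degrees per latitude step
LON_STEP = 360.0 / 2**32       # degrees per longitude step
METERS_PER_DEGREE = 111_320.0  # rough value at the equator, for scale only


def quantize(value, step):
    """Snap a coordinate down to the start of the cell that contains it."""
    return math.floor(value / step) * step


# Circle centre from the query in the first post.
lat, lon = 21.178, 72.834
lat_err = lat - quantize(lat, LAT_STEP)
lon_err = lon - quantize(lon, LON_STEP)

print(f"latitude step : {LAT_STEP:.3e} deg, about {LAT_STEP * METERS_PER_DEGREE * 1000:.1f} mm")
print(f"longitude step: {LON_STEP:.3e} deg, about {LON_STEP * METERS_PER_DEGREE * 1000:.1f} mm at the equator")
print(f"error for this point: lat {lat_err:.3e} deg, lon {lon_err:.3e} deg")
```

Under these assumptions the error is bounded by one step, on the order of millimetres, whereas the old prefix tree's error depends on the configured grid precision.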

All your data are just points? I would recommend to use geo_point field type instead of geo_shape in that case as it is faster and it will use less space.

Yes, it's just points; basically it's live bus location data. Thank you for the suggestion, we will use geo_point.
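As a rough sketch of what that switch could look like, the snippet below builds a geo_point mapping fragment and a geo_distance filter that plays the role of the 10 m circle query from the first post. The field name and values are taken from the thread; whether the existing documents already store location in a format geo_point accepts is something to verify before reindexing.

```python
import json

# Mapping fragment for a point-only location field.
location_mapping = {
    "properties": {
        "location": {"type": "geo_point"},
    }
}

# Counterpart of the 10 m circle query: match points within 10 m of the centre.
geo_distance_filter = {
    "geo_distance": {
        "distance": "10m",
        "location": {"lat": 21.178, "lon": 72.834},
    }
}

print(json.dumps(location_mapping, indent=2))
print(json.dumps(geo_distance_filter, indent=2))
```

The filter can replace the geo_shape clause inside the bool filter array; the terms and range clauses stay unchanged.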

A side question: is there an Elasticsearch query to filter by geometry type, for example only point data or only polygon data?

This is expected, as the precision of the old strategy depends on the precision of the grid and sometimes even on the area of the geometry. The new indexing system has a fixed error due to the encoding of latitudes and longitudes.

This makes sense. The data indexed with the older strategy was less precise (default precision of 50 m), so queries on that index returned higher counts for all geo_shape queries (circle, bbox). This is most probably also why the linestring query matched some docs with the older mapping.

Having said that, querying points with line strings is tricky, what is the use case? you will normally face precision errors just because of floating point arithmetic.

Use case for the linestring query: to find buses that travel along the same route, path, or sub-path.

Having said that, querying points with line strings is tricky, what is the use case?

Could you elaborate a bit more on why it's tricky?

In the new indexing strategy a point only intersects a line if it lies exactly on the line, which is sensitive to double-precision floating-point errors. What you really want is to add a buffer to your lines so you can match any point within a given distance of a line.

Unfortunately we do not (yet) provide that functionality, so you might need to buffer the line at the client, for example by transforming the line into a polygon and using that polygon to query your index.
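A minimal client-side sketch of that buffering idea, for a two-point line only: it turns the segment into a thin rectangle using a local flat-earth approximation and wraps it in a geo_shape polygon query. The helper names, the 25 m buffer width, and the 111,320 m per degree constant are illustrative assumptions, not an Elasticsearch API; a geometry library would be a better choice for multi-segment routes.

```python
import math

METERS_PER_DEGREE = 111_320.0  # rough spherical approximation


def buffer_segment(a, b, radius_m):
    """Return a closed ring of [lon, lat] pairs: a rectangle around segment
    a-b, grown by radius_m on every side (counterclockwise order)."""
    lat0 = math.radians((a[1] + b[1]) / 2.0)
    kx = METERS_PER_DEGREE * math.cos(lat0)  # metres per degree of longitude
    ky = METERS_PER_DEGREE                   # metres per degree of latitude
    ax, ay = a[0] * kx, a[1] * ky
    bx, by = b[0] * kx, b[1] * ky
    length = math.hypot(bx - ax, by - ay)
    if length == 0:
        raise ValueError("segment endpoints must differ")
    dx, dy = (bx - ax) / length, (by - ay) / length  # unit direction a -> b
    nx, ny = -dy, dx                                 # unit normal (left side)
    r = radius_m
    corners = [
        (ax - dx * r - nx * r, ay - dy * r - ny * r),  # behind a, right side
        (bx + dx * r - nx * r, by + dy * r - ny * r),  # beyond b, right side
        (bx + dx * r + nx * r, by + dy * r + ny * r),  # beyond b, left side
        (ax - dx * r + nx * r, ay - dy * r + ny * r),  # behind a, left side
    ]
    ring = [[x / kx, y / ky] for x, y in corners]
    ring.append(list(ring[0]))  # close the ring
    return ring


def buffered_line_query(field, a, b, radius_m):
    """Build a geo_shape filter that matches shapes intersecting the buffer."""
    return {
        "geo_shape": {
            field: {
                "shape": {
                    "type": "polygon",
                    "coordinates": [buffer_segment(a, b, radius_m)],
                },
                "relation": "intersects",
            }
        }
    }


# Line from the first post, with a hypothetical 25 m buffer.
query = buffered_line_query("location", [72.842, 21.2], [72.923, 20.8], 25.0)
print(query)
```

The resulting clause can replace the linestring geo_shape clause in the bool filter; the buffer width is a trade-off between missing buses slightly off the line and matching buses on neighbouring streets.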

Hope it makes sense.

Thanks for the explanation. I did a quick check: even when I give two points that are present in the index as the coordinates of the linestring query, no docs match. My understanding is that when the data is indexed the precision is reduced to 1e-7, while the search still runs at higher precision (double-precision floating point).

Wouldn't it make more sense for both to work at the same precision level?
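The mismatch described here can be reproduced on paper. The sketch below uses exact rational arithmetic and assumes the stored point is snapped to a 2**32-step grid (modeled on Lucene's encoding, an assumption on my part) while the query line keeps its original coordinates. It then repeats the test with the line endpoints snapped to the same grid.

```python
import math
from fractions import Fraction

# Assumption: coordinates are quantized to 2**32 steps over their full range.
LON_STEPS_PER_DEG = Fraction(2**32, 360)
LAT_STEPS_PER_DEG = Fraction(2**32, 180)


def encode(lon, lat):
    """Map an exact (lon, lat) to integer grid cells."""
    return (math.floor(lon * LON_STEPS_PER_DEG), math.floor(lat * LAT_STEPS_PER_DEG))


def decode(cell):
    """Map integer grid cells back to exact (lon, lat) values."""
    return (cell[0] / LON_STEPS_PER_DEG, cell[1] / LAT_STEPS_PER_DEG)


def cross(a, b, p):
    """Zero exactly when p is collinear with a and b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


# Endpoints of the linestring from the first post, as exact decimals.
a = (Fraction("72.842"), Fraction("21.2"))
b = (Fraction("72.923"), Fraction("20.8"))

# A document whose location equals endpoint a, as it would be stored.
stored = decode(encode(*a))

# Query line at full precision versus stored (quantized) point.
on_exact_line = cross(a, b, stored) == 0

# Query line snapped to the same grid as the stored point.
on_quantized_line = cross(decode(encode(*a)), decode(encode(*b)), stored) == 0

print("stored point on full-precision line:", on_exact_line)
print("stored point on quantized line:     ", on_quantized_line)
```

With these assumptions the stored point falls just off the full-precision line but sits exactly on the line whose endpoints were snapped to the grid.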

This feature would be greatly appreciated.

Wouldn't it make more sense for both to work at the same precision level?

You need to be on Elasticsearch 8.3+ for that: Quantize geo queries to remove true negatives from search results by iverase · Pull Request #85441 · elastic/elasticsearch · GitHub

This feature would be greatly appreciated.

I agree, and it is on the roadmap. There seems to be no issue for it yet; I will try to create one soon.

After updating Elasticsearch to 8.3.3, the linestring query between two points returned documents.

Thank you for all the support!