[5.0.1][Geolocation] Wrong sorting of elements with search_after clause


(Keyball) #1

Hi all,

Background:
I'm building a paginated list of entries sorted by their distance from a lat/lon geo point.

Problem:
I have a problem combining the search_after functionality with geo-distance sorting.

First, my query:

{
    "_source":[
        "id"
    ],
    "query":{
        "bool":{
            "must":{
                "match_all":{
                }
            },
            "filter":{
                "geo_distance":{
                    "distance":"100.000000km",
                    "geopoint.location":{
                        "lat":50.931788,
                        "lon":6.940588
                    }
                }
            }
        }
    },
    "search_after":[
        0,
        "activity#0000000000001"
    ],
    "sort":[
        {
            "_geo_distance":{
                "geopoint.location":{
                    "lat":50.9317882,
                    "lon":6.9405879
                },
                "order":"asc",
                "unit":"km",
                "mode":"min",
                "distance_type":"sloppy_arc"
            }
        },
        "_uid"
    ],
    "track_scores":true,
    "from":0,
    "size":5

}
  • I get a result with 5 entries sorted by distance and UID. The last element looks like this:

    {
        "_index": "activities",
        "_type": "activity",
        "_id": "0000000000011",
        "_score": 1,
        "_source": {
            "id": 11
        },
        "sort": [
            29.663276081577074,
            "activity#0000000000011"
        ]
    }

  • There are many more elements at a distance of 29.663276081577074 km in the result set.

After changing to "search_after":[29,"activity#0000000000011"], I expected to get the elements AFTER the distance of 29 km and my unique ID (in this case 0000000000011). But the sorting engine seemed to use only the first (distance) criterion and ignore the second, so I got the elements with UIDs 0000000000008 to 0000000000012, including the search_after element 0000000000011 itself.
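To make the expectation above concrete, here is a minimal sketch, assuming search_after returns only hits whose full tuple of sort values sorts strictly after the values you pass, compared lexicographically in sort order (distance first, `_uid` only on a tie). `page_after` is a hypothetical Python stand-in, not part of any Elasticsearch client, and `HITS` holds only the sort values that appear in the responses posted in this thread.

```python
from typing import List, Tuple

# (distance in km, _uid): the two sort values used by the query.
SortKey = Tuple[float, str]

# Sort values copied from the hits shown in this thread's responses
# (only these six appear in the posts; the query reports 10 hits in total).
HITS: List[SortKey] = [
    (8.146888754906811, "activity#0000000000002"),
    (29.663276081577074, "activity#0000000000008"),
    (29.663276081577074, "activity#0000000000009"),
    (29.663276081577074, "activity#0000000000010"),
    (29.663276081577074, "activity#0000000000011"),
    (29.663276081577074, "activity#0000000000012"),
]


def page_after(hits: List[SortKey], after: SortKey, size: int) -> List[SortKey]:
    """Hypothetical model of search_after: order hits ascending by
    (distance, _uid) and keep only those that sort strictly after `after`.

    Python compares tuples lexicographically, so _uid is only looked at
    when two distances are exactly equal.
    """
    return [hit for hit in sorted(hits) if hit > after][:size]


# Rounded distance: 29.663... > 29, so every hit at that distance qualifies
# and the _uid part of `after` never decides anything (hits 8 to 12).
print(page_after(HITS, (29, "activity#0000000000011"), 5))

# Exact distance from the last hit: the distances tie, so _uid decides
# and only activity#0000000000012 remains.
print(page_after(HITS, (29.663276081577074, "activity#0000000000011"), 5))
```

Under that assumption, a rounded distance of 29 already sorts before every hit at 29.663276081577074 km, which would explain why the `_uid` tie-breaker seems to be ignored.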

Now comes the question :) I know I'm still a beginner with Elasticsearch, so I'm asking: is this a bug or an unexpected feature? Or what can I change in my query to get the results I want? I tried ranges, other filters and different search_after criteria (e.g. a timestamp), but the results were always sorted only by the first criterion (distance).

Best regards,
Luke


(Isabel Drost-Fromm) #2

Thank you for including the query already. It would be great if you could also post some sample documents to make reproducing your issue easier.


(Keyball) #3

OK, here they are:

  1. "search_after": [ 0, "activity#0000000000001" ] (first page of the search)

I search for 5 entries after distance 0 and UID xxx1:

Results:
"hits": {
    "total": 10,
    "max_score": 1,
    "hits": [
        {
            "_index": "activities",
            "_type": "activity",
            "_id": "0000000000002",
            "_score": 1,
            "_source": {
                "id": 2
            },
            "sort": [
                8.146888754906811,
                "activity#0000000000002"
            ]
        },
        {
            "_index": "activities",
            "_type": "activity",
            "_id": "0000000000008",
            "_score": 1,
            "_source": {
                "id": 8
            },
            "sort": [
                29.663276081577074,
                "activity#0000000000008"
            ]
        },
        {
            "_index": "activities",
            "_type": "activity",
            "_id": "0000000000009",
            "_score": 1,
            "_source": {
                "id": 9
            },
            "sort": [
                29.663276081577074,
                "activity#0000000000009"
            ]
        },
        {
            "_index": "activities",
            "_type": "activity",
            "_id": "0000000000010",
            "_score": 1,
            "_source": {
                "id": 10
            },
            "sort": [
                29.663276081577074,
                "activity#0000000000010"
            ]
        },
        {
            "_index": "activities",
            "_type": "activity",
            "_id": "0000000000011",
            "_score": 1,
            "_source": {
                "id": 11
            },
            "sort": [
                29.663276081577074,
                "activity#0000000000011"
            ]
        }
    ]
}
As you can see, I get 5 results, sorted by distance and UID. The last element has a distance of 29.663276081577074 and the UID activity#0000000000011.

  2. "search_after": [ 29, "activity#0000000000011" ]
    I take the values from the last query, convert the distance to an integer (as I have many results with the same distance), and set the UID to activity#0000000000011.
    Results:
    "hits": {
        "total": 10,
        "max_score": 1,
        "hits": [
            {
                "_index": "activities",
                "_type": "activity",
                "_id": "0000000000008",
                "_score": 1,
                "_source": {
                    "id": 8
                },
                "sort": [
                    29.663276081577074,
                    "activity#0000000000008"
                ]
            },
            {
                "_index": "activities",
                "_type": "activity",
                "_id": "0000000000009",
                "_score": 1,
                "_source": {
                    "id": 9
                },
                "sort": [
                    29.663276081577074,
                    "activity#0000000000009"
                ]
            },
            {
                "_index": "activities",
                "_type": "activity",
                "_id": "0000000000010",
                "_score": 1,
                "_source": {
                    "id": 10
                },
                "sort": [
                    29.663276081577074,
                    "activity#0000000000010"
                ]
            },
            {
                "_index": "activities",
                "_type": "activity",
                "_id": "0000000000011",
                "_score": 1,
                "_source": {
                    "id": 11
                },
                "sort": [
                    29.663276081577074,
                    "activity#0000000000011"
                ]
            },
            {
                "_index": "activities",
                "_type": "activity",
                "_id": "0000000000012",
                "_score": 1,
                "_source": {
                    "id": 12
                },
                "sort": [
                    29.663276081577074,
                    "activity#0000000000012"
                ]
            }
        ]
    }
    So I get the results from xxx8 to xxx12, but I would expect to get the entries after xxx11, so xxx12 should be the first element on the new page. The problem persists even if I let Elasticsearch generate the UIDs, so it shouldn't be caused by the naming convention I have chosen. My understanding was that search_after finds elements using an internal cursor, so the naming convention of the UIDs shouldn't matter here. Or am I wrong?
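For comparison, a minimal sketch of the paging loop described above, assuming the client copies the last hit's `sort` array into the next request verbatim instead of rounding the distance to an integer. It also leaves `search_after` out of the first request (a first page has no previous hit to continue from) and drops `from`, assuming, as the Elasticsearch reference states, that `from` must be 0 or absent when `search_after` is used. `next_page_body` is a hypothetical helper working on plain Python dicts (e.g. a decoded JSON response); it does not call any Elasticsearch client.

```python
import copy
from typing import Any, Dict, Optional

Body = Dict[str, Any]


def next_page_body(base_body: Body,
                   previous_response: Optional[Body] = None) -> Optional[Body]:
    """Hypothetical helper: build the request body for the next page.

    - First page (no previous response): send no search_after at all.
    - Later pages: copy the last hit's "sort" array exactly as returned,
      so the full-precision distance is kept and _uid can break ties.
    - Returns None when the previous page was empty (no more results).
    """
    body = copy.deepcopy(base_body)  # never mutate the caller's template
    body.pop("search_after", None)
    body.pop("from", None)  # assumed: from must be 0 or absent with search_after
    if previous_response is None:
        return body
    hits = previous_response["hits"]["hits"]
    if not hits:
        return None
    body["search_after"] = list(hits[-1]["sort"])
    return body


if __name__ == "__main__":
    base = {
        "_source": ["id"],
        "sort": [
            {"_geo_distance": {
                "geopoint.location": {"lat": 50.9317882, "lon": 6.9405879},
                "order": "asc", "unit": "km"}},
            "_uid",
        ],
        "size": 5,
    }
    # Last hit of the first page, as posted above.
    first_page = {"hits": {"hits": [
        {"sort": [29.663276081577074, "activity#0000000000011"]}]}}
    print(next_page_body(base, first_page)["search_after"])
```

The design choice is that the client never edits the sort values: they are treated as an opaque cursor taken from the previous response, so hits that share a distance are still separated by the `_uid` tie-breaker.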

(Keyball) #4

I use PHP 7 + Elasticsearch-PHP 5.0 in my backend and the Sense plugin for Chrome for testing; both give the same results.
Regards,
Luke


(Isabel Drost-Fromm) #5

Would you mind posting a full Sense reproduction (including indexing commands)?


(Keyball) #6

This is the full query I'm using:

POST activities/activity/_search
{
   "_source": [
      "id"
   ],
   "query": {
      "bool": {
         "must": {
            "match_all": {}
         },
         "filter": {
            "geo_distance": {
               "distance": "100.000000km",
               "geopoint.location": {
                  "lat": 50.931788,
                  "lon": 6.940588
               }
            }
         }
      }
   },
   "search_after": [
      29,
      "activity#0000000000011"
   ],
   "sort": [
      {
         "_geo_distance": {
            "geopoint.location": {
               "lat": 50.9317882,
               "lon": 6.9405879
            },
            "order": "asc",
            "unit": "km",
            "mode": "min",
            "distance_type": "sloppy_arc"
         }
      },
      "_uid"
   ],
   "track_scores": true,
   "from": 0,
   "size": 5
}

Should I provide the full response(s) also?


(Keyball) #7

Or should I send the mappings of the index?


(Russ Cam) #8

If you could provide a full reproducible example (mapping, indexing documents, query and results), that would be super helpful; if it's more than can fit in a Discuss post, please open a gist and paste the link.


(Keyball) #9

Ok, here is the gist: https://gist.github.com/keyball/80672e965e205752a282d3212c3517ff

I split the data into several files: the mapping, the data source as an SQL dump, 2 queries with their results, and the PHP array I pass to the bulk method of the Elasticsearch-PHP API.

Regards,
Luke


(Keyball) #10

Hi, could you read the gist?


(Keyball) #11

Hi there, any news?


(Keyball) #12

Hi, could you tell me whether this issue will be looked at at some point, or should I build a workaround?


(system) #13

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.