There are many more elements with a distance of 29.663276081577074 km in the query results.
After changing the cursor to "search_after":[29,"activity#0000000000011"], I expected to get the elements AFTER a distance of 29 km and my unique ID (in this case 0000000000011). But the sorting engine seemed to use only the first criterion (distance) and to ignore the second, returning the elements with UIDs 0000000000008 to 0000000000012, including 0000000000011, the one given in search_after.
Now comes the question. I know I'm still a beginner with Elasticsearch, so I'm asking: is it a bug or an unexpected feature? Or what can I do with my query to get the results I want? I tried ranges, other filters and different search_after criteria (e.g. timestamp), but I always got results sorted only by the first criterion (distance).
"search_after": [ 0, "activity#0000000000001" ] - first page of the search
I search for 5 entries after the distance 0 and UID xxx1:
Results:
"hits": {
"total": 10,
"max_score": 1,
"hits": [
{
"_index": "activities",
"_type": "activity",
"_id": "0000000000002",
"_score": 1,
"_source": {
"id": 2
},
"sort": [
8.146888754906811,
"activity#0000000000002"
]
},
{
"_index": "activities",
"_type": "activity",
"_id": "0000000000008",
"_score": 1,
"_source": {
"id": 8
},
"sort": [
29.663276081577074,
"activity#0000000000008"
]
},
{
"_index": "activities",
"_type": "activity",
"_id": "0000000000009",
"_score": 1,
"_source": {
"id": 9
},
"sort": [
29.663276081577074,
"activity#0000000000009"
]
},
{
"_index": "activities",
"_type": "activity",
"_id": "0000000000010",
"_score": 1,
"_source": {
"id": 10
},
"sort": [
29.663276081577074,
"activity#0000000000010"
]
},
{
"_index": "activities",
"_type": "activity",
"_id": "0000000000011",
"_score": 1,
"_source": {
"id": 11
},
"sort": [
29.663276081577074,
"activity#0000000000011"
]
}
]
}
As you can see, I got 5 results here, sorted by distance and UID. The last element has the distance 29.663276081577074 and the UID activity#0000000000011.
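For reference, a minimal sketch of how the sort values of the last hit can be read from a response like the one above. It assumes the response JSON has already been parsed into a Python dict; the function name `next_cursor` is hypothetical and only for illustration, and the sample response is trimmed to the fields that matter here.

```python
def next_cursor(response):
    """Return the sort values of the last hit, or None if the page is empty."""
    hits = response["hits"]["hits"]
    return hits[-1]["sort"] if hits else None


# Trimmed copy of the first page: only the last two hits and their sort values.
first_page = {
    "hits": {
        "total": 10,
        "hits": [
            {"_id": "0000000000010",
             "sort": [29.663276081577074, "activity#0000000000010"]},
            {"_id": "0000000000011",
             "sort": [29.663276081577074, "activity#0000000000011"]},
        ],
    }
}

cursor = next_cursor(first_page)
print(cursor)
```

The helper returns the sort array exactly as Elasticsearch reported it, without rounding or truncating the distance.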
"search_after": [ 29, "activity#0000000000011" ]
I use the values from the last hit of the previous query, truncate the distance to an integer because I have many results with the same distance, and set the UID to activity#0000000000011.
Results:
"hits": {
"total": 10,
"max_score": 1,
"hits": [
{
"_index": "activities",
"_type": "activity",
"_id": "0000000000008",
"_score": 1,
"_source": {
"id": 8
},
"sort": [
29.663276081577074,
"activity#0000000000008"
]
},
{
"_index": "activities",
"_type": "activity",
"_id": "0000000000009",
"_score": 1,
"_source": {
"id": 9
},
"sort": [
29.663276081577074,
"activity#0000000000009"
]
},
{
"_index": "activities",
"_type": "activity",
"_id": "0000000000010",
"_score": 1,
"_source": {
"id": 10
},
"sort": [
29.663276081577074,
"activity#0000000000010"
]
},
{
"_index": "activities",
"_type": "activity",
"_id": "0000000000011",
"_score": 1,
"_source": {
"id": 11
},
"sort": [
29.663276081577074,
"activity#0000000000011"
]
},
{
"_index": "activities",
"_type": "activity",
"_id": "0000000000012",
"_score": 1,
"_source": {
"id": 12
},
"sort": [
29.663276081577074,
"activity#0000000000012"
]
}
]
}
So I got the results from xxx8 to xxx12, but I expected to get the entries after xxx11, so xxx12 should be the first element of the new page. The problem persists even if I let Elasticsearch generate the UIDs, so it shouldn't be caused by the naming convention I chose. As I understand it, search_after searches with an internal cursor, so the naming convention of the UIDs should not matter here. Or am I wrong?
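One way to reason about the two pages is to simulate the cursor in plain Python. This is only a sketch under an assumption: that search_after keeps the documents whose sort values come strictly after the cursor, comparing the first sort value first and using the second only to break ties, which is how Python compares tuples. The function `search_after` below is a hypothetical stand-in, not the Elasticsearch implementation, and the data is limited to the six documents visible in the results above.

```python
# Sort values (distance in km, UID) of the documents visible in the results.
docs = [
    (8.146888754906811, "activity#0000000000002"),
    (29.663276081577074, "activity#0000000000008"),
    (29.663276081577074, "activity#0000000000009"),
    (29.663276081577074, "activity#0000000000010"),
    (29.663276081577074, "activity#0000000000011"),
    (29.663276081577074, "activity#0000000000012"),
]


def search_after(sort_values, cursor, size=5):
    """Return up to `size` entries that sort strictly after `cursor`.

    Tuples compare element by element: the UID is only consulted
    when the distances are exactly equal.
    """
    cursor = tuple(cursor)
    return [entry for entry in sorted(sort_values) if entry > cursor][:size]


# Cursor with the distance truncated to 29: every 29.66... document is
# already "after" 29, so the UID part of the cursor is never compared.
truncated = search_after(docs, (29, "activity#0000000000011"))

# Cursor with the exact distance of the last hit: distances tie,
# so the UID decides and only later UIDs remain.
exact = search_after(docs, (29.663276081577074, "activity#0000000000011"))

print([uid for _, uid in truncated])
print([uid for _, uid in exact])
```

Under that assumption, the truncated cursor reproduces the page from xxx8 to xxx12 shown above, while the untruncated sort values of the last hit leave only xxx12 from this sample.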
If you could provide a full reproducible example (mapping, indexed documents, query and results), that would be super helpful. If it's more than fits in a Discuss post, please open a gist and paste the link.
I split the data into files. You'll get the mapping, the data source as an SQL dump, 2 queries with results, and the PHP array I pass to the bulk method of the Elasticsearch-PHP API.