Sorting in nested lists?

cherry-wave · June 26, 2018, 1:21pm

Hey guys! (I am using ES version 5.6.4 if this is relevant)

I have entries that can have different "variations" from different sources. They are merged in a specific order before shown to the user. I already had quite a few problems with the way I did this before but now I stumbled upon the "nested"-mapping, which seems to work great! Unless it comes to sorting that is =(

Lets say we have this entry (I only display the source-map here):

{  
    "variations": [
        {
            "owner": "g1",
            "generelData": {
                "forename": "John",
                "surname": "Doe"
            },
            "addresses": {
                "phone": "123456789"
            }
        },
        {
            "owner": "u1",
            "addresses": {
                "phone": "987654321"
            }
        }
    ]
}

Now, if I do the following query:

{
  "query": {
    "match_all": {}
  },
  "sort" : [
      {
         "variations.generelData.forename.keyword" : {
            "nested_path" : "variations"
         }
      }
   ]
}

How does ES now which "variations.generelData.forename.keyword"-field to choose? Does it recognize that there is no such field in the second variation?

But: What would happen if I sort on the "phone"-field. As I said, the order of the merging of duplicate fields is specific: The variation with owner "u1" overrides "g1". But I guess ES doesn't know this
So: Would there be a way to add an "order" to the sorting? Like:
First sort on the "phone"-field in the variation "u1", if the field doesn't appear there or is null/empty please sort for the "phone"-field in variation "g1" and so on?

Greetings!

polyfractal · June 26, 2018, 7:20pm

Nested sorting has a few extra fields you can specify to help deal with exactly the situation you've run into. Docs here: https://www.elastic.co/guide/en/elasticsearch/reference/5.6/search-request-sort.html#nested-sorting

From the docs, there's this example:

   "sort" : [
       {
          "offer.price" : {
             "mode" :  "avg",
             "order" : "asc",
             "nested_path" : "offer",
             "nested_filter" : {
                "term" : { "offer.color" : "blue" }
             }
          }
       }
    ]

The nested_filter allows you to only sort by documents that match the filter.
The mode allows you to combine multiple values that may be derived from matching nested docs, with operations like min/max/average/sum

cherry-wave · June 27, 2018, 9:06am

Thanks for your answer. How complex can this term be?

I can't just filter on "owner": "g1" because when the entry gets merged for User 1, he will see the phone number "987654321" and if it's sorted on the value of the "g1"-variation it would be sorted wrong for this user who has an own variation of the entry.

User 2 on the other hand, would see the phone number "123456789" and so the sorting would be correct for him.

So can I use a script in the term to do the following;

if: u1.phone == exists && u1.phone != null && u1.phone != empty > use u1.phone
else if: g1.phone == exists && g1.phone != null && g1.phone != empty > use g1.phone
....

and so on?

polyfractal · June 27, 2018, 5:22pm

The sort filter can be as complicated as you want, as long as it can be expressed as a filter...but that means it can have compound queries (like a bool query) holding other queries, script query, etc.

Do note that the sort is for the parent root document, not the children nested documents. So at the end of the day it needs to generate a single sort value to represent the whole set of documents.

cherry-wave · June 28, 2018, 12:26pm

Thanks for your answer. It wasn't quite what I could use because in this nested-filter I didn't have access to the whole document. But you put me on the right path. I now found out that I need a script to sort.

So... My sort looks like this:

"sort": [
	{
		"_script": {
			"type": "string",
			"script": {
				"source": """
					boolean filter(Map variation, String owner, String category, String fieldName) {
						boolean isOwner = variation.owner.equals(owner);
						
						String field = variation[category][fieldName];
						boolean fieldExists = field != null && field != '';
						
						return isOwner && fieldExists;
					}
            
					String getField(List variations, String owner, String category, String fieldName) {
						Optional optionalVariation = variations.stream()
							.filter(variation -> filter(variation, owner, category, fieldName))
							.findFirst();
							
						if (optionalVariation.isPresent()) {
							Map variation = optionalVariation.get();
							return variation[category][fieldName];
						}
						
						return null;
					}
					
					List variations = params._source.variations;
					List mergeOrder = ['u1', 'g1'];
					String sortValue = '';
					
					for (String owner : mergeOrder) {
						String field = getField(variations, owner, 'generelData', 'phone');
						if (field != null) {
							sortValue = field;
							break;
						}
					}
					
					return sortValue;
				"""
			}
		}
	}
]

It works like a charm! But; Do you think this could run into problems? Like performance during search?

polyfractal · June 28, 2018, 3:37pm

Hard to say, performance will largely depend on how many documents match the query. More docs == more times that script is executed.

Generally, we use a rule of thumb that if the the script starts to look more complicated than a couple of lines there may be a better way to refactor the document to encode the needed data. So the script may work just fine, and painless is pretty fast, but you may end up wanting to encode some of that logic in the document directly so you can sort without a script. Things like loops, etc can make performance unpredictable

system · July 26, 2018, 3:37pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Sorting a nested collection AFTER sort (for display / result purposes only) Elasticsearch	1	393	January 24, 2020
Issue with Nested Sorting in Opensearch Query Elasticsearch	1	87	July 17, 2024
Sort by nested values "across" all documents (duplicate parent documents) Elasticsearch	3	805	February 13, 2020
Sort nested field with two conditions Elasticsearch	1	308	June 25, 2020
Sorting on nested elements Elasticsearch	6	705	July 6, 2017

Sorting in nested lists?

Related topics