Sorting in nested lists?


#1

Hey guys! (I am using ES version 5.6.4 if this is relevant)

I have entries that can have different "variations" from different sources. They are merged in a specific order before shown to the user. I already had quite a few problems with the way I did this before but now I stumbled upon the "nested"-mapping, which seems to work great! Unless it comes to sorting that is =(

Lets say we have this entry (I only display the source-map here):

{  
    "variations": [
        {
            "owner": "g1",
            "generelData": {
                "forename": "John",
                "surname": "Doe"
            },
            "addresses": {
                "phone": "123456789"
            }
        },
        {
            "owner": "u1",
            "addresses": {
                "phone": "987654321"
            }
        }
    ]
}

Now, if I do the following query:

{
  "query": {
    "match_all": {}
  },
  "sort" : [
      {
         "variations.generelData.forename.keyword" : {
            "nested_path" : "variations"
         }
      }
   ]
}

How does ES now which "variations.generelData.forename.keyword"-field to choose? Does it recognize that there is no such field in the second variation?

But: What would happen if I sort on the "phone"-field. As I said, the order of the merging of duplicate fields is specific: The variation with owner "u1" overrides "g1". But I guess ES doesn't know this :wink:
So: Would there be a way to add an "order" to the sorting? Like:
First sort on the "phone"-field in the variation "u1", if the field doesn't appear there or is null/empty please sort for the "phone"-field in variation "g1" and so on?

Greetings!


(Zachary Tong) #2

Nested sorting has a few extra fields you can specify to help deal with exactly the situation you've run into. Docs here: https://www.elastic.co/guide/en/elasticsearch/reference/5.6/search-request-sort.html#nested-sorting

From the docs, there's this example:

   "sort" : [
       {
          "offer.price" : {
             "mode" :  "avg",
             "order" : "asc",
             "nested_path" : "offer",
             "nested_filter" : {
                "term" : { "offer.color" : "blue" }
             }
          }
       }
    ]
  • The nested_filter allows you to only sort by documents that match the filter.
  • The mode allows you to combine multiple values that may be derived from matching nested docs, with operations like min/max/average/sum

#3

Thanks for your answer. How complex can this term be?

I can't just filter on "owner": "g1" because when the entry gets merged for User 1, he will see the phone number "987654321" and if it's sorted on the value of the "g1"-variation it would be sorted wrong for this user who has an own variation of the entry.

User 2 on the other hand, would see the phone number "123456789" and so the sorting would be correct for him.

So can I use a script in the term to do the following;

if: u1.phone == exists && u1.phone != null && u1.phone != empty > use u1.phone
else if: g1.phone == exists && g1.phone != null && g1.phone != empty > use g1.phone
....

and so on?


(Zachary Tong) #4

The sort filter can be as complicated as you want, as long as it can be expressed as a filter...but that means it can have compound queries (like a bool query) holding other queries, script query, etc.

Do note that the sort is for the parent root document, not the children nested documents. So at the end of the day it needs to generate a single sort value to represent the whole set of documents.


#5

Thanks for your answer. It wasn't quite what I could use because in this nested-filter I didn't have access to the whole document. But you put me on the right path. I now found out that I need a script to sort.

So... My sort looks like this:

"sort": [
	{
		"_script": {
			"type": "string",
			"script": {
				"source": """
					boolean filter(Map variation, String owner, String category, String fieldName) {
						boolean isOwner = variation.owner.equals(owner);
						
						String field = variation[category][fieldName];
						boolean fieldExists = field != null && field != '';
						
						return isOwner && fieldExists;
					}
            
					String getField(List variations, String owner, String category, String fieldName) {
						Optional optionalVariation = variations.stream()
							.filter(variation -> filter(variation, owner, category, fieldName))
							.findFirst();
							
						if (optionalVariation.isPresent()) {
							Map variation = optionalVariation.get();
							return variation[category][fieldName];
						}
						
						return null;
					}
					
					List variations = params._source.variations;
					List mergeOrder = ['u1', 'g1'];
					String sortValue = '';
					
					for (String owner : mergeOrder) {
						String field = getField(variations, owner, 'generelData', 'phone');
						if (field != null) {
							sortValue = field;
							break;
						}
					}
					
					return sortValue;
				"""
			}
		}
	}
]

It works like a charm! But; Do you think this could run into problems? Like performance during search?


(Zachary Tong) #6

Hard to say, performance will largely depend on how many documents match the query. More docs == more times that script is executed.

Generally, we use a rule of thumb that if the the script starts to look more complicated than a couple of lines there may be a better way to refactor the document to encode the needed data. So the script may work just fine, and painless is pretty fast, but you may end up wanting to encode some of that logic in the document directly so you can sort without a script. Things like loops, etc can make performance unpredictable


(system) #7

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.