Filter by array? Sort by number of items in array

Andrey_Kaprov · August 15, 2017, 8:37am

Hi all! I've run into a problem:
When I use filters in an Elasticsearch query, I need to sort the resulting documents by the number of queried filter values they match.
For example, I have documents with an array field "category":

Document 1: [1,2,4]
Document 2: [1,2,5]
Document 3: [1]

When I filter by category 1, I expect the documents in this order: 3, 1, 2.
When I filter by categories 1 and 2, I expect: 1, 2, 3 (document 3 matches only category 1).

polyfractal · August 18, 2017, 3:36pm

The filtering/matching part should be taken care of by regular filtering, but for the sorting behavior I think you may have to resort to a scripted sort: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-sort.html#_script_based_sorting

The behavior is relatively complicated: sort by the number of matching categories, and break ties by sorting on the categories ascending (presumably that's why the order is 3, 1, 2 in the first example). I don't think anything in ES can do that for you natively, but you should be able to work out the correct logic in a script and use that.
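A minimal local sketch of that ordering in Python, useful for checking the logic before porting it to a script. It assumes the tie-break compares the category arrays lexicographically (a guess, as noted above), and the function names are hypothetical. It reproduces both expected orders from the question.

```python
def match_count(categories, wanted):
    """Number of the document's categories that were requested."""
    return sum(1 for c in categories if c in wanted)


def order_docs(docs, requested):
    """Return doc ids: most matching categories first, ties by category array ascending."""
    wanted = set(requested)
    # Filtering step: keep only documents that match at least one requested category.
    hits = {doc_id: cats for doc_id, cats in docs.items() if match_count(cats, wanted) > 0}
    # Sorting step: negate the match count so more matches sort first,
    # then compare the sorted category arrays lexicographically to break ties.
    ranked = sorted(hits.items(), key=lambda item: (-match_count(item[1], wanted), sorted(item[1])))
    return [doc_id for doc_id, _ in ranked]


if __name__ == "__main__":
    docs = {1: [1, 2, 4], 2: [1, 2, 5], 3: [1]}
    print(order_docs(docs, [1]))
    print(order_docs(docs, [1, 2]))
```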

Andrey_Kaprov · August 18, 2017, 6:07pm

Thank you for the reply! I think script-based sorting would be a good choice.
I've already thought about the algorithm; it could be something like this:
ABS(doc.categories.length - request.categories.intersect(doc.categories).length).
But there are a couple of problems:

The sort script will be executed for every filtered document in the index (2 million or more).
ES limits dynamic script compilations (15 per minute by default), and I don't know how often this sorted data will be queried.
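The ABS(...) formula from this post can be checked by hand with a small Python sketch (the function name is hypothetical; in Elasticsearch the same logic would have to live in a sort script). On the sample documents it gives 2, 2, 0 when filtering on category 1 and 1, 1, 0 when filtering on categories 1 and 2, so an ascending sort on this value alone puts document 3 first in both cases.

```python
def unmatched_count(doc_categories, requested):
    """Proposed key: ABS(len(doc) - len(intersection of request and doc))."""
    overlap = set(requested) & set(doc_categories)
    # abs() mirrors the ABS in the formula; the overlap can never be larger
    # than the document's own array, so the difference is never negative.
    return abs(len(doc_categories) - len(overlap))


if __name__ == "__main__":
    docs = {1: [1, 2, 4], 2: [1, 2, 5], 3: [1]}
    for requested in ([1], [1, 2]):
        print(requested, {d: unmatched_count(c, requested) for d, c in docs.items()})
```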

polyfractal · August 18, 2017, 8:37pm

The script will be invoked for every matching document (i.e. the docs that match the query and filters). So if all 2M docs match, then yes, the script will be invoked 2M times. Painless is pretty fast, so I wouldn't be too concerned, but it is something to keep in mind. Unfortunately, that's the price of such flexibility.

As for the compilation limit, it only applies to unique compilations. If you parameterize the script correctly so that all dynamically changing parts are provided in the params object, the script will only be compiled once. You can invoke it as many times as you like; it's just the compilation that is rate-limited.
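A sketch of such a parameterized scripted sort, built as a request body in Python. Assumptions: the field is named `category` and is numeric, the cluster accepts the `source` key for scripts (older 5.x releases called it `inline`), and the Painless snippet only counts matching categories, without any tie-break. Only `params` changes between requests, so the script text stays identical and should only be compiled once. This is untested against a live cluster.

```python
import json

# Hypothetical Painless source: counts how many requested categories appear in the
# document's "category" array. Values are compared as longs because doc values and
# JSON params may be boxed as different numeric types.
MATCH_COUNT_SCRIPT = (
    "long n = 0; "
    "for (def c : doc['category']) { "
    "  for (def p : params.categories) { "
    "    if (((Number) c).longValue() == ((Number) p).longValue()) { n++; } "
    "  } "
    "} "
    "return n;"
)


def build_search_body(categories):
    """Search body: filter on the categories, then sort by number of matches (descending)."""
    return {
        "query": {"bool": {"filter": [{"terms": {"category": categories}}]}},
        "sort": [
            {
                "_script": {
                    "type": "number",
                    "order": "desc",
                    "script": {
                        "lang": "painless",
                        "source": MATCH_COUNT_SCRIPT,  # constant text, compiled once
                        "params": {"categories": categories},  # varies per request
                    },
                }
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_search_body([1, 2]), indent=2))
```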

There may be a way to approximate your algorithm if you index the size of the category array alongside the array itself. Then you could try to work out a sort value from the query score and the size. I'm not sure, and I haven't thought about it too deeply, but thought I'd mention it. I still think a script is likely your best option.
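One possible reading of this suggestion, sketched in Python: store the array size in a hypothetical `category_count` field at index time, add one `constant_score` `should` clause per requested category so that `_score` rises with the number of matching categories, and sort on `_score` descending, then `category_count` ascending. This assumes each matching clause adds the same amount to the score; that is untested here, so check it with `explain` on your version. Remaining ties (documents 1 and 2 in the examples) come back in no guaranteed order.

```python
import json


def with_category_count(doc):
    """Index-time helper: copy the document and add the hypothetical category_count field."""
    return {**doc, "category_count": len(doc["category"])}


def build_score_based_body(categories):
    """Search body that scores by matched categories and breaks ties by array size."""
    return {
        "query": {
            "bool": {
                # Only documents containing at least one requested category are hits.
                "filter": [{"terms": {"category": categories}}],
                # Each matching category adds the same constant score,
                # so _score grows with the number of matched categories.
                "should": [
                    {"constant_score": {"filter": {"term": {"category": c}}}}
                    for c in categories
                ],
            }
        },
        "sort": [
            {"_score": {"order": "desc"}},
            {"category_count": {"order": "asc"}},  # fewer categories first on ties
        ],
    }


if __name__ == "__main__":
    print(with_category_count({"category": [1, 2, 4]}))
    print(json.dumps(build_score_based_body([1, 2]), indent=2))
```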

system · September 15, 2017, 8:37pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
