Sort by next date in range from array of dates


I'm wondering if the following is possible? I've tried to simplify my issue into a short example:

Given three items, each with an array of dates:

{ "title": "item 1", "dates": [ "2016-04-21T12:00:00.000Z", "2016-04-29T12:00:00.000Z", "2016-05-05T12:00:00.000Z" ] } { "title": "item 2", "dates": [ "2016-04-23T12:00:00.000Z", "2016-04-30T12:00:00.000Z", "2016-05-07T12:00:00.000Z" ] } { "title": "item 3", "dates": [ "2016-04-27T12:00:00.000Z", "2016-04-28T12:00:00.000Z", "2016-05-29T12:00:00.000Z" ] }

I can search for a range of dates:
"query": { "bool": { "must": [ { "range": { "dates": { "from": "2016-04-25T00:00:00.000Z", "to": "2016-04-30T00:00:00.000Z" } } } ] } }

Which correctly returns Item 1 and Item 3. However, within my date range, the next occurring date is actually in Item 3. (Date range is 25/04 to 30/04, Item 3 is on 28/04, Item 1 is on 29/04).

However, adding a sort

"sort": [ { "dates": { "order": "asc" } }

returns Item 1 first, presumably because the first date in that array is earlier.

Is there any way of doing this (without scripting)? Or is there another way of structuring my data that may allow it?

Thanks for any pointers!


Havent tried it out, but you could probably structure your data so that you have "items" as parent objects and "dates" as child objects, then do the range query inside a "has_child" query. Don't know how sorting behaves then, to be honest, but I think the problem why this currently doesn't work is because putting all those dates together in one document.

Hi Geoff,

I had a quick go at my initial idea, came up with something. Its a bit more complicated as I thought, so maybe there's better ways to do it, but at least for the toy example it seems to work.

I've written down the steps in this little gist.

First, if you model your data with a parent-child relation you can separate the items and the dates. When we then do the "range" query inside a "has_child" query we still get get item1 and item3, but now only the child documents where the date is in the range match.

Now, the sorting gets difficult because according to this issue sorting by parent/child is not possible yet. But there is a workaround using function_score_query. Basically we use the date (the internal long value) of the matching child doc as score and then sort by score on the top level.

I'm not sure if this works well in scale and if you can structure your data like this, but it was a fun thing to try. Let me know if this helps.

1 Like

Thanks very much. My data is actually in more of a parent-child relationship anyway (I had simplified it for this example), so I will try this out and report back!