Range filter with array field


(Maks Materkov) #1

Hi,

I've created many documents like this one:

{
    "a": [1,5,10,15,20,25,30]
}

Now I want to find all documents where at least X elements of "a" are between Y and Z.
For example, the document above will match "1 element between 1 and 9" and "2 elements between 1 and 9", but won't match "3 elements between 1 and 9".
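
To pin down the intended semantics outside Elasticsearch, here is a minimal Python sketch of the matching rule. The function name `matches` is hypothetical, and treating both bounds as inclusive is an assumption, since the post does not say whether Y and Z are included.

```python
def matches(values, low, high, min_count):
    """Return True if at least min_count values lie in [low, high].

    Assumption: both bounds are inclusive.
    """
    in_range = sum(1 for v in values if low <= v <= high)
    return in_range >= min_count


doc = {"a": [1, 5, 10, 15, 20, 25, 30]}

# Only 1 and 5 fall between 1 and 9, so the threshold of 3 is not met.
print(matches(doc["a"], 1, 9, 1))  # True
print(matches(doc["a"], 1, 9, 2))  # True
print(matches(doc["a"], 1, 9, 3))  # False
```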

I think I should start with a range filter, but what should I do next? Which filter should I use for this?

Thank you.


(Adrien Grand) #2

Elasticsearch doesn't support running such a query directly. You could potentially do it if you stored every value in a different field, using a bool query with one range clause per field and minimum_should_match set to X. However, this might not work well for you, depending on the maximum number of values a document can have.
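
A rough sketch of what such a request body could look like, built in Python. The field names `a_0`, `a_1`, ... and the helper `build_query` are assumptions made for illustration; the thread does not specify how the per-value fields would be named. Only the `bool`, `should`, `range` and `minimum_should_match` parts come from the Elasticsearch query DSL.

```python
import json


def build_query(num_fields, low, high, min_count):
    """Build a bool query with one range clause per value field.

    Assumption: each array element is stored in its own field named
    a_0, a_1, ... (hypothetical naming, not from the thread).
    """
    should = [
        {"range": {f"a_{i}": {"gte": low, "lte": high}}}
        for i in range(num_fields)
    ]
    return {
        "query": {
            "bool": {
                "should": should,
                # Require at least min_count of the range clauses to match.
                "minimum_should_match": min_count,
            }
        }
    }


# "At least 2 elements between 1 and 9" for documents with up to 7 values.
print(json.dumps(build_query(7, 1, 9, 2), indent=2))
```

The number of clauses grows with the maximum number of values per document, which is the limitation mentioned above.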


(Maks Materkov) #3

Adrien, thank you for helping.

I'll try to explain the real task in a bit more detail. We have a 'users' index containing 'user' documents. Users can perform events (there are many event types: event1, event2, ...).
The real task is to find users who performed event A at least B times between times C and D.

In my first attempt, I tried storing the event timestamps (as integers) in an array per event type in the 'users' index.

We already have the users index (about 20 million users) and about 500 million events in our system right now. I don't want to create another 'events' index with 500 million records just for this new query.

The time window is usually measured in days, so as an optimization I thought about a data structure something like this:

[
  {"time_window": "2016-02-01", "event": "event1", "count": 1},
  {"time_window": "2016-02-01", "event": "event2", "count": 5},
  {"time_window": "2016-02-02", "event": "event1", "count": 6},
  {"time_window": "2016-02-02", "event": "event2", "count": 3}
]
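
As an illustration of how this structure could be produced and used, here is a minimal Python sketch. It assumes events arrive as (event name, Unix timestamp) pairs and that days are bucketed in UTC; the helper names `daily_buckets` and `count_in_window` are hypothetical. Note that answering "at least B times between C and D" then means summing the counts over every day in the window.

```python
from collections import Counter
from datetime import datetime, timezone


def daily_buckets(events):
    """Roll (event_name, unix_timestamp) pairs up into per-day counts.

    Assumption: days are computed in UTC.
    """
    counts = Counter()
    for name, ts in events:
        day = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d")
        counts[(day, name)] += 1
    return [
        {"time_window": day, "event": name, "count": n}
        for (day, name), n in sorted(counts.items())
    ]


def count_in_window(buckets, event, start_day, end_day):
    """Sum the counts of one event type over an inclusive range of days.

    ISO dates (YYYY-MM-DD) compare correctly as plain strings.
    """
    return sum(
        b["count"]
        for b in buckets
        if b["event"] == event and start_day <= b["time_window"] <= end_day
    )


# Usage: did the user perform event1 at least 3 times on 2016-02-01..02?
feb1 = int(datetime(2016, 2, 1, tzinfo=timezone.utc).timestamp())
feb2 = int(datetime(2016, 2, 2, tzinfo=timezone.utc).timestamp())
events = [("event1", feb1), ("event2", feb1 + 10), ("event1", feb2), ("event1", feb2 + 5)]
buckets = daily_buckets(events)
print(count_in_window(buckets, "event1", "2016-02-01", "2016-02-02") >= 3)
```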

Any thoughts or suggestions?

Thank you.

