Sorting by sum of matching children


(Ólafur Gauti Guðmundsson) #1

Hi all,
I have an index with a parent type (let's call this "Equipment"), and a
child type (let's call this "Availability").
An example Equipment document:

{
_id: 1,
title: "Bike",
...
}

An example Availability document:

{
_parent: 1,
day: "2012-04-04",

    available: 2, 

price_a: 100,
price_b: 90

}

The prices for the same equipment can be different each day.

The query I want to perform is basically:

*Find all equipment that is available on day X and day Y, and sort by price.
*

The "finding" part is not hard, I use a set of "has_child" queries to find
equipment documents where availability is larger than zero during the days
in question.
My problem is: how do I sort by the price?

The price for the equipment is the sum of the prices for days X and Y, and
the price for the day is the sum of "price_a" and "price_b" for that day.
Therefore, I need to sort by the sum of a field in selected child documents
(the ones matching the query).

Is there any way of doing this?
Is there an easier way if the availability documents were nested inside the
equipment document?

Thanks a lot!
-OGG


(Frederick Cheung) #2

On Apr 2, 2:46 pm, Ólafur Gauti Guðmundsson oli.ga...@gmail.com
wrote:

My problem is: how do I sort by the price?

The price for the equipment is the sum of the prices for days X and Y, and
the price for the day is the sum of "price_a" and "price_b" for that day.
Therefore, I need to sort by the sum of a field in selected child documents
(the ones matching the query).

Is there any way of doing this?
Is there an easier way if the availability documents were nested inside the
equipment document?

I don't think that nested documents help in terms of your ordering
problem.

You can kind of do this with top_children:

if you have a query that looks something like

top_children: {
type: "availability"
query: {
custom_score: {
query: { query on availability here }
script: "doc['price_a'] + doc['price_b']
}
}
}

Then ES runs the query against availability and returns the parent
documents for the matched children, sorted by the score from the query
agains availability. If an equipment can have multiple matching
availabilities then you can tell ES whether the score should be the
max/average/sum of the children scores.

The fiddly bit is that the query against availability is run with an
expected hit count (you can control this), documents that fall outside
of this hit count won't be used. The returned hit count is also not
always trustworthy. Someone is also working on a has_parent filter
that would probably be useful for this kind of stuff

Fred


(Ólafur Gauti Guðmundsson) #3

Hi Fred,
Thanks a lot for your help, I really appreciate it.
The "top_children" feature is very interesting. However, in the query part,
I must be able to specify that an availability exists for each day in an
interval.

For example, I may be searching for for equipment that is available on
2012-04-10 AND 2012-04-11.

In this case: can I somehow specify in the custom_score query insider
top_children, that two distinct children MUST exists, one matching each day?
That is, I am only interested in equipment that has BOTH of these
availability children:

child 1: (day = 2012-04-10 AND availability > 0)
child 2: (day = 2012-04-11 AND availability > 0)

These children must both exist, and I want the sum of the prices for both
these children to score the parent.
Not sure if I can do that?

Thanks!
-OGG

On Monday, April 2, 2012 3:51:17 PM UTC, Frederick Cheung wrote:

On Apr 2, 2:46 pm, Ólafur Gauti Guðmundsson oli.ga...@gmail.com
wrote:

My problem is: how do I sort by the price?

The price for the equipment is the sum of the prices for days X and Y,
and
the price for the day is the sum of "price_a" and "price_b" for that
day.
Therefore, I need to sort by the sum of a field in selected child
documents
(the ones matching the query).

Is there any way of doing this?
Is there an easier way if the availability documents were nested inside
the
equipment document?

I don't think that nested documents help in terms of your ordering
problem.

You can kind of do this with top_children:

if you have a query that looks something like

top_children: {
type: "availability"
query: {
custom_score: {
query: { query on availability here }
script: "doc['price_a'] + doc['price_b']
}
}
}

Then ES runs the query against availability and returns the parent
documents for the matched children, sorted by the score from the query
agains availability. If an equipment can have multiple matching
availabilities then you can tell ES whether the score should be the
max/average/sum of the children scores.

The fiddly bit is that the query against availability is run with an
expected hit count (you can control this), documents that fall outside
of this hit count won't be used. The returned hit count is also not
always trustworthy. Someone is also working on a has_parent filter
that would probably be useful for this kind of stuff

Fred


(Frederick Cheung) #4

On Apr 2, 7:13 pm, Ólafur Gauti Guðmundsson oli.ga...@gmail.com
wrote:

For example, I may be searching for for equipment that is available on
2012-04-10 AND 2012-04-11.

In this case: can I somehow specify in the custom_score query insider
top_children, that two distinct children MUST exists, one matching each day?
That is, I am only interested in equipment that has BOTH of these
availability children:

child 1: (day = 2012-04-10 AND availability > 0)
child 2: (day = 2012-04-11 AND availability > 0)

These children must both exist, and I want the sum of the prices for both
these children to score the parent.
Not sure if I can do that?

Hmm, not sure about that. Perhaps using the query filter to filter by
each day tht you need?

Or set the score to be 1000000 + price_a + price_b
and set top_children to use the sum of the children scores.
Assuming the sum of all the price_a and price_b the selecting results
with score > 3000000 (use the min_score option) would be the rows with
3 children, and sorting by score still gives you results sorted by
total price.

Fred

On Monday, April 2, 2012 3:51:17 PM UTC, Frederick Cheung wrote:

On Apr 2, 2:46 pm, Ólafur Gauti Guðmundsson oli.ga...@gmail.com
wrote:

My problem is: how do I sort by the price?

The price for the equipment is the sum of the prices for days X and Y,
and
the price for the day is the sum of "price_a" and "price_b" for that
day.
Therefore, I need to sort by the sum of a field in selected child
documents
(the ones matching the query).

Is there any way of doing this?
Is there an easier way if the availability documents were nested inside
the
equipment document?

I don't think that nested documents help in terms of your ordering
problem.

You can kind of do this with top_children:

if you have a query that looks something like

top_children: {
type: "availability"
query: {
custom_score: {
query: { query on availability here }
script: "doc['price_a'] + doc['price_b']
}
}
}

Then ES runs the query against availability and returns the parent
documents for the matched children, sorted by the score from the query
agains availability. If an equipment can have multiple matching
availabilities then you can tell ES whether the score should be the
max/average/sum of the children scores.

The fiddly bit is that the query against availability is run with an
expected hit count (you can control this), documents that fall outside
of this hit count won't be used. The returned hit count is also not
always trustworthy. Someone is also working on a has_parent filter
that would probably be useful for this kind of stuff

Fred


(Ævar Arnfjörð Bjarmason) #5

2012/4/2 Ólafur Gauti Guðmundsson oli.gauti@gmail.com:

Is there any way of doing this?
Is there an easier way if the availability documents were nested inside the
equipment document?

Have you considered just using ElasticSearch for textual searching and
basic filtering, and then doing the logic you want to do in e.g. a
relational database?

E.g. you could get a list of ID's for candidate equipment from ES and
then filter it down somewhere else.

You might be able to get this to work, but you're essentially trying
to do something that's probably better suited for a relational
database in ES.


(Ólafur Gauti Guðmundsson) #6

Hi all,
Thanks for your replies.

@Ævar: Yep, I've considered that. I am exploring a few different options to
solve the problem, after which I'll weigh the pros and cons of each one. I
am indeed using a relational database, but keeping "processed" views of the
data (accumulated availability/capacity pairs, etc.) in ElasticSearch for
speedy searches, and making heavy use of facets.

For now, I'll make do with aggregated prices on the parent document, which
will enable me to do the type of sorting I'm after.

Thanks for your suggestions!

Best regards,
OGG

On Monday, April 2, 2012 9:53:48 PM UTC, Ævar Arnfjörð Bjarmason wrote:

2012/4/2 Ólafur Gauti Guðmundsson oli.gauti@gmail.com:

Is there any way of doing this?
Is there an easier way if the availability documents were nested inside
the
equipment document?

Have you considered just using ElasticSearch for textual searching and
basic filtering, and then doing the logic you want to do in e.g. a
relational database?

E.g. you could get a list of ID's for candidate equipment from ES and
then filter it down somewhere else.

You might be able to get this to work, but you're essentially trying
to do something that's probably better suited for a relational
database in ES.

On Monday, April 2, 2012 9:53:48 PM UTC, Ævar Arnfjörð Bjarmason wrote:

2012/4/2 Ólafur Gauti Guðmundsson oli.gauti@gmail.com:

Is there any way of doing this?
Is there an easier way if the availability documents were nested inside
the
equipment document?

Have you considered just using ElasticSearch for textual searching and
basic filtering, and then doing the logic you want to do in e.g. a
relational database?

E.g. you could get a list of ID's for candidate equipment from ES and
then filter it down somewhere else.

You might be able to get this to work, but you're essentially trying
to do something that's probably better suited for a relational
database in ES.


(cole) #7

Hi Ólafur,

It sounds like you've already found a solution, but you should be able
to get at what you're after using nested documents as well. I'm not
100% sure I have your use case correct, but I've tried to translate it
into a nested document scheme. The pseudocode below contains a
filtered query with a 'filter' component that filters the result set
down to only those documents with a nested 'availability' document
containing a day you care about and an 'availability' field that is
greater than 0. The 'query' component applies a custom score that is
the sum of the sum of 'price_a' and 'price_b' for each matching day of
availability.

This pseudocode might not be entirely correct (or optimal). If you
have any trouble translating it into real code, let me know and I'll
gist a working example.

query
filtered
query
nested
path: availability
score_mode: total
query
custom_score
script: "doc["price_a"] + doc["price_b"]"
query
terms
day: ["2012-04-10", "2012-04-11"]
filter
and
nested
path: availability
filter
and
term
day: "2012-04-10"
numeric_range
availability
gt: 0
nested
path: availability
filter
and
term
day: "2012-04-11"
numeric_range
availability
gt: 0

On Apr 2, 3:07 pm, Ólafur Gauti Guðmundsson oli.ga...@gmail.com
wrote:

Hi all,
Thanks for your replies.

@Ævar: Yep, I've considered that. I am exploring a few different options to
solve the problem, after which I'll weigh the pros and cons of each one. I
am indeed using a relational database, but keeping "processed" views of the
data (accumulated availability/capacity pairs, etc.) in ElasticSearch for
speedy searches, and making heavy use of facets.

For now, I'll make do with aggregated prices on the parent document, which
will enable me to do the type of sorting I'm after.

Thanks for your suggestions!

Best regards,
OGG

On Monday, April 2, 2012 9:53:48 PM UTC, Ævar Arnfjörð Bjarmason wrote:

2012/4/2 Ólafur Gauti Guðmundsson oli.ga...@gmail.com:

Is there any way of doing this?
Is there an easier way if the availability documents were nested inside
the
equipment document?

Have you considered just using ElasticSearch for textual searching and
basic filtering, and then doing the logic you want to do in e.g. a
relational database?

E.g. you could get a list of ID's for candidate equipment from ES and
then filter it down somewhere else.

You might be able to get this to work, but you're essentially trying
to do something that's probably better suited for a relational
database in ES.

On Monday, April 2, 2012 9:53:48 PM UTC, Ævar Arnfjörð Bjarmason wrote:

2012/4/2 Ólafur Gauti Guðmundsson oli.ga...@gmail.com:

Is there any way of doing this?
Is there an easier way if the availability documents were nested inside
the
equipment document?

Have you considered just using ElasticSearch for textual searching and
basic filtering, and then doing the logic you want to do in e.g. a
relational database?

E.g. you could get a list of ID's for candidate equipment from ES and
then filter it down somewhere else.

You might be able to get this to work, but you're essentially trying
to do something that's probably better suited for a relational
database in ES.


(Ólafur Gauti Guðmundsson) #8

Hi Cole,
This is super cool, thanks a lot! I can confirm that this nested method
works.

Thanks very much for your help, really appreciate it :slight_smile:

Best regards,
OGG

On Wednesday, April 4, 2012 12:06:39 AM UTC, cole wrote:

Hi Ólafur,

It sounds like you've already found a solution, but you should be able
to get at what you're after using nested documents as well. I'm not
100% sure I have your use case correct, but I've tried to translate it
into a nested document scheme. The pseudocode below contains a
filtered query with a 'filter' component that filters the result set
down to only those documents with a nested 'availability' document
containing a day you care about and an 'availability' field that is
greater than 0. The 'query' component applies a custom score that is
the sum of the sum of 'price_a' and 'price_b' for each matching day of
availability.

This pseudocode might not be entirely correct (or optimal). If you
have any trouble translating it into real code, let me know and I'll
gist a working example.

query
filtered
query
nested
path: availability
score_mode: total
query
custom_score
script: "doc["price_a"] + doc["price_b"]"
query
terms
day: ["2012-04-10", "2012-04-11"]
filter
and
nested
path: availability
filter
and
term
day: "2012-04-10"
numeric_range
availability
gt: 0
nested
path: availability
filter
and
term
day: "2012-04-11"
numeric_range
availability
gt: 0

On Apr 2, 3:07 pm, Ólafur Gauti Guðmundsson oli.ga...@gmail.com
wrote:

Hi all,
Thanks for your replies.

@Ævar: Yep, I've considered that. I am exploring a few different options
to
solve the problem, after which I'll weigh the pros and cons of each one.
I
am indeed using a relational database, but keeping "processed" views of
the
data (accumulated availability/capacity pairs, etc.) in ElasticSearch
for
speedy searches, and making heavy use of facets.

For now, I'll make do with aggregated prices on the parent document,
which
will enable me to do the type of sorting I'm after.

Thanks for your suggestions!

Best regards,
OGG

On Monday, April 2, 2012 9:53:48 PM UTC, Ævar Arnfjörð Bjarmason wrote:

2012/4/2 Ólafur Gauti Guðmundsson oli.ga...@gmail.com:

Is there any way of doing this?
Is there an easier way if the availability documents were nested
inside

the

equipment document?

Have you considered just using ElasticSearch for textual searching and
basic filtering, and then doing the logic you want to do in e.g. a
relational database?

E.g. you could get a list of ID's for candidate equipment from ES and
then filter it down somewhere else.

You might be able to get this to work, but you're essentially trying
to do something that's probably better suited for a relational
database in ES.

On Monday, April 2, 2012 9:53:48 PM UTC, Ævar Arnfjörð Bjarmason wrote:

2012/4/2 Ólafur Gauti Guðmundsson oli.ga...@gmail.com:

Is there any way of doing this?
Is there an easier way if the availability documents were nested
inside

the

equipment document?

Have you considered just using ElasticSearch for textual searching and
basic filtering, and then doing the logic you want to do in e.g. a
relational database?

E.g. you could get a list of ID's for candidate equipment from ES and
then filter it down somewhere else.

You might be able to get this to work, but you're essentially trying
to do something that's probably better suited for a relational
database in ES.


(system) #9