Order and paginate by children count

I have been banging my head against the table for three days already. Here is the task:

"Authors" have "books". I need to search for authors who have books satisfying some criteria.
The problem is that I need to sort the authors by how many matching books each one has. The exact sorting formula is "number_matching_books^2 / total_books_author_has".
I also need to paginate, because there can be hundreds of thousands of resulting authors.
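
To make the required ordering concrete, here is a minimal in-memory sketch of the formula and of offset-based paging. It is plain Python with no Elasticsearch involved; the function names, author ids and counts are made up for illustration.

```python
from typing import Dict, List, Tuple


def rank_authors(matching: Dict[str, int], totals: Dict[str, int]) -> List[Tuple[str, float]]:
    """Score each author as matching^2 / total and sort by score, highest first."""
    scored = [(author, m * m / totals[author]) for author, m in matching.items() if m > 0]
    # Ties are broken by author id so that the order is stable between pages.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


def page(ranked: List[Tuple[str, float]], offset: int, size: int) -> List[Tuple[str, float]]:
    """Return one page of the ranked list."""
    return ranked[offset:offset + size]


# Hypothetical data: matching books and total books per author.
matching = {"a1": 3, "a2": 2, "a3": 1}
totals = {"a1": 9, "a2": 2, "a3": 4}

ranked = rank_authors(matching, totals)
print(page(ranked, 0, 2))
```

Author "a2" ranks first here even though "a1" has more matching books, because all of a2's books match.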

This is how I have been solving it so far:
a "book" type with an "author_id" field, and a facet on "author_id" with a "facet_filter" on the book criteria.
In the facet I wrote a rather complicated "value_script" that computes the "total" value in such a way that sorting by total gives exactly the results I need. All was good until I got to pagination. I tried to paginate in the same tricky way: the script computed the "total" field so that the results before the offset were pushed to the end... but all this is too tricky and too hard to run in production.

Maybe there is another way to solve the task? Or should I switch away from Elasticsearch...

Just wondering why you don't index the author inside the books?
I mean that an author won't change once the book is written.

So you could probably index each book with an author field, which is an object with a name and whatever other properties you want.
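
As a rough illustration of that suggestion, a denormalized book document might look like the sketch below. The "book_title" field appears elsewhere in this thread; the "author" object and its "id" and "name" fields are assumptions made up for the example.

```python
import json


def book_with_author(title: str, author_id: str, author_name: str) -> dict:
    """Build a book document that embeds its author as an object field."""
    return {
        "book_title": title,
        # Hypothetical author object; add whatever properties are needed.
        "author": {"id": author_id, "name": author_name},
    }


doc = book_with_author("Hello world", "a1", "Example Author")
print(json.dumps(doc, indent=2))
```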

Does it help?

Perhaps, your use case is not really about books… :wink:

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr | @scrutmydocs

--
View this message in context: http://elasticsearch-users.115913.n3.nabble.com/Order-and-paginate-by-children-count-tp4033646.html
Sent from the Elasticsearch Users mailing list archive at Nabble.com.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

It sounds like what you need is to create a parent/child relationship
between your authors and books, then use a has_child query to search for
matching books, and count each matched book as a score of 1. You'd need to
store the total_books by an author in the author doc itself, in order to
implement your algorithm:

curl -XGET 'http://127.0.0.1:9200/my_index/author/_search?pretty=1' -d '
{
   "query" : {
      "has_child" : {
         "script" : {
            "script" : "_score * _score / doc[\u0027total_books\u0027]"
         },
         "query" : {
            "custom_score_query" : {
               "query" : {
                  "constant_score" : {
                     "query" : {
                        "match" : {
                           "book_title" : "the wind in the willows"
                        }
                     }
                  }
               }
            }
         },
         "score_mode" : "sum",
         "type" : "book"
      }
   }
}
'
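
A parent/child setup of this kind needs a "_parent" mapping on the child type, and each book has to be indexed with a "parent" parameter. The sketch below only builds the request URLs and bodies and prints them; it sends nothing. The host, index and field names are taken from the query above, while the document ids and values are hypothetical.

```python
import json

BASE = "http://127.0.0.1:9200/my_index"  # host and index name as used in the query above

# Mapping for the child type: "_parent" names the parent type.
book_mapping = {"book": {"_parent": {"type": "author"}}}


def mapping_url() -> str:
    """URL for putting the mapping of the "book" type."""
    return f"{BASE}/book/_mapping"


def child_url(book_id: str, author_id: str) -> str:
    """URL for indexing a book; "parent" ties it to its author document."""
    return f"{BASE}/book/{book_id}?parent={author_id}"


# Hypothetical documents; ids and values are illustrative only.
author_doc = {"name": "Example Author", "total_books": 3}
book_doc = {"book_title": "the wind in the willows"}

print("PUT", mapping_url(), json.dumps(book_mapping))
print("PUT", f"{BASE}/author/a1", json.dumps(author_doc))
print("PUT", child_url("b1", "a1"), json.dumps(book_doc))
```

In this sketch the application is assumed to keep "total_books" up to date itself whenever an author's books are added or removed.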

clint

Sorry, that was incorrectly nested:

curl -XGET 'http://127.0.0.1:9200/my_index/author/_search?pretty=1' -d '
{
   "query" : {
      "custom_score" : {
         "script" : "_score * _score / doc[\u0027total_books\u0027].value",
         "query" : {
            "has_child" : {
               "query" : {
                  "constant_score" : {
                     "query" : {
                        "match" : {
                           "book_title" : "the wind in the willows"
                        }
                     }
                  }
               },
               "score_mode" : "sum",
               "type" : "book"
            }
         }
      }
   }
}
'
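
Because the authors now come back as ordinary search hits sorted by score, paging can be done with the standard "from" and "size" parameters of the search request. A minimal sketch follows; the helper name and the zero-based page convention are my own assumptions.

```python
import copy


def with_pagination(body: dict, page: int, page_size: int) -> dict:
    """Return a copy of a search body with "from"/"size" set for a zero-based page."""
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size must be > 0")
    paged = copy.deepcopy(body)  # leave the caller's body untouched
    paged["from"] = page * page_size
    paged["size"] = page_size
    return paged


# Placeholder query; in practice this would be the author query shown above.
body = {"query": {"match_all": {}}}
third_page = with_pagination(body, page=2, page_size=25)
print(third_page)
```

Note that each shard has to collect from + size hits for every request, so very deep pages get progressively more expensive.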

Thank you Clint! Your solution looks really neat.
I have already solved the task with a "top_children" query and a truly monstrous script that computes the scores. I think I will rewrite it your way.

Just for lulz, my solution:

http://127.0.0.1:9200/my_index/author/_search
{
   "query": {
      "top_children": {
         "type": "book",
         "factor": 10000,
         "score": "max",
         "query": {
            "custom_filters_score": {
               "query": {"match_all": {}},
               "filters": [{
                  "filter": {"term": {"book_title": "Hello world"}},
                  "script": "
                     aid = doc['authorId'].value;
                     cnt = doc['booksCount'].value;
                     if(found[aid] == null)
                        found.put(aid, 0);
                     found[aid] = found[aid] + 1;
                     found[aid]*found[aid] / cnt
                  "
               }],
               "params": {"found": {}, "aid": 0, "cnt": 0}
            }
         }
      }
   }
}

In my script, "found" is a map of "aid => number of matching books"; "aid" and "cnt" are just variables that make the code easier to read.
For each matching record I increment the corresponding element of "found" and return the recalculated score.
Each "book" record also stores the "authorId" and the "booksCount" of its author.
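
To show why taking the "max" score per author gives the formula, here is a small offline model of that script in Python. It is not Elasticsearch code: it assumes that all matching books of an author are scored one after another against a single shared "found" map, and the sample data is made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def simulate_top_children(matching_books: Iterable[Tuple[str, int]]) -> Dict[str, float]:
    """Model the script: keep a running count per author, take the max score per author."""
    found = defaultdict(int)  # aid => number of matching books seen so far
    best: Dict[str, float] = {}
    for aid, cnt in matching_books:
        found[aid] += 1
        score = found[aid] * found[aid] / cnt  # same expression as in the script
        best[aid] = max(best.get(aid, 0.0), score)  # "score": "max"
    return best


# Each tuple is one matching book: (authorId, booksCount of that author).
books = [("a1", 4), ("a2", 2), ("a1", 4), ("a1", 4)]
scores = simulate_top_children(books)
print(scores)
```

The last matching book of each author produces the largest running count, so the maximum equals number_matching_books^2 / total_books_author_has.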