I have been banging my head against the wall for 3 days already. Here is my task:
"Authors" have "books". I need to search for authors who have books satisfying some criteria.
The problem is that I need to sort authors by how many matching books each one has. The exact sorting formula is "number_matching_books^2 / total_books_author_has".
I also need to paginate, because there can be hundreds of thousands of resulting authors.
This is how I have been solving it so far:
I have a "book" type with an "author_id" field, and a facet on "author_id" with a "facet_filter" on the book criteria.
In the facet I wrote a fairly complicated script in "value_script" that computes the "total" value in such a tricky way that sorting by total gives exactly the results I need. All was good until I ran into pagination. I tried to paginate in the same tricky way: in the script I computed the "total" field so that the results before the offset were pushed to the end... but all of this is too tricky and too hard to implement in production.
Maybe there is another way to solve the task? Or should I switch away from Elasticsearch...
It sounds like what you need is to create a parent/child relationship
between your authors and books, then use a has_child query to search for
matching books, and count each matched book as a score of 1. You'd need to
store the total_books by an author in the author doc itself, in order to
implement your algorithm:
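The ranking this suggestion aims for can be sketched outside Elasticsearch. The following is only an illustration of the arithmetic, not the actual query: the function name `rank_authors`, its parameters, and the sample data are all hypothetical. It counts each matched book as 1 per author, applies "number_matching_books^2 / total_books_author_has" using a stored per-author total, sorts, and slices out one page.

```python
from collections import Counter


def rank_authors(matching_book_author_ids, total_books, offset=0, size=10):
    """Rank authors by matched^2 / total and return one page of results.

    matching_book_author_ids: one author id per matching book (hypothetical input).
    total_books: author id -> total number of books, as stored on the author doc.
    """
    # Each matching book contributes 1 to its author's count.
    matched = Counter(matching_book_author_ids)
    # Apply the sorting formula from the question.
    scored = [(aid, n * n / total_books[aid]) for aid, n in matched.items()]
    # Highest score first; ties broken by author id so paging stays stable.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[offset:offset + size]


# Made-up sample: author "a" matches 2 of 2 books, "b" 1 of 1, "c" 3 of 9.
matches = ["a", "a", "b", "c", "c", "c"]
totals = {"a": 2, "b": 1, "c": 9}
first_page = rank_authors(matches, totals, offset=0, size=2)
second_page = rank_authors(matches, totals, offset=2, size=2)
```

The point of moving the score onto the author document is that paging then becomes an ordinary offset and size over sorted authors, rather than something that has to be encoded inside the script.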
Thank you Clint! Your solution looks really neat.
I have already solved the task with a "top_children" query and a really monstrous script that computes the scores. I think I will rewrite it your way.
In my script, "found" is a map of "aid => count of matching books"; "aid" and "cnt" are just variables to make the code more readable.
For each matching record I increment the corresponding element of "found" and return the newly recalculated score.
In each "book" record I also store "authorId" and the "booksCount" the author has.
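The script itself is not shown in the thread, so the following is only a guess at the logic described above, written in Python rather than as an Elasticsearch script. The names "found", "aid", "cnt", "authorId" and "booksCount" come from the description; `make_scorer` and the sample records are hypothetical.

```python
from collections import defaultdict


def make_scorer():
    """Build a stateful scoring function that keeps a running count per author."""
    found = defaultdict(int)  # aid => count of matching books seen so far

    def score(book):
        aid = book["authorId"]
        found[aid] += 1          # one more matching book for this author
        cnt = found[aid]
        # Recalculated score: matched^2 / total books the author has.
        return cnt * cnt / book["booksCount"]

    return score


# Made-up records: author 7 has 4 books in total, author 9 has 2.
score = make_scorer()
scores = [
    score({"authorId": 7, "booksCount": 4}),
    score({"authorId": 7, "booksCount": 4}),
    score({"authorId": 9, "booksCount": 2}),
]
```

Under this reading, the score returned for the last matching record of an author equals the final value of the formula for that author, which may be why the running-counter approach worked at all.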