Hey, I have been using PostgreSQL for search so far, since my application
relies on it anyway. I found a scoring scheme I am pretty happy with, but
search times are unacceptable because the search runs against a table
with about 9 million rows.
I started with Elasticsearch just a few days ago in a PHP project using the
Elastica client. I indexed about a million documents with book titles and
some related information using the JDBC plugin, but I don't know whether the
scoring I want is possible at all with Elasticsearch. One perhaps rather
unusual requirement is that scoring differs per user, depending
on their language skills.
I have the following index of books, each with several editions:
title | author | ratings | bookid | editionid | languageid
---|---|---|---|---|---
Hamlet | | | | | eng
Hamlet by Shakespeare | | | | | eng
Hamlet bilingual | | | | | eng
Romeo and Juliet | | | | | eng
Romeo und Julia | | | | | ger
Roméo et Juliette | | | | | fre
Othello | | | | | eng
I want only one title of each bookid in the results - the one with the
highest score which is calculated as follows:
Take the highest ratings count of all results and call it max_r.
Put a weight on the language - 100 for the user's first language, 5 for their
second language, 3 for the third and 2 for the fourth (in case
someone is so blessed with language skills) - and call it lang_weight. I have
the language information for logged-in users.
score = lang_weight * 100/max_r * ratings
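To make the formula concrete, here is a minimal Python sketch of the calculation I have in mind. It is not Elasticsearch code; the names LANG_WEIGHTS, lang_weight and score are made up for illustration, and giving a weight of 0 to a language the user does not speak is an assumption of mine.

```python
# Language weights by position in the user's language list (values from above).
LANG_WEIGHTS = [100, 5, 3, 2]


def lang_weight(user_langs, languageid):
    """Return the weight for an edition's language.

    Assumption: a language the user does not speak gets weight 0.
    """
    if languageid in user_langs:
        position = user_langs.index(languageid)
        if position < len(LANG_WEIGHTS):
            return LANG_WEIGHTS[position]
    return 0


def score(weight, ratings, max_r):
    """score = lang_weight * 100/max_r * ratings"""
    return weight * 100 / max_r * ratings


# German user with second language eng and third language fre.
german_user = ["ger", "eng", "fre"]
max_r = 1470  # highest ratings count among the matching editions
print(round(score(lang_weight(german_user, "ger"), 325, max_r)))
```

With max_r = 1470, the factor 100/max_r is about 0.068.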
Is there a way to get max_r during the query and use it as I would in SQL?
How can I tell Elasticsearch to apply the weight based on the languageid?
And I would still like to integrate fuzzy search.
Here are some examples of what I would like to get. Depending on the user, I
would get the following results when searching:
English user searching for 'hamlet':
- editionid 1
Only one result, since the bookid must be unique within a result set and
edition 1 has the highest score: 100 * 0.036 * 2784 = 10,000
(0.036 is 100/max_r with max_r = 2784)
editionid 2 has the score 100 * 0.036 * 71 = 255
editionid 3 has the score 100 * 0.036 * 144 = 517
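The "one result per bookid" rule in this example could be sketched in Python like this. Again, this only illustrates the behaviour I want, not how Elasticsearch would do it; best_per_book is a made-up name and the bookid value "hamlet" is a placeholder, not the real id.

```python
def best_per_book(hits):
    """Keep only the highest-scoring hit for each bookid, best first."""
    best = {}
    for hit in hits:
        current = best.get(hit["bookid"])
        if current is None or hit["score"] > current["score"]:
            best[hit["bookid"]] = hit
    return sorted(best.values(), key=lambda h: h["score"], reverse=True)


# Scores from the English-user example; the bookid is a placeholder.
hits = [
    {"editionid": 1, "bookid": "hamlet", "score": 10000},
    {"editionid": 2, "bookid": "hamlet", "score": 255},
    {"editionid": 3, "bookid": "hamlet", "score": 517},
]
results = best_per_book(hits)
print([hit["editionid"] for hit in results])
```

All three editions share one bookid, so only edition 1 remains.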
German user (with second language eng and third fre) searching for 'romeo &
shakespeare':
- editionid 5
because his first language is ger, score = 100 * 0.068 * 325 = 2211
editionid 4 has the score 5 * 0.068 * 1470 = 500
editionid 6 has the score 3 * 0.068 * 487 = 99
Spanish user (with second language German, third French, fourth English)
searching for 'Shakespeare':
- edition 1, score = 2 * 0.036 * 2784 = 200
- edition 7, score = 2 * 0.036 * 1548 = 111
- edition 5, score = 5 * 0.036 * 325 = 58
I would be very happy if you could point me in the right direction, since
this is all quite confusing for me so far.
Thanks a lot.