Hello everyone, this might be a semi-long post.
My use case is the following.
We have a stack in Python that ranks documents with a logistic regression over several features.
I am building a proof of concept on ES so that I can use full-text search before applying the logistic regression.
The simplified properties of my index resemble something like this:
"properties": {
  "searchable_objects_belonging_to_main_doc": {
    "type": "nested",
    "properties": {
      "searchable_property": {"type": "text"},
      ...
      "property_use_to_make_calculation": {"type": "double"}
    }
  },
  "other_property": {"type": "double"},
  ...
}
My first approach was to use a nested query, then a rescore phase with a Painless script that applies the logistic regression to the docs returned by the nested query.
=> Problem: to apply the logistic regression I need some information about the nested objects that matched in the nested query. From what I read, getting information about the matching nested docs in rescore is not possible because they are separate Lucene docs.
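For reference, here is a rough sketch of what that first request body looks like, written as a Python dict. The helper name, the window size and the zero query weight are assumptions made for illustration, and only the top-level field other_property is used in the script, which is exactly the limitation described above.

```python
import json

# Field names come from the simplified mapping above.
NESTED_PATH = "searchable_objects_belonging_to_main_doc"


def build_first_approach(search_text, window_size=100):
    """Nested full-text query followed by a rescore pass (hypothetical helper)."""
    return {
        "query": {
            "nested": {
                "path": NESTED_PATH,
                "query": {
                    "match": {f"{NESTED_PATH}.searchable_property": search_text}
                },
            }
        },
        "rescore": {
            "window_size": window_size,  # assumed value
            "query": {
                "rescore_query": {
                    "script_score": {
                        "query": {"match_all": {}},
                        "script": {
                            # Only fields of the main doc are visible here,
                            # not the nested objects that matched.
                            "source": "1.0 / (1.0 + Math.exp(doc['other_property'].value))"
                        },
                    }
                },
                # Assumption: ignore the original score, keep only the rescore.
                "query_weight": 0.0,
                "rescore_query_weight": 1.0,
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_first_approach("some text"), indent=2))
```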
My second approach was to rethink the mapping. Each of the nested objects would become its own document, containing the main document it belongs to. This way there is no more need for nested objects.
=> Problem: I would need to use collapse before rescore, so that the window of documents sent to rescore contains each main document only once. And from what I read, it is not possible to combine collapse and rescore (an explicit exception is raised).
Third approach:
Our log reg is of the form:
x = feat1 + feat2 + feat3 ... + featN
return 1/(1 + Math.exp(x))
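which I assumed could translate to the product below (note that this is not an exact identity: the logistic of a sum is not, in general, the product of the per-feature logistics)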
(1/(1 + Math.exp(feat1))) * (1/(1 + Math.exp(feat2))) * ... * (1/(1 + Math.exp(featN)))
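A small numeric sketch comparing the two forms may help here. The function names are hypothetical and the feature values are made up. It also checks the rearrangement that does hold exactly, namely that exp of the sum equals the product of the per-feature exps.

```python
import math


def logistic(x):
    """Same form as in the post: 1 / (1 + exp(x))."""
    return 1.0 / (1.0 + math.exp(x))


def score_of_sum(feats):
    """Logistic applied once to the sum of the features."""
    return logistic(sum(feats))


def product_of_scores(feats):
    """Product of the logistic applied to each feature separately."""
    result = 1.0
    for f in feats:
        result *= logistic(f)
    return result


def score_from_exp_product(feats):
    """exp(a + b) == exp(a) * exp(b), so this matches score_of_sum."""
    prod = 1.0
    for f in feats:
        prod *= math.exp(f)
    return 1.0 / (1.0 + prod)


if __name__ == "__main__":
    feats = [0.0, 0.0]  # made-up values
    print(score_of_sum(feats))       # 0.5
    print(product_of_scores(feats))  # 0.25
```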
So, given that feat1 is based on the values contained in the nested documents that match the nested query, I could put a script score inside the nested query and retrieve its value as the score of the top-level document. => this works
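Roughly, the nested clause looks like the sketch below, again written as a Python dict. The helper name is hypothetical, and the choice of score_mode "max" is an assumption: it decides how the scores of several matching nested docs are combined into the score of the main doc.

```python
NESTED_PATH = "searchable_objects_belonging_to_main_doc"


def build_nested_feature_clause(search_text, score_mode="max"):
    """Nested query whose score is the per-feature logistic (hypothetical helper)."""
    calc_field = f"{NESTED_PATH}.property_use_to_make_calculation"
    return {
        "nested": {
            "path": NESTED_PATH,
            # Assumption: keep the best-scoring nested doc per main doc.
            "score_mode": score_mode,
            "query": {
                "script_score": {
                    # The full-text match selects the nested docs to score.
                    "query": {
                        "match": {f"{NESTED_PATH}.searchable_property": search_text}
                    },
                    # The script runs per nested doc, so its fields are visible.
                    "script": {
                        "source": f"1.0 / (1.0 + Math.exp(doc['{calc_field}'].value))"
                    },
                }
            },
        }
    }
```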
But now I need to be able to multiply the retrieved value with the other values. Something like:
"should": [
{"nested": ... => returns the value of (1/(1 + Math.exp(feat1))) for the matching nested docs},
{ this query would calculate (1/(1 + Math.exp(feat2))) },
{ ... },
{ this query would calculate (1/(1 + Math.exp(featN))) }
]
But for that to work, I would need the should clause to multiply the scores of its clauses instead of summing them, and from what I read on this forum this has not been implemented due to lack of a use case.
Maybe this one is a valid use case? Or maybe I am going in the wrong direction, in which case I will gladly take any pointers.
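One direction that might avoid the need for a multiplying should clause, as an untested sketch: since a bool query sums the scores of its matching clauses, each clause could return its raw feature value, and an outer script_score could apply the logistic once to the summed _score. The helper name and the score_mode are assumptions. One known caveat is that scores must not be negative, so features that can be negative would need to be shifted first.

```python
NESTED_PATH = "searchable_objects_belonging_to_main_doc"


def build_additive_query(search_text):
    """Sum raw features in a bool query, then apply the logistic once (sketch)."""
    calc_field = f"{NESTED_PATH}.property_use_to_make_calculation"
    # feat1: raw value taken from the matching nested docs.
    nested_feature = {
        "nested": {
            "path": NESTED_PATH,
            "score_mode": "max",  # assumption
            "query": {
                "script_score": {
                    "query": {
                        "match": {f"{NESTED_PATH}.searchable_property": search_text}
                    },
                    # Caveat: must be non-negative for every matching doc.
                    "script": {"source": f"doc['{calc_field}'].value"},
                }
            },
        }
    }
    # feat2: raw value of a field on the main doc.
    top_level_feature = {
        "script_score": {
            "query": {"match_all": {}},
            "script": {"source": "doc['other_property'].value"},
        }
    }
    return {
        "query": {
            "script_score": {
                # must and should scores are added together by the bool query.
                "query": {
                    "bool": {
                        "must": [nested_feature],
                        "should": [top_level_feature],
                    }
                },
                # _score here is feat1 + feat2 for the main doc.
                "script": {"source": "1.0 / (1.0 + Math.exp(_score))"},
            }
        }
    }
```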