Hi,
I want to rank results in Elasticsearch according to an external source.
To do that, I want to run some processing on the matched ids while the
results are being scored, in order to rank them.
I also want to patch the result of my calculations into the document.
My idea was to use a native SearchScript. From my understanding, the
script is called once for every result.
So my question is twofold:
1. Is it possible to access the whole result set at once before the
SearchScript is called, in order to bundle the custom processing of the
results? Another idea is to have the results pass through the
SearchScript twice, so that I could process them during the first
iteration and deliver my result in the second one.
2. When extending AbstractFloatSearchScript, what is the correct way
to patch new content into the doc to be returned? (Or do I have to use
another way?)
The processing itself should be quick, but I need to perform it as a
bulk operation (per shard should not be a problem), so I would need a
way to access the result set in advance. Currently I don't know exactly
where to start looking, so could someone please point me in the right
direction?
On Thu, Dec 22, 2011 at 1:49 AM, Shay Banon kimchy@gmail.com wrote:
The scripts only work on one document at a time; there is no way to look
at the whole result set (it has not been gathered yet, and besides, only
the top N hits are kept around while the search is being executed).
Not sure I understand what you mean by "patch the content". Do you mean
changing the hits returned?
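To illustrate why the whole result set is never available to a script, here is a minimal sketch of top-N collection in plain Java. It is not Elasticsearch or Lucene code; the class TopNSketch, the Hit type, and the array of precomputed scores are assumptions made up for illustration. Each document is scored on its own and only the best N survive in a bounded heap.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Illustrative only: these names are hypothetical, not Elasticsearch types.
class TopNSketch {
    /** A scored hit. */
    static final class Hit {
        final int docId;
        final float score;
        Hit(int docId, float score) { this.docId = docId; this.score = score; }
    }

    /** Visits documents one at a time and keeps only the best n in a min-heap. */
    static List<Hit> topN(float[] scoresByDoc, int n) {
        Comparator<Hit> byScore = Comparator.comparingDouble((Hit h) -> h.score);
        PriorityQueue<Hit> heap = new PriorityQueue<>(byScore);
        for (int docId = 0; docId < scoresByDoc.length; docId++) {
            // This is the point where a per-document script would be called.
            heap.add(new Hit(docId, scoresByDoc[docId]));
            if (heap.size() > n) {
                heap.poll(); // discard the current lowest-scoring hit
            }
        }
        List<Hit> result = new ArrayList<>(heap);
        result.sort(byScore.reversed()); // best hit first
        return result;
    }

    public static void main(String[] args) {
        for (Hit h : topN(new float[] {0.5f, 2.0f, 1.0f, 3.0f}, 2)) {
            System.out.println(h.docId + " " + h.score);
        }
    }
}
```

Hits that fall out of the heap are gone for good, so a script called inside the loop cannot see them later.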
Hi Shay,
If there is no way of getting the result set at once, is it possible
to score the documents in bulk, let's say 1000 at a time, or is it
strictly one document at a time?
Another idea: would it be possible to collect the results twice, so
that on the second iteration I would have the result of my calculation
and could do a customized ranking that depends on the entire result set?
By patching the content I meant a scripted field. I couldn't find out
how to add a field in Java during the scoring of the results. Currently
I am using an AbstractFloatSearchScript.
Best regards,
Michel
On Thu, Dec 22, 2011 at 7:06 PM, Shay Banon kimchy@gmail.com wrote:
> If there is no way of getting the result set at once, is it possible
> to score the documents in bulk, let's say 1000 at a time, or is it
> strictly one document at a time?
> Another idea: would it be possible to collect the results twice, so
> that on the second iteration I would have the result of my calculation
> and could do a customized ranking that depends on the entire result set?
I am afraid not, that's not how Lucene works... (or at least, I can't
think of a good way to do it now...).
> By patching the content I meant a scripted field. I couldn't find out
> how to add a field in Java during the scoring of the results. Currently
> I am using an AbstractFloatSearchScript.
Use the more generic script (extend AbstractSearchScript), and then you can
return a Map (with possible inner maps / lists) to represent the value you
want to return.
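As a rough sketch of what "return a Map" could look like, the following plain-Java example builds a nested value for one document. PerDocumentScript is a local stand-in so the example compiles on its own; it is not the Elasticsearch AbstractSearchScript class. The constructor arguments (the current document id and a map of external scores) are likewise assumptions for illustration. In a real plugin the document values would come from the script base class instead.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/** Local stand-in for a per-document script; NOT the Elasticsearch class. */
abstract class PerDocumentScript {
    abstract Object run();
}

/** Returns a structured value (map with an inner map and list) instead of a float. */
class ExternalRankScript extends PerDocumentScript {
    private final String docId;                 // hypothetical: id of the current document
    private final Map<String, Double> external; // hypothetical: scores from the external source

    ExternalRankScript(String docId, Map<String, Double> external) {
        this.docId = docId;
        this.external = external;
    }

    @Override
    Object run() {
        // Documents unknown to the external source get a neutral rank of 0.0.
        double rank = external.getOrDefault(docId, 0.0);

        Map<String, Object> details = new LinkedHashMap<>();
        details.put("source", "external");
        details.put("inputs", Arrays.asList(docId));

        Map<String, Object> value = new LinkedHashMap<>();
        value.put("rank", rank);
        value.put("details", details);
        return value;
    }
}
```

The returned map would then show up as the value of the scripted field for that hit, assuming the field is requested as a script field in the search.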
Hi Shay,
Thanks for the quick replies. I finally got it working by iterating
twice over the results. I'm not sure it is the best way to do it,
but it works. I also found a bug, still present in the master branch:
the lang field is missing in ScriptSortBuilder.java.
Best,
Michel
diff --git a/modules/elasticsearch/src/main/java/org/elasticsearch/search/sort/ScriptSortBuilder.java b/modules/elasticsearch/src/main/java/org/elasticsearch/search/sort/ScriptSortBuilder.java
index c887d2d..940a220 100644
--- a/modules/elasticsearch/src/main/java/org/elasticsearch/search/sort/ScriptSortBuilder.java
+++ b/modules/elasticsearch/src/main/java/org/elasticsearch/search/sort/ScriptSortBuilder.java
@@ -93,8 +93,9 @@ public class ScriptSortBuilder extends SortBuilder {
     @Override
     public XContentBuilder toXContent(XContentBuilder builder, Params params) throws IOException {
         builder.startObject("_script");
         builder.field("script", script);
         builder.field("type", type);
+        builder.field("lang", lang);
         if (order == SortOrder.DESC) {
             builder.field("reverse", true);
         }
         if (this.params != null) {
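For readers who want to see the effect of the patch without building Elasticsearch, here is a small stand-alone sketch of the serialized "_script" sort object. It does not use XContentBuilder. The class ScriptSortJsonSketch and its toJson method are hypothetical, the values passed in main are examples only, and string escaping is left out for brevity. Unlike the patch, which writes the field unconditionally, this sketch skips lang when it is null; that is an assumption of the example, not something the diff does.

```java
// Illustrative only: a hand-rolled JSON builder, not the Elasticsearch XContentBuilder.
class ScriptSortJsonSketch {
    /** Builds the "_script" sort object as a JSON string; lang may be null. */
    static String toJson(String script, String type, String lang, boolean reverse) {
        StringBuilder sb = new StringBuilder("{\"_script\":{");
        sb.append("\"script\":\"").append(script).append('"');
        sb.append(",\"type\":\"").append(type).append('"');
        if (lang != null) {
            // The field the patch adds; without it the script language never reaches the server.
            sb.append(",\"lang\":\"").append(lang).append('"');
        }
        if (reverse) {
            sb.append(",\"reverse\":true");
        }
        return sb.append("}}").toString();
    }

    public static void main(String[] args) {
        System.out.println(toJson("rank", "number", "native", true));
    }
}
```

If lang is dropped during serialization, the server presumably falls back to its default script language, so a sort script written for another language would not be resolved as intended.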