Patching new information into the document during search


(Michel Conrad) #1

Hi,
I want to rank results in elasticsearch according to an external source.
To do that, I want to do some processing on the matched ids while the
results are being scored, in order to rank them.
I also want to patch the result of my calculations into
the document.

My idea was to use a native SearchScript. From my understanding the
script is called once for every result.
So my question is twofold:

  1. Is it possible to access the whole result set at once, before the
    SearchScript is called, in order to bundle the
    custom processing of the results? Alternatively, could the results
    be iterated twice by the SearchScript, so
    that I could process them during the first iteration and
    deliver my result in the second one?

  2. When extending AbstractFloatSearchScript, what is the correct way
    to patch new content into the doc to
    be returned? (Or do I have to do it another way?)

The processing itself should be quick, but I need to perform it as a
bulk operation (per shard should not be a problem),
so I would need a way to access the result set in advance. Currently I
don't know exactly where to start looking,
so could someone please point me in the right direction?

Thanks,
Michel


(Shay Banon) #2

Scripts only work on one document at a time; there is no way to look at
the whole result set (it has not been gathered yet, and besides, only
the top N hits are kept around while the search is being executed).

Not sure I understand what you mean by patching the content. Do you mean
changing the hits returned?
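
To illustrate why a script cannot see the whole result set, here is a minimal sketch of top-N collection. It is a simplification, not the actual Lucene or elasticsearch collector; the names `ScoredDoc` and `TopNCollector` are made up for this example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Illustrative only: a simplified top-N collector, not real Lucene code.
final class ScoredDoc {
    final int docId;
    final float score;

    ScoredDoc(int docId, float score) {
        this.docId = docId;
        this.score = score;
    }
}

final class TopNCollector {
    private final int n;
    // Min-heap on score: the head is the weakest hit currently kept.
    private final PriorityQueue<ScoredDoc> heap =
            new PriorityQueue<>(Comparator.comparingDouble((ScoredDoc d) -> d.score));

    TopNCollector(int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        this.n = n;
    }

    // Called once per matching document; its score must already be final here.
    void collect(int docId, float score) {
        if (heap.size() < n) {
            heap.add(new ScoredDoc(docId, score));
        } else if (score > heap.peek().score) {
            heap.poll(); // evict the weakest hit
            heap.add(new ScoredDoc(docId, score));
        }
        // Any other document is dropped right away and cannot be revisited.
    }

    // Returns the kept hits, best score first.
    List<ScoredDoc> topDocs() {
        List<ScoredDoc> out = new ArrayList<>(heap);
        out.sort(Comparator.comparingDouble((ScoredDoc d) -> d.score).reversed());
        return out;
    }
}
```

Because each document is scored and then either kept or discarded on the spot, there is no point during collection at which all matches exist together in memory.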


(Michel Conrad) #3

Hi Shay,
If there is no way of getting the result set at once, is it possible
to score the documents in bulk,
let's say 1000 at a time, or is it strictly one document at a time?
Another idea: would it be possible to collect the results twice,
so that on the second iteration I would
have the result of my calculation and could do a customized ranking
that depends on the entire result set?

By patching the content I meant a scripted field. I couldn't find out
how to add a field
in Java during the scoring of the results. Currently I am using an
AbstractFloatSearchScript.

Best regards,
Michel


(Shay Banon) #4

On Thu, Dec 22, 2011 at 11:52 AM, Michel Conrad <
michel.conrad@trendiction.com> wrote:

> Hi Shay,
> When there is no way of getting the resultset at once, is it possible
> to score the documents in bulk,
> let's say 1000 at a time, or is it strictly one document at a time?
> Another idea is if it would be possible to collect the results twice,
> so that on the second iteration I would
> have the result of my calculation and could do a customized ranking
> dependent on the entire resultset.

I am afraid not, that's not how Lucene works... (or at least, I can't think
of a good way to do it right now...).

> By patching the content I meant a scripted field. I couldn't find out
> how to add a field
> in Java during the scoring of the results. Currently I am using an
> AbstractFloatSearchScript.

Use the more generic script (extend AbstractSearchScript), and then you can
return a Map (with possible inner maps / lists) to represent the value you
want to return.
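
A rough sketch of that idea follows. To keep it compilable on its own it does not depend on elasticsearch: `SearchScriptStandIn` is a stand-in for `AbstractSearchScript`, under the assumption that the real base class exposes a `run()` method returning `Object`. The class `ExternalRankScript`, its constructor arguments, and the map keys are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Stand-in for elasticsearch's AbstractSearchScript (assumption: run() returns Object).
abstract class SearchScriptStandIn {
    abstract Object run();
}

// Hypothetical script that returns a structured value instead of a single float.
final class ExternalRankScript extends SearchScriptStandIn {
    private final String docId;                       // id of the current document (simplified)
    private final Map<String, Double> externalScores; // scores from the external source

    ExternalRankScript(String docId, Map<String, Double> externalScores) {
        this.docId = docId;
        this.externalScores = externalScores;
    }

    @Override
    Object run() {
        // Documents unknown to the external source get a neutral score of 0.0.
        double score = externalScores.getOrDefault(docId, 0.0);

        // Inner map and list, to show that nested values can be returned as well.
        Map<String, Object> details = new LinkedHashMap<>();
        details.put("known", externalScores.containsKey(docId));
        details.put("tags", Arrays.asList("external", "ranked"));

        Map<String, Object> result = new LinkedHashMap<>();
        result.put("score", score);
        result.put("details", details);
        return result;
    }
}
```

In a real plugin the document id would come from the script's per-document context rather than a constructor argument; that part is left out here on purpose.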


(Michel Conrad) #5

Hi Shay,
Thanks for the quick replies. I finally got it working by iterating
twice over the results. I'm not sure it is the best way to do it,
but it works. I also found a bug, still present in the master
branch: the lang field is not written out in ScriptSortBuilder.java.

Best,
Michel

diff --git a/modules/elasticsearch/src/main/java/org/elasticsearch/search/sort/ScriptSortBuilder.java b/modules/elasticsearch/src/main/java/org/elasticsearch/search/sort/ScriptSortBuilder.java
index c887d2d..940a220 100644
--- a/modules/elasticsearch/src/main/java/org/elasticsearch/search/sort/ScriptSortBuilder.java
+++ b/modules/elasticsearch/src/main/java/org/elasticsearch/search/sort/ScriptSortBuilder.java
@@ -93,8 +93,9 @@ public class ScriptSortBuilder extends SortBuilder {
     @Override public XContentBuilder toXContent(XContentBuilder builder, Params params) throws IOException {
         builder.startObject("_script");
         builder.field("script", script);
         builder.field("type", type);
+        builder.field("lang", lang);
         if (order == SortOrder.DESC) {
             builder.field("reverse", true);
         }
         if (this.params != null) {
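
The thread does not show how the two iterations were implemented, so the following is only a guess at the general shape: a first pass gathers the matched ids, a single bulk call fetches external scores, and a second pass orders the ids by those scores. `TwoPassRanker` and the `bulkLookup` function are hypothetical names, and nothing here uses the elasticsearch API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical two-pass ranking sketch, independent of elasticsearch.
final class TwoPassRanker {

    // bulkLookup stands for the external source: it takes all ids at once
    // and returns a score per id (ids it does not know may be absent).
    static List<String> rank(List<String> matchedIds,
                             Function<List<String>, Map<String, Double>> bulkLookup) {
        // Pass 1: only gather the ids of the matched documents.
        List<String> ids = new ArrayList<>(matchedIds);

        // One bulk call for the whole batch instead of one call per document.
        Map<String, Double> scores = bulkLookup.apply(ids);

        // Pass 2: order by external score, highest first; unknown ids count as 0.0.
        // List.sort is stable, so ties keep their original relative order.
        ids.sort(Comparator
                .comparingDouble((String id) -> scores.getOrDefault(id, 0.0))
                .reversed());
        return ids;
    }
}
```

The trade-off is that the first pass has to hold every matched id in memory, which is the cost the single-pass top-N collection in Lucene avoids.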


(Shay Banon) #6

Thanks! I opened an issue:
https://github.com/elasticsearch/elasticsearch/issues/1569.

