Pushed: more script goodness, including script fields, scripted facets, and script filter

Hi,

As promised, now that the infrastructure is built, adding exciting features
has become really simple. Here is a breakdown of some features added recently
(on top of the facets and custom_score query):

  • Script Fields:

The ability to return a script evaluation, per hit, for fields that are not
stored. This makes sense (as with any other scripted field) on numeric or
not_analyzed fields. The issue is here:
http://github.com/elasticsearch/elasticsearch/issues/221. For example:

{
    "query" : {
        ...
    },
    "script_fields" : {
        "test1" : {
            "script" : "doc['my_field_name'].value * 2"
        },
        "test2" : {
            "script" : "doc['my_field_name'].value * factor",
            "params" : {
                "factor" : 2.0
            }
        }
    }
}
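
To make the per-hit evaluation concrete, here is a minimal Python sketch of the idea. It is a toy model, not Elasticsearch code: the function name `eval_script_fields` is hypothetical, plain dicts stand in for documents, and Python lambdas stand in for the script strings in the request above.

```python
# Toy model of script fields: for every hit, evaluate each named script
# and return the result under that name. Everything here is illustrative.

def eval_script_fields(hits, script_fields):
    """Return one dict per hit, mapping script-field name -> computed value."""
    results = []
    for doc in hits:
        fields = {}
        for name, spec in script_fields.items():
            # "params" is optional, as in the test1 / test2 example above.
            params = spec.get("params", {})
            fields[name] = spec["script"](doc, params)
        results.append(fields)
    return results


hits = [{"my_field_name": 3}, {"my_field_name": 10}]

script_fields = {
    "test1": {"script": lambda doc, params: doc["my_field_name"] * 2},
    "test2": {
        "script": lambda doc, params: doc["my_field_name"] * params["factor"],
        "params": {"factor": 2.0},
    },
}

computed = eval_script_fields(hits, script_fields)
print(computed)
```

Passing `factor` through `params` rather than hard-coding it keeps the script text identical across requests, which is the difference between `test1` and `test2` in the example.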

  • Script filter

Allows defining a filter that uses a script to filter out results. The issue
is here: http://github.com/elasticsearch/elasticsearch/issues/226. Sample:

"filtered" : {
    "query" : {
        ...
    },
    "filter" : {
        "script" : {
            "script" : "doc['num1'].value > 1"
        }
    }
}

  • Scripted facets

Both the statistical and histogram facets now allow scripts to be defined
instead of field names. The issues are:
http://github.com/elasticsearch/elasticsearch/issues/227, and
http://github.com/elasticsearch/elasticsearch/issues/228. Here is a sample
of a statistical facet:

{
    "query" : {
        "match_all" : { }
    },
    "facets" : {
        "test" : {
            "statistical" : {
                "script" : "(doc['num1'].value + doc['num2'].value) * factor",
                "params" : {
                    "factor" : 2.2
                }
            },
            "global" : false
        }
    }
}

-shay.banon

Questions for HCG (Human Code Generator):

  • Are script filters applied after the fact (after the matches have
    been scored and retrieved)? If so, how does one deal with a situation
    where N matches are needed at the end, but the filter(s) may end up
    removing some unknown number of them? Does one simply have to get m x
    N hits and hope that N or more hits are left after the filters?

  • Can you explain what statistical and histogram facets are? The
    example for the histogram shows, I think, that you can derive new
    facet values from partial field values (e.g. from a date field you can
    extract "minuteOfHour" and that becomes the new facet value)... but I
    think I'm missing something here, otherwise you wouldn't call them
    "histogram" facets?
    Similarly, what makes statistical facets statistical? I can't figure
    it out from the example in the issue, which I think shows that you can
    create a new facet value from other fields.

  • I think all examples I saw for the new "script" stuff involved
    numeric or date fields. Can they be used with string fields, and if
    so, can you give some examples? For example, if I have 2 fields,
    "username" and "id", and they have 1:1 mapping, could I use the
    "script" functionality to end up with facets of " "
    format?

Thanks,
Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

On Jun 15, 9:56 pm, Shay Banon shay.ba...@elasticsearch.com wrote:


On Wed, Jun 16, 2010 at 7:20 AM, Otis otis.gospodnetic@gmail.com wrote:

Questions for HCG (Human Code Generator):

  • Are script filters applied after the fact (after the matches have
    been scored and retrieved)? If so, how does one deal with a situation
    where N matches are needed at the end, but the filter(s) may end up
    removing some unknown number of them? Does one simply have to get m x
    N hits and hope that N or more hits are left after the filters?

Script filters work like any other filter (term filter, range filter, and so
on). It's basically a Lucene Filter implementation, so it restricts which
documents match during the search rather than trimming hits afterwards.
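
The difference between the two models in the question can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `search` and `post_filtered_search` are hypothetical names, the documents and the `relevance` score are made up, and nothing here is Lucene or Elasticsearch code. It shows why a filter applied while collecting hits still yields N results, whereas filtering after the top N are chosen may not.

```python
import heapq


def search(docs, score, doc_filter, n):
    """Top-n by score among documents the filter accepts.

    The filter is applied while collecting candidates, so rejected
    documents never compete for one of the n slots.
    """
    candidates = ((score(d), i) for i, d in enumerate(docs) if doc_filter(d))
    top = heapq.nlargest(n, candidates)
    return [docs[i] for _, i in top]


def post_filtered_search(docs, score, doc_filter, n):
    """The 'after the fact' model: take the top n first, then filter them."""
    ranked = sorted(docs, key=score, reverse=True)[:n]
    return [d for d in ranked if doc_filter(d)]


docs = [
    {"id": 1, "num1": 0, "relevance": 0.9},
    {"id": 2, "num1": 5, "relevance": 0.8},
    {"id": 3, "num1": 1, "relevance": 0.7},
    {"id": 4, "num1": 7, "relevance": 0.6},
    {"id": 5, "num1": 3, "relevance": 0.5},
]


def score(doc):
    return doc["relevance"]


def script_filter(doc):
    # Stand-in for the script "doc['num1'].value > 1".
    return doc["num1"] > 1


during = search(docs, score, script_filter, 2)
after = post_filtered_search(docs, score, script_filter, 2)
print([d["id"] for d in during], [d["id"] for d in after])
```

With the filter applied during collection, both requested hits come back; in the post-filter model the top-scoring document is dropped and only one hit remains, which is the "m x N" problem the question describes.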

  • Can you explain what statistical and histogram facets are? The
    example for the histogram shows, I think, that you can derive new
    facet values from partial field values (e.g. from a date field you can
    extract "minuteOfHour" and that becomes the new facet value)... but I
    think I'm missing something here, otherwise you wouldn't call them
    "histogram" facets?
    Similarly, what makes statistical facets statistical? I can't figure
    it out from the example in the issue, which I think shows that you can
    create a new facet value from other fields.

I explained it in another post (
http://groups.google.com/a/elasticsearch.com/group/users/browse_thread/thread/73a35a5168896b45#);
the script is just an extension. The statistical one is explained here:
Issues · elastic/elasticsearch · GitHub. The
original one computes min/max/count/mean/std for a numeric field; the script
one gives the same stats on the numeric value returned by the script. The
histogram one is explained in the post linked above, except that now,
on top of fixed key and value fields, the key and value can be scripts.
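
As a rough illustration of "the same stats on a number value returned by the script", here is a small Python sketch. It is an assumption-laden model, not the facet implementation: `statistical_facet` and the result keys are hypothetical names, a lambda stands in for the script, and the population standard deviation is used because the thread does not say which variant the facet computes.

```python
import math


def statistical_facet(docs, script, params=None):
    """Compute count/min/max/mean/std over script(doc, params) for each doc."""
    params = params or {}
    values = [script(doc, params) for doc in docs]
    count = len(values)
    if count == 0:
        return {"count": 0}
    mean = sum(values) / count
    # Population variance (divide by count); an assumption, see lead-in.
    variance = sum((v - mean) ** 2 for v in values) / count
    return {
        "count": count,
        "min": min(values),
        "max": max(values),
        "mean": mean,
        "std": math.sqrt(variance),
    }


docs = [
    {"num1": 1, "num2": 2},
    {"num1": 3, "num2": 4},
    {"num1": 5, "num2": 6},
]

# Stand-in for a script like "(num1 + num2) * factor".
stats = statistical_facet(
    docs,
    lambda doc, params: (doc["num1"] + doc["num2"]) * params["factor"],
    {"factor": 2.0},
)
print(stats)
```

The only change from the field-based facet is where each number comes from: a per-document script result instead of a stored field value.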

  • I think all examples I saw for the new "script" stuff involved
    numeric or date fields. Can they be used with string fields, and if
    so, can you give some examples? For example, if I have 2 fields,
    "username" and "id", and they have 1:1 mapping, could I use the
    "script" functionality to end up with facets of " "
    format?

Yes, they can be used with string fields, but only where it makes sense. For
example, script fields can return a concatenation of the "username" and "id"
if you want. For facets, you have the terms facet for strings, which has
nothing to do with scripts. Other than that, in the statistical and
histogram facets, the return values must be numbers (they work on
numbers), but that does not mean you can't use string fields in your
script.
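
A short Python sketch of that distinction, with the usual caveats: the helper names `script_field_values` and `numeric_script_values` are hypothetical, the documents are invented, and lambdas stand in for scripts. A script field may return a string built from string fields, while a script feeding a numeric facet may read string fields as long as it returns a number.

```python
def script_field_values(docs, script):
    """Script fields: any return type is acceptable, including strings."""
    return [script(doc) for doc in docs]


def numeric_script_values(docs, script):
    """Scripts feeding numeric facets: the result must be a number."""
    values = []
    for doc in docs:
        value = script(doc)
        if not isinstance(value, (int, float)):
            raise TypeError("facet script must return a number, got %r" % (value,))
        values.append(value)
    return values


docs = [
    {"username": "otis", "id": "42"},
    {"username": "shay", "id": "7"},
]

# A script field concatenating two string fields.
labels = script_field_values(docs, lambda d: d["username"] + " " + d["id"])

# A numeric script that still reads a string field (here, its length).
lengths = numeric_script_values(docs, lambda d: len(d["username"]))

print(labels, lengths)
```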
