Trying to write a Facet to replace a Lucene collector


(phill) #1

I'm thinking I can replace a Lucene collector I have with a facet plus a
script.
I'm starting to experiment with a terms stats facet or a statistical facet
to get the map-reduce behavior (mostly the map behavior) I'm looking
for. If I really needed to reduce my results across all shards, I'd have
to go with a custom facet, but for now it looks doable as a
variation on a statistical collector, possibly with a bit of
post-processing after I get the results.

Just to establish a concrete example: suppose I have documents that represent
files, each with a path, mod_date, and file_size. I might run a script
to shorten the path (e.g. to some appropriate
ancestor/parent directory) and gather statistics about the dates of all
documents in that directory, or in the whole directory tree below it,
using a terms stats facet with the script-transformed path as the
key_field and the mod date as the value_field.
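
To make the intended map step concrete, here is a minimal plain-Java sketch of the computation I'm after. It does not use any Elasticsearch API; the class name `FolderStats`, the `depth` parameter, and the assumption that paths are absolute and `/`-separated are all mine, for illustration only.

```java
import java.util.LongSummaryStatistics;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the map step: plain Java that mimics grouping
// mod dates under a script-transformed path key. Not Elasticsearch code.
class FolderStats {

    // Truncate an absolute path to its first `depth` directory components,
    // e.g. ("/home/paul/docs/a.txt", 2) gives "/home/paul".
    static String ancestor(String path, int depth) {
        String[] parts = path.split("/");
        StringBuilder sb = new StringBuilder();
        // parts[0] is empty for absolute paths; the last part is the file name.
        for (int i = 1; i <= depth && i < parts.length - 1; i++) {
            sb.append('/').append(parts[i]);
        }
        return sb.length() == 0 ? "/" : sb.toString();
    }

    // Group mod dates (epoch millis) by transformed path and keep
    // count/min/max/sum for each group.
    static Map<String, LongSummaryStatistics> collect(String[] paths, long[] modDates, int depth) {
        Map<String, LongSummaryStatistics> stats = new TreeMap<>();
        for (int i = 0; i < paths.length; i++) {
            stats.computeIfAbsent(ancestor(paths[i], depth), k -> new LongSummaryStatistics())
                 .accept(modDates[i]);
        }
        return stats;
    }

    public static void main(String[] args) {
        String[] paths = {"/home/paul/docs/a.txt", "/home/paul/src/b.java", "/var/log/c.log"};
        long[] modDates = {1000L, 3000L, 2000L};
        collect(paths, modDates, 2).forEach((dir, s) ->
            System.out.println(dir + " count=" + s.getCount()
                + " min=" + s.getMin() + " max=" + s.getMax()));
    }
}
```

In the real setup, `ancestor` would be the body of the key script and the per-key statistics would come from the facet itself.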

This could be used in an application where there is an interface that
shows info about the folders containing the documents found as well as
the document search results.

For my first attempt I'll be writing a native script, so my first question is:

Question 1: Can I add a script definition dynamically via the REST API?

$ curl -XPUT 'http://172.16.0.164:9200/test_index/_settings' -d '
script.native.myScript.type: com.me.scripts.MyNativeScriptFactory
'
based on:
http://www.elasticsearch.org/guide/reference/modules/scripting.html
The above should register a script named "myScript".

But this API is described on the page about modules, so I'm not sure if this
setting is actually part of a module and thus will be in the right place
when it is needed.

I will eventually declare such a script at the cluster level, but the
question remains about using the REST API.

Question 1a: What is a quick way to show that the above setting worked,
but the class is not (yet) found?

Question 2: Can I return either false or a new value from the same script?

I see that scripts for the terms facet can be written to return either a
boolean or a new key.
I'm assuming I can filter out unwanted or redundant values by returning
a boolean false when the term value is irrelevant, and otherwise return a
transformed key when I want to pass on a transformed value.
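
The contract I have in mind can be sketched in plain Java as follows. This is only an illustration of the "false or new key" behavior described above, not the Elasticsearch scripting API; `KeyScriptSketch`, `run`, and `apply` are hypothetical names, and the rule "terms not starting with `/` are irrelevant" is an assumption made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of a script that returns either false or a new key.
class KeyScriptSketch {

    // Return Boolean.FALSE to drop the term, or a String to use as the new key.
    static Object run(String term) {
        if (!term.startsWith("/")) {
            return Boolean.FALSE;                 // irrelevant term: filter it out
        }
        int lastSlash = term.lastIndexOf('/');
        // Transform the path into its parent directory.
        return lastSlash == 0 ? "/" : term.substring(0, lastSlash);
    }

    // What a caller honoring that contract would do with each result.
    static List<String> apply(List<String> terms) {
        List<String> keys = new ArrayList<>();
        for (String term : terms) {
            Object result = run(term);
            if (Boolean.FALSE.equals(result)) {
                continue;                         // excluded from the facet
            }
            // A String result replaces the key; a boolean true keeps the original term.
            keys.add(result instanceof String ? (String) result : term);
        }
        return keys;
    }

    public static void main(String[] args) {
        System.out.println(apply(Arrays.asList("/a/b.txt", "junk", "/c.txt")));
    }
}
```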

Question 3: Is a native script created (the factory called) once per
facet invocation or once per document hit?

If I did try to filter out "redundant" keys (see previous question),
I'd have to have a way to know that my script is starting on a new query
or facet and not just being asked to process an isolated document.
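
To show why the answer matters, here is a plain-Java sketch of the state I would want to keep. It assumes, without knowing whether this is true, that the factory is called once per facet execution so that each script instance starts with an empty set; `SketchScript` and `DedupScriptFactory` are hypothetical stand-ins, not the Elasticsearch interfaces, and the sketch ignores threading.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a script instance; NOT the Elasticsearch interface.
interface SketchScript {
    Object run(String term);
}

// Hypothetical factory: the dedup only works if newScript() is called once
// per facet execution (an assumption, and exactly what Question 3 asks).
class DedupScriptFactory {

    SketchScript newScript() {
        Set<String> seen = new HashSet<>();   // fresh state for each script instance
        return term -> {
            if (seen.add(term)) {
                return term;                  // first occurrence: keep the key
            }
            return Boolean.FALSE;             // redundant key: filter it out
        };
    }

    public static void main(String[] args) {
        SketchScript script = new DedupScriptFactory().newScript();
        System.out.println(script.run("/home/paul"));
        System.out.println(script.run("/home/paul"));
    }
}
```

If the factory were instead called once per document hit, `seen` would always be empty and nothing would ever be filtered.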

thanks in advance.

-Paul


(system) #2