Calculating a "computed" value based on index statistics / term frequencies


(Kevin Blaisdell) #1

As background, I have some Lucene-based code that manipulates index
statistics to generate numeric document vectors. This code sits between
systems that need document vectors as input and the Lucene indexes that
store the source data and statistics (term and document frequencies). I am
not too familiar with ES and am hoping to get some pointers on where best
to focus my investigation, or on whether I am taking a totally incompatible
approach.
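
To make the background concrete, here is a minimal sketch of the kind of computation meant by "document vectors from index statistics". It assumes a smoothed TF-IDF weighting, which is only one possible choice and is not necessarily what the existing Lucene code does; the function name `document_vector` and the toy corpus are hypothetical, and the statistics are held in plain Python containers rather than read from a Lucene or ES index.

```python
import math
from collections import Counter

def document_vector(doc_tokens, doc_freq, num_docs, vocabulary):
    """Return one TF-IDF weight per vocabulary term for a single document.

    Assumption: smoothed TF-IDF; the real weighting scheme may differ.
    """
    tf = Counter(doc_tokens)  # term frequencies within this document
    vector = []
    for term in vocabulary:
        df = doc_freq.get(term, 0)  # number of documents containing the term
        # Smoothed IDF; the +1 terms avoid division by zero for unseen terms.
        idf = math.log((1 + num_docs) / (1 + df)) + 1.0
        vector.append(tf[term] * idf)
    return vector

# Hypothetical toy corpus standing in for the indexed documents.
corpus = [["lucene", "index", "index"], ["elastic", "index"], ["lucene", "query"]]
doc_freq = Counter(term for doc in corpus for term in set(doc))
vocabulary = sorted(doc_freq)
vectors = [document_vector(doc, doc_freq, len(corpus), vocabulary) for doc in corpus]
```

Each vector has one entry per vocabulary term, and every entry depends on corpus-wide document frequencies, not only on the document itself.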

I am just starting to dig into how the plugin framework works. Is a
separate plugin that could be called as an extension to the native API a
logical place to do this? This assumes a plugin can expose a native API and
not just a REST extension.

Is there an approach in ES that lets query results include a "computed"
field, instead of creating an entirely separate plugin and API extension for
this? By that I mean a field whose value is the result of some custom code
running at query time rather than an actual stored field in the index.

Finally, is there a way to inject some preprocessing into the ES indexing /
ingestion pipeline, where you could precompute and store calculated fields?
I believe Solr has something for this and ES doesn't, because of the
potential to slow down indexing, but perhaps I am just not looking in the
right place.

Thanks
Kevin

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/028b85e6-a14e-4512-8dff-0f4da6253d29%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Kevin Blaisdell) #2

Quick correction: I remembered that precomputing before the index is
populated wouldn't work for me in this case, because the term frequency
data for the full corpus wouldn't be available yet.

(In reply to post #1, sent Tuesday, March 4, 2014 11:56:04 AM UTC+2.)



(system) #3