Facet query sorted by tf*idf

Hi All

We need a way to get all terms in a document set sorted by tf*idf. I have
been looking for the most efficient way of getting this data, and I thought
I would simply post here to see whether Elasticsearch allows you to do
something like that.

Essentially it would be a facet search sorted by tf*idf descending, so I
could get the highest-scoring terms first.

Otherwise, is there a way to get the idf score for a term from Elasticsearch
so that I can calculate the tf*idf score myself? Perhaps
using http://www.elasticsearch.org/guide/reference/api/explain.html?
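
For reference, calculating the score by hand is straightforward once the raw counts are available. Below is a minimal sketch in Python, assuming the variant used by Lucene's classic TF-IDF similarity as I understand it (tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1))). Other tf*idf variants exist, and the function names and sample counts are made up for illustration.

```python
import math


def tf(term_freq: int) -> float:
    """Term-frequency factor: square root of the raw count (assumed variant)."""
    return math.sqrt(term_freq)


def idf(doc_freq: int, num_docs: int) -> float:
    """Inverse document frequency: 1 + ln(num_docs / (doc_freq + 1)) (assumed variant)."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def tf_idf(term_freq: int, doc_freq: int, num_docs: int) -> float:
    """Combine both factors into a single tf*idf score."""
    return tf(term_freq) * idf(doc_freq, num_docs)


# Hypothetical counts: the term occurs 4 times in the document set,
# appears in 9 documents of the index, and the index holds 1000 documents.
score = tf_idf(term_freq=4, doc_freq=9, num_docs=1000)
print(round(score, 2))  # 2 * (1 + ln(100)), about 11.21
```

The hard part is not the arithmetic but getting the per-term counts and document frequencies out of the index in the first place.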

Cheers for your help

Chris

--

Hi Chris,

Did you by any chance make any progress on this? This is exactly what I
need too.

Cheers.

On Thursday, October 25, 2012 7:48:55 AM UTC+2, Chris Rode wrote:


--

I'm not the original Chris, but I do have some thoughts on the issue.

The original need sounds like it's more related to term vectors than to
facets, but Elasticsearch does not expose term vectors out of the box. If
you feel comfortable writing some Java code, you could write your own plugin
that accesses them and returns them.

Alternatively, if you want to pursue the facet approach, you'll still need
to write a plugin (but a less complex one) that introduces a custom
FacetCollector and computes the information for each term.

Both approaches require using the Lucene API a little but are very doable.
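
To make the collector idea a bit more concrete, here is a rough sketch of the bookkeeping such a component would do, written in plain Python rather than against the real Lucene or Elasticsearch APIs. The class name `TfIdfTermCollector`, its methods, and the sample data are all hypothetical. It also assumes the index-wide document frequencies are already available, and it uses the sqrt(tf) and 1 + ln(N / (df + 1)) weighting as an assumption. An actual plugin would read these statistics through Lucene instead.

```python
import math
from collections import Counter


class TfIdfTermCollector:
    """Hypothetical stand-in for a custom facet collector (not a real ES class)."""

    def __init__(self, corpus_doc_freq, corpus_num_docs):
        self.corpus_doc_freq = corpus_doc_freq  # term -> docs containing it (whole index)
        self.corpus_num_docs = corpus_num_docs  # total number of docs in the index
        self.term_freq = Counter()              # term -> occurrences in the matched set

    def collect(self, doc_terms):
        """Called once per matching document with that document's terms."""
        self.term_freq.update(doc_terms)

    def top_terms(self, size=10):
        """Return (term, score) pairs sorted by tf*idf descending."""
        scored = []
        for term, freq in self.term_freq.items():
            df = self.corpus_doc_freq.get(term, 0)
            idf = 1.0 + math.log(self.corpus_num_docs / (df + 1))
            scored.append((term, math.sqrt(freq) * idf))
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:size]


# Made-up index statistics and two made-up matching documents.
corpus_doc_freq = {"the": 990, "search": 120, "lucene": 8}
collector = TfIdfTermCollector(corpus_doc_freq, corpus_num_docs=1000)
collector.collect(["the", "lucene", "search", "the"])
collector.collect(["the", "lucene", "the", "the"])
top = collector.top_terms(size=3)
print([term for term, _ in top])  # rare terms outrank the very common "the"
```

The point of the design is that term counts are accumulated only over the documents matching the query, while idf comes from the whole index, so frequent but uninformative terms sink to the bottom of the facet.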

On Thursday, November 15, 2012 8:03:52 PM UTC+13, Ferdy Galema wrote:



--