We need a way to get all terms in a document set sorted by tf*idf. I have
been looking for the most efficient way to get this data, and I thought I
would post here to see whether Elasticsearch allows something like that.
Essentially it would be a facet search sorted by tf*idf descending, so I
could get the highest-scoring terms first.
Did you by any chance make any progress on this? This is exactly what I
need too.
Cheers.
On Thursday, October 25, 2012 7:48:55 AM UTC+2, Chris Rode wrote:
Hi All
We need a way to get all terms in a document set sorted by tf*idf. I have
been looking for the most efficient way to get this data, and I thought I
would post here to see whether Elasticsearch allows something like that.
Essentially it would be a facet search sorted by tf*idf descending, so I
could get the highest-scoring terms first.
I'm not the original Chris, but I do have some thoughts on the issue.
The original need sounds more closely related to term vectors than to
facets, but Elasticsearch does not expose term vectors out of the box. If you
are comfortable writing some Java code, you could write your own plugin
that accesses and returns them.
Alternatively, if you want to pursue the facet approach, you'll still need
to write a plugin (but a less complex one) that introduces a custom
FacetCollector and computes the information for each term.
Both approaches require using the Lucene API a little, but both are very doable.
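To make the per-term computation concrete, here is a rough sketch of what either plugin would have to calculate. It is plain Python, not Elasticsearch or Lucene code, and it rests on a few assumptions of mine: documents are whitespace-tokenised strings, tf is the total number of occurrences of a term across the document set, and idf is ln(N / df) over the whole index. Lucene's own scoring formula differs in its details, and the function name rank_terms_by_tfidf is made up for illustration.

```python
import math
from collections import Counter


def rank_terms_by_tfidf(doc_set, corpus):
    """Return (term, score) pairs for doc_set, highest tf*idf first.

    Assumptions (not taken from Lucene):
      tf  = total occurrences of the term across doc_set
      idf = ln(N / df), with N = len(corpus) and df = number of
            corpus documents containing the term
    doc_set must be drawn from corpus, otherwise df could be zero.
    """
    n_docs = len(corpus)

    # Document frequency: count each term once per corpus document.
    doc_freq = Counter()
    for doc in corpus:
        doc_freq.update(set(doc.lower().split()))

    # Term frequency: count every occurrence inside the document set.
    term_freq = Counter()
    for doc in doc_set:
        term_freq.update(doc.lower().split())

    scores = {
        term: tf * math.log(n_docs / doc_freq[term])
        for term, tf in term_freq.items()
    }
    # Sort by score descending; break ties alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


corpus = [
    "lucene term vectors",
    "elasticsearch facet search",
    "elasticsearch term facet",
    "lucene search",
]
doc_set = corpus[1:3]  # the "document set" is a subset of the index
ranked = rank_terms_by_tfidf(doc_set, corpus)
for term, score in ranked:
    print(f"{term}\t{score:.3f}")
```

In a real plugin the two counters would come from the index statistics rather than from re-tokenising the text, but the sort at the end is the "facet sorted by tf*idf descending" part of the original question.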
On Thursday, November 15, 2012 8:03:52 PM UTC+13, Ferdy Galema wrote:
Hi Chris,
Did you by any chance make any progress on this? This is exactly what I
need too.
Cheers.