Filter results whose score is less than 2 std devs from mean; plugin, maybe?


(Woody Peterson) #1

Through trial and error on my dataset I've found that results whose scores
are less than two standard deviations from the mean score of a query are
junk. This ticket https://github.com/elasticsearch/elasticsearch/issues/719
points out that in order to do anything like this you'd have to search
twice, which is obviously wasteful.
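For concreteness, here is a minimal sketch of the cutoff itself. It assumes "less than two standard deviations from the mean" means a score more than `k = 2` standard deviations *below* the mean score of the query, and it uses the population standard deviation of whatever scores it is given. The function name `filter_junk` is hypothetical.

```python
from statistics import mean, pstdev


def filter_junk(scores, k=2.0):
    """Keep scores at or above mean - k * stddev.

    Assumption: "junk" means more than k standard deviations *below*
    the mean. Uses the population standard deviation of `scores`.
    """
    if not scores:
        return []
    mu = mean(scores)
    sigma = pstdev(scores)
    cutoff = mu - k * sigma
    return [s for s in scores if s >= cutoff]


# Nine equally good hits plus one outlier: mean 9, stddev 3, cutoff 3.
print(filter_junk([10] * 9 + [0]))  # the 0 is dropped
```

One caveat: no value can sit more than sqrt(n - 1) population standard deviations from the mean. With four or fewer hits, a 2σ cutoff can never remove anything.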

Since I'm using a third-party library, tire, to access elasticsearch, it
would be ideal if I could implement the functionality in a plugin that adds
a score stats facet to queries and does a simple iterative filter before
returning the results. It's a bit hacky, but I think it would be good
enough, and possibly less work than rewriting/hacking tire.
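Until something like that exists server-side, the same cutoff can be approximated on the client after a single search. This is a minimal sketch. It assumes the standard search response shape (`hits.hits[]._score`) and results sorted by relevance, so that every `_score` is a number. The function names are hypothetical and are not part of tire or elasticsearch. The statistics are accumulated in one pass with Welford's online algorithm, roughly how a score-stats facet might gather them while collecting hits. One limitation: the client only sees the page of hits actually returned (the request's `size`), not every matching document. That is exactly the gap a server-side plugin would close.

```python
import math


def score_stats(hits):
    """Single pass over hits: count, mean and population stddev (Welford)."""
    n = 0
    mu = 0.0
    m2 = 0.0  # running sum of squared deviations from the mean
    for hit in hits:
        x = hit["_score"]
        n += 1
        delta = x - mu
        mu += delta / n
        m2 += delta * (x - mu)
    sigma = math.sqrt(m2 / n) if n else 0.0
    return n, mu, sigma


def drop_low_scoring(response, k=2.0):
    """Drop hits scoring more than k stddevs below the mean (assumed reading)."""
    hits = response["hits"]["hits"]
    _, mu, sigma = score_stats(hits)
    cutoff = mu - k * sigma
    return [h for h in hits if h["_score"] >= cutoff]


# Hypothetical response: nine good hits and one low-scoring one.
response = {"hits": {"hits": [{"_id": str(i), "_score": 10.0} for i in range(9)]
                     + [{"_id": "junk", "_score": 0.0}]}}
print([h["_id"] for h in drop_low_scoring(response)])
```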

I looked at the plugins docs page, and it's not clear whether this would be
possible.

Thoughts? First steps?

Thanks for any help,

-Woody

