KNN-style percolator

heino1986 · June 23, 2016, 9:16am

Is it possible to create a percolator-type procedure that takes documents from index A and runs a full-text similarity query against the documents in index B, retrieving only the k nearest-neighbour documents from B? The documents in A would then be classified by a 'majority' vote of the k documents retrieved from index B.

Thanks,
Chris

mvg · June 23, 2016, 10:17am

The percolator can only evaluate a single document at a time, so I don't think what you would like to do can be done with the percolator.

heino1986 · June 23, 2016, 10:38am

Hi Martijn! Do you think it would be possible to write a script that, whenever a new document is indexed, uses the more_like_this query to retrieve the k nearest neighbours, and to do it that way? I'm new to Elasticsearch, and my programming skills aren't that advanced, as my background is in statistics...

Greetings from London

mvg · June 23, 2016, 11:34am

Hi Chris,

Yes, that is possible. Just make sure you've refreshed the index before you run your script; otherwise the newly indexed document isn't visible to the search API.

Martijn
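
For anyone following along, here is a minimal sketch of what such a script might look like in Python. It only builds the more_like_this search body and takes the majority vote over the hits; sending the request (and refreshing the index first) is left to whichever client you use. The function names, the field names "body" and "category", and the min_term_freq / min_doc_freq values are assumptions for illustration, not something from this thread.

```python
from collections import Counter


def build_mlt_query(text, fields, k):
    """Build a search body asking for the k documents most similar to `text`."""
    return {
        "size": k,  # only the k best-scoring neighbours are returned
        "query": {
            "more_like_this": {
                "fields": fields,
                "like": text,
                # Assumed values: lowered so that short documents still match.
                "min_term_freq": 1,
                "min_doc_freq": 1,
            }
        },
    }


def majority_label(response, label_field):
    """Return the most common value of `label_field` among the hits, or None."""
    hits = response["hits"]["hits"]
    labels = [
        hit["_source"][label_field]
        for hit in hits
        if label_field in hit.get("_source", {})
    ]
    if not labels:
        return None
    return Counter(labels).most_common(1)[0][0]


# Hypothetical usage: index B holds documents with "body" and "category" fields.
body = build_mlt_query("tuning a search engine", ["body"], k=5)
```

The response passed to majority_label is assumed to have the usual search response shape, with the stored document under each hit's "_source". How ties are broken is a design choice; here the label that appears first among the hits wins.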

Mark_Harwood · June 23, 2016, 12:22pm

I presume that when you run this free-text query and assess the "majority vote", you are analysing some existing structured classification field, e.g. "tag" or "category", to find the most relevant tag.

You can use aggregations to do this but I have 2 tips:

1. Use the 'sampler' aggregation to consider only the top N results from the MLT query.
2. Use the 'significant_terms' aggregation instead of the 'terms' aggregation to get the top N tags. Popular tags like "software" are perhaps less interesting than "search engine". Significant terms sniffs these more important classifications out.
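
As a rough illustration of the two tips above, a request body combining them might be built like this in Python. The aggregation names "sample" and "top_tags", the function names, and the field names are hypothetical; note also that the sampler's shard_size applies per shard, so on a multi-shard index the sample can be larger than N.

```python
def build_sampled_tag_query(text, fields, tag_field, n):
    """MLT query whose top-n hits (per shard) feed a significant_terms aggregation."""
    return {
        "size": 0,  # only the aggregation result is needed, not the hits
        "query": {
            "more_like_this": {
                "fields": fields,
                "like": text,
                "min_term_freq": 1,  # assumed value
                "min_doc_freq": 1,   # assumed value
            }
        },
        "aggs": {
            "sample": {
                # Restrict the sub-aggregation to the best-scoring documents.
                "sampler": {"shard_size": n},
                "aggs": {
                    "top_tags": {"significant_terms": {"field": tag_field}}
                },
            }
        },
    }


def top_tag(response):
    """Return the tag with the highest significance score, or None if no buckets."""
    buckets = response["aggregations"]["sample"]["top_tags"]["buckets"]
    if not buckets:
        return None
    return max(buckets, key=lambda bucket: bucket["score"])["key"]


# Hypothetical usage: classify by the most significant "tag" among 10 neighbours.
body = build_sampled_tag_query("tuning a search engine", ["body"], "tag", n=10)
```

Compared with counting labels client-side, this keeps the vote inside Elasticsearch and weights tags by how unusual they are in the sample relative to the whole index, rather than by raw frequency.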
