Query information access within a plugin

cedricsalezeo · January 24, 2017, 2:13pm

Hi,

We've been using Elasticsearch for approximately 2 years. We've developed a plugin that computes a custom similarity between a query and some documents, and we use it to quickly find duplicate records in our database. The plugin is a custom script extending the AbstractSearchScript class.

We are in the process of migrating to Elasticsearch 5.x (we are currently using 1.4).
However, we've run into a problem: we currently use the SearchContext.current() method to access the query information, in particular the analyzed fields.
This method seems to have been removed.

Is there another way to access the query information within a plugin?

s1monw · January 24, 2017, 2:33pm

There is no way to access the search context in a script at this point. It's also not recommended to do what you did here, since you don't know at what point you are accessing the search context, and the query might change after the fact, e.g. if it's rewritten. I don't know exactly what you are doing, but I think the easiest approach would be to build your own query that executes your script as its scorer and holds the query it calculates the similarity for as a subquery. I am just guessing how things work on your end, though. Maybe you can share some details?

cedricsalezeo · January 24, 2017, 3:49pm

Thanks for your answer.

I'm not sure I understand the second part. If it is possible to force a custom scorer, that could solve the problem, but I don't think I can use a script in the query.
I need to compute a score function using the tokenized contents of documents and the query.

I'll detail what I do:
The process has two parts.

For each matching document, for each field,
I compute the cosine similarity between the tokens of the query and the tokens of the document based on how the field is tokenized.
For example if I tokenize using 3-grams and lowercase, for "bobby" and "bobba" I will compute the cosine similarity between [bob, obb, bby] and [bob, obb, bba].
Once I have the similarity for every field, I use a linear model to compute a similarity score between the document and the query.

Finally I use this score to determine if there is a duplicate of my query in the database.
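
The per-field computation described above can be sketched in plain Java. This is only an illustration under stated assumptions, not the actual plugin code: the class name `TrigramCosine`, the method names, and the weights and bias passed to the linear model are all hypothetical, and the tokenization is hard-coded to lowercase character 3-grams instead of using the field's real analyzer.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Elasticsearch or Lucene.
final class TrigramCosine {

    // Lowercase the text and count its character 3-grams.
    // Assumption: this stands in for whatever analyzer the field really uses.
    static Map<String, Integer> trigrams(String text) {
        String s = text.toLowerCase(Locale.ROOT);
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + 3 <= s.length(); i++) {
            counts.merge(s.substring(i, i + 3), 1, Integer::sum);
        }
        return counts;
    }

    // Euclidean norm of a token-count vector.
    static double norm(Map<String, Integer> v) {
        double sum = 0.0;
        for (int c : v.values()) {
            sum += (double) c * c;
        }
        return Math.sqrt(sum);
    }

    // Cosine similarity between two token-count vectors.
    // Returns 0.0 when either side has no tokens.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += (double) e.getValue() * other;
            }
        }
        double na = norm(a);
        double nb = norm(b);
        return (na == 0.0 || nb == 0.0) ? 0.0 : dot / (na * nb);
    }

    // Linear model over the per-field similarities.
    // The weights and bias are placeholders; real values would be fitted elsewhere.
    static double linearScore(double[] fieldSims, double[] weights, double bias) {
        if (fieldSims.length != weights.length) {
            throw new IllegalArgumentException("one weight per field is required");
        }
        double score = bias;
        for (int i = 0; i < fieldSims.length; i++) {
            score += weights[i] * fieldSims[i];
        }
        return score;
    }

    public static void main(String[] args) {
        // [bob, obb, bby] vs [bob, obb, bba]: 2 shared 3-grams out of 3 on each side.
        double sim = cosine(trigrams("bobby"), trigrams("bobba"));
        System.out.printf(Locale.ROOT, "%.4f%n", sim); // prints 0.6667
    }
}
```

For the "bobby" / "bobba" example, two of the three 3-grams are shared, so the similarity is 2/3. The resulting per-field values would then be passed to `linearScore` with one weight per field.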

s1monw · January 24, 2017, 3:55pm

If your plugin implements SearchPlugin, you can create a custom query by returning it from List<QuerySpec<?>> getQueries(), and in there you can do whatever you want: you can get a script service to execute a script, you can build custom scorers, and you can also get the subquery if you need to. I don't necessarily understand why you are not using Lucene's cosine similarity, but I am sure you have your reasons.

cedricsalezeo · January 24, 2017, 4:19pm

Ok thanks, I will take a closer look at SearchPlugin and the getQueries() method.

From what I have read and experienced, Lucene's cosine similarity is asymmetric: it is normalized by the query terms and only the length of the document. That is logical when the goal is to find documents that match the query.

I need a symmetric similarity normalized using both query and document terms.
The goal is to establish that the document matches the query and also that the query would match the document.
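
For reference, the textbook cosine similarity has this symmetric property. The notation is an assumption made here for illustration: q_t and d_t denote the count of token t in the query and in the document field, respectively.

```latex
\cos(q, d) = \frac{\sum_{t} q_t \, d_t}{\sqrt{\sum_{t} q_t^{2}} \; \sqrt{\sum_{t} d_t^{2}}}
```

Both vectors are normalized by their own full norm, so swapping q and d leaves the value unchanged: \cos(q, d) = \cos(d, q).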

system · February 21, 2017, 4:19pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
