Query information access within a plugin


(cedric) #1

Hi,

we've been using ElasticSearch for approximately 2 years. We've developed a plugin that computes a custom similarity between a query and some documents. We use the plugin in order to quickly find duplicate records in our database. Our plugin is a custom script extending the AbstractSearchScript class.

We are in the process of migrating to ElasticSearch 5.x (we are currently using 1.4).
However, we've encountered a problem: we currently use the SearchContext.current() method to access the query information. In particular, we need to access analyzed fields.
This method seems to have been removed.

Is there another way to access the query information within a plugin?


(Simon Willnauer) #2

There is no way to access the search context in a script at this point. It's also not recommended to do what you did here, since you don't know at what point you are accessing the search context, and the query might change after the fact, e.g. if it's rewritten. I don't know exactly what you are doing, but I think the easiest option would be to build your own query that executes your script, provides the scorer, and holds the query it calculates the similarity for as a subquery. I'm just guessing how things work on your end, though. Maybe you can share some details?


(cedric) #3

Thanks for your answer.

I'm not sure I understand the second part. If it is possible to force a custom scorer, it could solve the problem, but I don't think I can use a script in the query.
I need to compute a score function using the tokenized contents of documents and the query.

I'll detail what I do:
The process has two parts.

  1. For each matching document, for each field,
    I compute the cosine similarity between the tokens of the query and the tokens of the document based on how the field is tokenized.
    For example if I tokenize using 3-grams and lowercase, for "bobby" and "bobba" I will compute the cosine similarity between [bob, obb, bby] and [bob, obb, bba].

  2. When I have the similarity for every field, I use a linear model to compute a similarity score between the document and the query.

Finally I use this score to determine if there is a duplicate of my query in the database.
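
The two steps above can be sketched as follows. This is a minimal illustration and not the actual plugin code: the class name NgramCosine, its method names, and the weights and bias passed to the linear model are all hypothetical, and the tokenization is assumed to be plain lowercased character n-grams.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Illustrative only: NgramCosine and all of its members are hypothetical names. */
final class NgramCosine {

    /** Lowercases the text and counts its character n-grams. */
    static Map<String, Integer> ngramCounts(String text, int n) {
        String s = text.toLowerCase(Locale.ROOT);
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + n <= s.length(); i++) {
            counts.merge(s.substring(i, i + n), 1, Integer::sum);
        }
        return counts;
    }

    /** Step 1: cosine similarity between two n-gram count vectors (0 if either is empty). */
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0);
        }
        double norm = Math.sqrt(sumOfSquares(a)) * Math.sqrt(sumOfSquares(b));
        return norm == 0.0 ? 0.0 : dot / norm;
    }

    private static double sumOfSquares(Map<String, Integer> v) {
        double sum = 0.0;
        for (int c : v.values()) {
            sum += (double) c * c;
        }
        return sum;
    }

    /** Step 2: linear model over the per-field similarities (weights and bias are placeholders). */
    static double linearScore(double[] fieldSimilarities, double[] weights, double bias) {
        if (fieldSimilarities.length != weights.length) {
            throw new IllegalArgumentException("one weight per field is required");
        }
        double score = bias;
        for (int i = 0; i < weights.length; i++) {
            score += weights[i] * fieldSimilarities[i];
        }
        return score;
    }
}
```

For the "bobby" / "bobba" example, calling cosine(ngramCounts("bobby", 3), ngramCounts("bobba", 3)) compares [bob, obb, bby] with [bob, obb, bba]: two of the three trigrams are shared, so the result is 2/3.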


(Simon Willnauer) #4

If your plugin implements SearchPlugin, you can create a custom query by returning it from List<QuerySpec<?>> getQueries(), and in there you can do whatever you want. You can get a script service to execute a script, you can build custom scorers, and you can also get the subquery if you need to. I don't necessarily understand why you are not using Lucene's cosine similarity, but I am sure you have your reasons.


(cedric) #5

Ok thanks, I will take a closer look at SearchPlugin and the getQueries() method.

From what I have read and experienced, Lucene's cosine similarity is asymmetric: it is normalized by the query terms and only the length of the document. That is logical when the goal is to find a document that matches the query.

I need a symmetric similarity normalized using both query and document terms.
The goal is to find that the document is a match with the query but also that the query would be a match with the document.
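
For illustration, the symmetric measure described here can be written as the plain cosine between the query and document token-count vectors q and d. The notation is assumed for this sketch and is not taken from the plugin:

```latex
\[
\operatorname{sim}(q, d)
  = \frac{\sum_{t} q_t \, d_t}
         {\sqrt{\sum_{t} q_t^{2}} \; \sqrt{\sum_{t} d_t^{2}}}
  = \operatorname{sim}(d, q)
\]
```

Swapping q and d leaves the value unchanged, which is the property needed to check the match in both directions.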

