Can you change the "type" of a token emitted by an analyzer?

stefanvuk · July 17, 2019, 10:26pm

Question: Is there a way to change the type of token emitted by an analyzer?

Context:
I'm using the annotated text plugin to mark up some fields with annotations. I'd like to query these fields with "match" queries that include annotation values, and I'd like a query that contains an annotation not to match a document that lacks that annotation.

For example, if I have the annotated text

Text1: "[Thomas Jefferson](_president_) was born in [Virginia](_place_)."
Text2: "[Thomas Jefferson](_writer_) was born in [Virginia](_place_)."

I'd like to be able to execute the query

{
    "match_phrase": {
        "annotatedField": "_president_ was born in Virginia"
    }
}

and have it match Text1 but not Text2.

But nothing matches. After hitting some of the analysis endpoints, I see that the original text has _president_ analyzed as a token of type 'annotation', but in the match_phrase query, _president_ gets analyzed as '<ALPHANUM>'. I assume this is why I don't get a match: the token types are different. I'm also assuming that if the token types were the same, then I would get a match.
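For reference, the comparison I ran can be reproduced roughly as follows. This is a minimal sketch that only builds the two request bodies for the `_analyze` API; the index name `my_index` and the helper `analyze_request` are hypothetical names used for illustration, and each body would be sent to `GET /my_index/_analyze`.

```python
import json

# Hypothetical index name, used only for illustration.
INDEX = "my_index"


def analyze_request(field, text):
    """Build an _analyze body that analyzes `text` with the analyzer mapped to `field`."""
    return {"field": field, "text": text}


# The text as indexed, with annotation markup.
indexed_form = analyze_request(
    "annotatedField",
    "[Thomas Jefferson](_president_) was born in Virginia",
)

# The text as written in the match_phrase query, without markup.
query_form = analyze_request(
    "annotatedField",
    "_president_ was born in Virginia",
)

for body in (indexed_form, query_form):
    print(f"GET /{INDEX}/_analyze")
    print(json.dumps(body, indent=2))
```

Comparing the `type` and `position` of each token in the two responses shows how `_president_` is treated in each case.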

I then tried the query

{
    "match_phrase": {
        "annotatedField": "[Thomas Jefferson](_president_) was born in Virginia"
    }
}

In this case _president_ is given the token type 'annotation', but the query also contains the tokens "Thomas" and "Jefferson", so it matches both Text1 and Text2.

I am aware that I can get the effect that I want using a boolean query and adding a term query with the annotation, but in this case the structure of the sentence is lost.
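A minimal sketch of that workaround, for concreteness. It builds a `bool` query that combines a `match_phrase` on the plain words with a `term` clause on the annotation value; the helper name `phrase_with_annotation` is hypothetical, and using `filter` rather than `must` for the term clause is just one possible choice.

```python
import json


def phrase_with_annotation(field, phrase, annotation):
    """Build a bool query: phrase match on the text plus a term match on the annotation.

    Hypothetical helper for illustration. The two clauses are independent,
    so nothing ties the annotation to a position relative to the phrase.
    """
    return {
        "query": {
            "bool": {
                "must": [{"match_phrase": {field: phrase}}],
                "filter": [{"term": {field: annotation}}],
            }
        }
    }


body = phrase_with_annotation("annotatedField", "was born in Virginia", "_president_")
print(json.dumps(body, indent=2))
```

Because the `term` clause only checks that the annotation occurs somewhere in the field, this would also match a document where `_president_` annotates a different part of the text than the subject of "was born in Virginia".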

Here are my questions.
(1) Is there a way to give an explicit token in the query? Some control character I can use to specify that _president_ should be treated as an annotation in the query?
(2) Is there a token filter I can use to match tokens of a certain structure and change their type? (I've looked at the regex token filters, for example, and these let me change the contents of a token, but I can't find in the documentation a method for changing a token's type.)
(3) Is there any other way I can achieve the behavior that I want? Perhaps focusing on the annotated text is the wrong approach.

Any help is appreciated.

system · August 14, 2019, 10:26pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
