Hello, I'm trying to figure out a way to index a special type of annotation that I developed, so that I can search by it later at query time. For example, I may have a couple of documents with the text:
"this is a cool {@this.Example}"
"I think this is super cool"
I want to search for "I want some cool {@this.Example}" and match only the first document. Right now I'm matching both, since there's an overlap of terms. I was trying to subquery my way around this, but it seems my annotations get indexed in a way that I cannot match:
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "doc_field": "{@this.Example}"
        }
      },
      "should": {
        "match": {
          "doc_field": {
            "query": "I want something cool",
            "fuzziness": "1"
          }
        }
      }
    }
  }
}
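To make the term overlap concrete, here is a rough Python sketch. The analyze helper is a hypothetical stand-in for the analysis chain (whitespace split plus lowercasing only, no stop words or stemming), not Elasticsearch itself. It shows which terms each document shares with the query text, which is why a match query with the default OR operator hits both documents.

```python
# Hypothetical stand-in for the analysis chain: whitespace split plus
# lowercasing. Stop-word removal and stemming are deliberately left out.
def analyze(text):
    return [token.lower() for token in text.split()]

docs = {
    "doc1": "this is a cool {@this.Example}",
    "doc2": "I think this is super cool",
}
query_tokens = set(analyze("I want some cool {@this.Example}"))

# A match query with the default OR operator hits any document that
# shares at least one term with the query.
shared = {
    name: sorted(query_tokens & set(analyze(text)))
    for name, text in docs.items()
}
for name, terms in shared.items():
    print(name, terms)
```

Both documents share "cool" with the query, so only a required clause on the annotation token itself can separate them.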
I'm using the following analyzer, but without much success:
"annotated_analyzer": {
  "type": "custom",
  "filter": [
    "lowercase",
    "english_stop",
    "porter_stem"
  ],
  "tokenizer": "whitespace"
}
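For context, here is a sketch of how an analyzer like this is usually wired into the index settings and the field mapping, written as a Python dict that serializes to the request body. Several parts are assumptions rather than my actual setup: the english_stop filter definition (a stop filter with the built-in _english_ list), the "text" field type, and the typeless mapping layout used by Elasticsearch 7 and later.

```python
import json

# Hypothetical index body. The "english_stop" definition and the "text"
# field type are assumptions; only the analyzer itself comes from the post.
index_body = {
    "settings": {
        "analysis": {
            "filter": {
                "english_stop": {"type": "stop", "stopwords": "_english_"},
            },
            "analyzer": {
                "annotated_analyzer": {
                    "type": "custom",
                    "tokenizer": "whitespace",
                    "filter": ["lowercase", "english_stop", "porter_stem"],
                },
            },
        },
    },
    "mappings": {
        "properties": {
            "doc_field": {
                "type": "text",
                # Same analyzer at index time and at search time.
                "analyzer": "annotated_analyzer",
                "search_analyzer": "annotated_analyzer",
            },
        },
    },
}

# Body that would be sent with: PUT my_index
print(json.dumps(index_body, indent=2))
```

When search_analyzer is omitted, Elasticsearch falls back to the field's analyzer at search time, so setting both to the same value is mostly a way of being explicit.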
Oddly enough, when creating the mapping for a field, the search_analyzer is ignored; I'm not even sure this analyzer is being used at search time. If I run _analyze on the field, I do get what I want:
GET my_index/_analyze
{
  "field": "doc_field",
  "text": "I want some cool {@this.Example}"
}
yields:
{
  "tokens": [
    {
      "token": "i",
      "start_offset": 0,
      "end_offset": 1,
      "type": "word",
      "position": 0
    },
    {
      "token": "want",
      "start_offset": 2,
      "end_offset": 6,
      "type": "word",
      "position": 1
    },
    {
      "token": "some",
      "start_offset": 7,
      "end_offset": 11,
      "type": "word",
      "position": 2
    },
    {
      "token": "cool",
      "start_offset": 12,
      "end_offset": 16,
      "type": "word",
      "position": 3
    },
    {
      "token": "{@this.example}",
      "start_offset": 17,
      "end_offset": 32,
      "type": "word",
      "position": 4
    }
  ]
}
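The token list above can be approximated with a small Python sketch. The whitespace_tokens helper is hypothetical and only mimics the whitespace tokenizer plus the lowercase filter; it is included to show that the annotation survives as a single token, braces and @ included, so a query term has to match that exact string.

```python
import re

def whitespace_tokens(text):
    """Split on whitespace, lowercase, and record offsets and positions."""
    return [
        {
            "token": match.group().lower(),
            "start_offset": match.start(),
            "end_offset": match.end(),
            "position": position,
        }
        for position, match in enumerate(re.finditer(r"\S+", text))
    ]

tokens = whitespace_tokens("I want some cool {@this.Example}")
for token in tokens:
    print(token)
```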
Running _validate/query also says everything is fine:
{
  "valid": true,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  }
}
Any ideas on how I can achieve this behavior? I'm sure I need an analyzer for this task, to make sure Elasticsearch indexes the annotation as is.
Even if I query only for {@this.Example}, I get no results. The analyzer is doing what I expect it to do, but the search query is not hitting this token.
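One variant that might be worth testing, sketched here as a Python dict, is to require the annotation with a term query, which skips search-time analysis and looks up the exact token that _analyze reports. This is only a sketch under an assumption: it can match only if doc_field was actually indexed with annotated_analyzer. If a different analyzer was applied at index time, the stored token may differ and the term clause would still find nothing.

```python
import json

# Lowercased to mirror the token that _analyze reports for the annotation.
ANNOTATION_TOKEN = "{@this.Example}".lower()

query_body = {
    "query": {
        "bool": {
            # term is not analyzed at search time, so the value must equal
            # the indexed token exactly.
            "must": {"term": {"doc_field": ANNOTATION_TOKEN}},
            # The free text only influences scoring.
            "should": {
                "match": {
                    "doc_field": {
                        "query": "I want something cool",
                        "fuzziness": "1",
                    }
                }
            },
        }
    }
}

# Body that would be sent with: GET my_index/_search
print(json.dumps(query_body, indent=2))
```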