I am using Elasticsearch version 7.1.1.
Given a search object like this:
{
  "query": {
    "bool": {
      "must": [
        {"query_string": {"query": "bar", "default_field": "text"}},
        {"query_string": {"query": "foo", "default_field": "text"}}
      ]
    }
  },
  "highlight": {
    "type": "fvh",
    "pre_tags": ["<em0>", "<em1>"],
    "post_tags": ["</em0>", "</em1>"],
    "fields": {"text": {}}
  }
}
And a document like this:
{
  "text": "The only foo bar here is me"
}
I get a response with a highlight like this:
"The only <em1>foo</em1> <em0>bar</em0> here is me"
This example shows that the pre/post tag pair used for a match depends on the position of the matching clause in the query object: the first clause ("bar") gets the first tag pair and the second clause ("foo") gets the second. This is great. Clearly there are some rules here, but I can't find any reference to this logic in the docs. I have uncovered many more rules like this, which enabled me to hack together a parser that reads in the query object and the pre/post tags and returns a map between them. I am, however, concerned that these rules may change in a future release. I can elaborate on these rules if that would be helpful.
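To make the simplest of these rules concrete, here is a minimal Python sketch of the mapping for the example above. It covers only the one rule shown here (clause position i maps to tag pair i) and only flat query_string clauses under bool.must. The function name map_clauses_to_tags is made up for illustration, and the wrap-around when there are more clauses than tags is an assumption on my part, not documented behaviour.

```python
from typing import Dict, Tuple


def map_clauses_to_tags(search_body: dict) -> Dict[str, Tuple[str, str]]:
    """Map each query_string clause in bool.must to a (pre_tag, post_tag) pair.

    Assumption: clause i is highlighted with tag pair i, as observed in the
    example. This is not a documented Elasticsearch guarantee.
    """
    highlight = search_body["highlight"]
    pre_tags = highlight["pre_tags"]
    post_tags = highlight["post_tags"]

    mapping = {}
    for i, clause in enumerate(search_body["query"]["bool"]["must"]):
        query_text = clause["query_string"]["query"]
        # Assumption: tags wrap around if there are more clauses than tags.
        mapping[query_text] = (
            pre_tags[i % len(pre_tags)],
            post_tags[i % len(post_tags)],
        )
    return mapping


search_body = {
    "query": {
        "bool": {
            "must": [
                {"query_string": {"query": "bar", "default_field": "text"}},
                {"query_string": {"query": "foo", "default_field": "text"}},
            ]
        }
    },
    "highlight": {
        "type": "fvh",
        "pre_tags": ["<em0>", "<em1>"],
        "post_tags": ["</em0>", "</em1>"],
        "fields": {"text": {}},
    },
}

tag_map = map_clauses_to_tags(search_body)
print(tag_map)
# prints {'bar': ('<em0>', '</em0>'), 'foo': ('<em1>', '</em1>')}
```

This agrees with the highlight in the response above: "bar" is wrapped in em0 and "foo" in em1.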
Does anyone know if these rules are explicitly written anywhere?
On a related note, it would be great if we could pass an ID into the query clause, e.g.:
{"query_string": {"query": "bar", "default_field": "text", "tag_id": 0}}
Then the tag_id could map to the index of the pre/post tag pair that should be used to highlight the matching tokens. I do understand that it's not that simple, since a query string might be composed of several tokens, some of which might be exact phrases or span conditions.
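In the meantime, something like this can be approximated on the client, assuming the position rule from the example holds. The Python sketch below is only an illustration: apply_tag_ids is a hypothetical helper, and tag_id is not a real Elasticsearch parameter, so it is stripped before the request is sent (I would expect Elasticsearch to reject the unknown key otherwise). It reorders the must clauses so that each clause's position equals its requested tag index, and it ignores the multi-token, phrase and span cases mentioned above.

```python
import copy


def apply_tag_ids(search_body: dict) -> dict:
    """Return a copy of search_body with must clauses ordered by tag_id.

    `tag_id` is a hypothetical key; it is removed from every clause so the
    returned body contains only standard query_string parameters.
    """
    body = copy.deepcopy(search_body)  # leave the caller's object untouched
    clauses = body["query"]["bool"]["must"]

    # Sort so that a clause's position equals its requested tag index.
    clauses.sort(key=lambda clause: clause["query_string"]["tag_id"])

    for position, clause in enumerate(clauses):
        tag_id = clause["query_string"].pop("tag_id")
        if tag_id != position:
            raise ValueError(f"tag_ids must be 0..n-1 without gaps, got {tag_id}")
    return body


request = {
    "query": {
        "bool": {
            "must": [
                {"query_string": {"query": "foo", "default_field": "text", "tag_id": 1}},
                {"query_string": {"query": "bar", "default_field": "text", "tag_id": 0}},
            ]
        }
    },
    "highlight": {
        "type": "fvh",
        "pre_tags": ["<em0>", "<em1>"],
        "post_tags": ["</em0>", "</em1>"],
        "fields": {"text": {}},
    },
}

ready = apply_tag_ids(request)
print([c["query_string"]["query"] for c in ready["query"]["bool"]["must"]])
# prints ['bar', 'foo']
```

The obvious weakness is that this still depends on the undocumented ordering behaviour, which is exactly why a supported parameter would be preferable.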