2021-11-14T23:00:00Z
Hi everyone,
I was experimenting with custom analyzers and ran into trouble with the following example.
Basically, I want to index a document with a field named `user_opinion` in two ways:
- with the `english` analyzer
- with a custom analyzer (`agnostic_analyzer`) that replaces a specific word (`christianity`) with a custom token (`<religion>`)
I think the index I created is correct, because the term vectors show that the token is actually there. However, the search scores show no evidence that the token is used: in the last two queries, I get the same score with and without `<religion>` in the query.
Because I suspected a problem with the analyzer used at search time, I also set `"search_analyzer": "agnostic_analyzer"` on the field.
Any idea why the final two queries return the same score and seem to ignore the token?
Many thanks!
```
# ---
# Custom analyzers
# ---
PUT test-index-03
{
  "settings": {
    "analysis": {
      "analyzer": {
        "agnostic_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "agnostic_filter"
          ]
        }
      },
      "filter": {
        "agnostic_filter": {
          "type": "pattern_replace",
          "pattern": "(christianity)",
          "replacement": "<religion>"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "user_id": {
        "type": "keyword"
      },
      "user_opinion": {
        "type": "text",
        "analyzer": "english",
        "term_vector": "with_positions_offsets_payloads",
        "store": true,
        "fields": {
          "agnostic": {
            "type": "text",
            "analyzer": "agnostic_analyzer",
            "search_analyzer": "agnostic_analyzer",
            "term_vector": "with_positions_offsets_payloads",
            "store": true
          }
        }
      }
    }
  }
}
# > 200

PUT test-index-03/_doc/01
{
  "user_id": "A001",
  "user_opinion": "I have a long family tradition around christianity and their celebrations"
}
# > 200
```
```
GET test-index-03/_search
{
  "query": {
    "match": {
      "user_opinion": "christianity tradition"
    }
  }
}
# > 0.575 score

GET test-index-03/_search
{
  "query": {
    "match": {
      "user_opinion": "buddhist tradition"
    }
  }
}
# > 0.28 score

GET test-index-03/_termvectors/01
# > "<religion>" is present with "term_freq" : 1
```
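The `_analyze` API is one way to compare index time with query time. You can run the same `agnostic_analyzer` over the indexed sentence and over the query text. The standard tokenizer splits on Unicode word boundaries and drops punctuation such as `<` and `>`, so the two sides should produce different terms:
- The first request should contain `<religion>`, because the `pattern_replace` filter runs after tokenization and nothing strips the brackets it inserts.
- The second request should produce `religion` and `tradition`, without the angle brackets.

```
# Index side: how the document text is analyzed
GET test-index-03/_analyze
{
  "analyzer": "agnostic_analyzer",
  "text": "I have a long family tradition around christianity and their celebrations"
}

# Query side: how the match query text is analyzed
GET test-index-03/_analyze
{
  "analyzer": "agnostic_analyzer",
  "text": "<religion> tradition"
}
```

Suppose the second response lists `religion` rather than `<religion>`. In that case the query term never equals the indexed term, so only `tradition` contributes to the score.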
```
GET test-index-03/_search
{
  "query": {
    "match": {
      "user_opinion.agnostic": "<religion> tradition"
    }
  }
}
# > 0.28 score
```
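The validate API with `explain=true` offers another quick check. It reports the query after analysis, so it should show which terms the `match` query on `user_opinion.agnostic` was actually turned into. This is a diagnostic sketch and does not change the search itself.

```
# Shows the analyzed/rewritten form of the match query
GET test-index-03/_validate/query?explain=true
{
  "query": {
    "match": {
      "user_opinion.agnostic": "<religion> tradition"
    }
  }
}
```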
```
GET test-index-03/_search
{
  "query": {
    "match": {
      "user_opinion.agnostic": "tradition"
    }
  }
}
# > 0.28 score, same as with the <religion> token
```
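One variant to try is a tokenizer that keeps the angle brackets, such as `whitespace`. This assumes the text does not need punctuation-aware tokenization. The index name `test-index-04` is hypothetical, and the sketch only redefines the analyzer. A real index would also need the same `user_opinion` mapping as above.

```
# Hypothetical index: same filter, but the whitespace tokenizer keeps "<" and ">"
PUT test-index-04
{
  "settings": {
    "analysis": {
      "analyzer": {
        "agnostic_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "agnostic_filter"
          ]
        }
      },
      "filter": {
        "agnostic_filter": {
          "type": "pattern_replace",
          "pattern": "(christianity)",
          "replacement": "<religion>"
        }
      }
    }
  }
}

# Expected to keep "<religion>" intact as a query-side token
GET test-index-04/_analyze
{
  "analyzer": "agnostic_analyzer",
  "text": "<religion> tradition"
}
```

The trade-off is that `whitespace` leaves all punctuation attached to words. For example, `celebrations.` would stay a single token with the period, so real text may need extra filters.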