Custom analyzer with token replacement

2021-11-14T23:00:00Z

Hi everyone,

I have been experimenting with custom analyzers and ran into trouble with the following example.

Basically, I want to index a document with a field named "user_opinion" in two ways:

  • with the english analyzer
  • with a custom analyzer (agnostic_analyzer) that replaces specific words (matched by the pattern (christianity)) with a custom token (<religion>)

I think the index is set up correctly, because the term vectors show that the token is actually there. However, the search scores show no evidence that the token is used: the last two queries return the same score with and without <religion> in the query.

Suspecting a problem with the analyzer used at search time, I also set "search_analyzer": "agnostic_analyzer" on the subfield.
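The standard tokenizer generally discards punctuation such as "<" and ">", so it may be worth checking which tokens the search-time analyzer actually produces for the query text. A diagnostic sketch using the _analyze API against the index defined below (output not included here):

```
# Show the tokens agnostic_analyzer emits for the query string
GET test-index-03/_analyze
{
  "analyzer": "agnostic_analyzer",
  "text": "<religion> tradition"
}
```

Comparing those tokens with the "<religion>" term reported by _termvectors should show whether the query terms can match the indexed token at all.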

Any suggestions as to why the final two queries return the same score and ignore the <religion> token?

Many thanks!

# ---
# Custom analyzers
# ---

PUT test-index-03
{
  "settings": {
    "analysis": {
      "analyzer": {
        "agnostic_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "agnostic_filter"
          ]
        }
      },
      "filter": {
        "agnostic_filter": {
          "type": "pattern_replace",
          "pattern": "(christianity)",
          "replacement": "<religion>"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "user_id": {
        "type": "keyword"
      },
      "user_opinion": {
        "type": "text",
        "analyzer": "english",
        "term_vector": "with_positions_offsets_payloads",
        "store": true,
        "fields": {
          "agnostic": {
            "type": "text",
            "analyzer": "agnostic_analyzer",
            "search_analyzer": "agnostic_analyzer",
            "term_vector": "with_positions_offsets_payloads",
            "store": true
          }
        }
      }
    }
  }
}
# > 200

PUT test-index-03/_doc/01
{
  "user_id": "A001",
  "user_opinion": "I have a long family tradition around christianity and their celebrations"
}
# > 200

GET test-index-03/_search
{
  "query": {
    "match": {
      "user_opinion": "christianity tradition"
    }
  }
}
# > 0.575 score


GET test-index-03/_search
{
  "query": {
    "match": {
      "user_opinion": "buddhist tradition"
    }
  }
}
# > 0.28 score

GET test-index-03/_termvectors/01
# > "<religion>" is present with "term_freq" : 1
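
# Optional: limit the term vectors output to the custom subfield
# using the "fields" parameter of the term vectors API
GET test-index-03/_termvectors/01?fields=user_opinion.agnostic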

GET test-index-03/_search
{
  "query": {
    "match": {
      "user_opinion.agnostic": "<religion> tradition"
    }
  }
}
# > 0.28 score

GET test-index-03/_search
{
  "query": {
    "match": {
      "user_opinion.agnostic": "tradition"
    }
  }
}
# > 0.28 score, the same as the query that includes <religion>
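
Two follow-up queries that could help narrow this down (untested sketches, results not included): a term query, which skips analysis and looks up <religion> exactly as it appears in the term vectors, and a match query that lets agnostic_analyzer rewrite christianity itself at search time.

```
# term queries are not analyzed: the string is looked up as-is
GET test-index-03/_search
{
  "query": {
    "term": {
      "user_opinion.agnostic": "<religion>"
    }
  }
}

# match queries run the search analyzer, so christianity
# should be replaced with <religion> before the lookup
GET test-index-03/_search
{
  "query": {
    "match": {
      "user_opinion.agnostic": "christianity tradition"
    }
  }
}
```

If the term query matches the document and the match query scores above the "tradition"-only query, that would suggest the token is indexed correctly and the issue lies in how the query text "<religion> tradition" is analyzed.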
