Using normalizer for sorting

Hi. I use a custom normalizer with some char_filters and a lowercase filter. When I perform a search with sorting, I can see that it actually works! But I don't see any changes in the terms returned by a _termvectors query on my normalized field. Why?

The terms in the response of the term vectors query are created at indexing time with the analyzer that you set on the field. Can you add an example of the changes that you see (or don't see) in the response?

Ok. Here are my current settings for the index:

PUT tender-search
{
  "settings": {
    "number_of_replicas": 0,
    "index.mapping.single_type": true,
    "analysis": {
      "char_filter": {
        "garbage_filter": {
          "type": "pattern_replace",
          "pattern": "^([^\\p{L}\\d]+)(.*)",
          "replacement": "$2"
        },
        "ua_sort_filter": {
          "type": "mapping",
          "mappings": [
            "і => ия",
            "І => ИЯ",
            "є => ея",
            "Є => ЕЯ",
            "ґ => гя",
            "Ґ => ГЯ"
          ]
        }
      },
      "analyzer": {
        "default": {
          "type": "keyword"
        },
        "default_search": {
          "type": "standard"
        }
      },
      "normalizer": {
        "title_clean": {
          "type": "custom",
          "char_filter": ["garbage_filter", "ua_sort_filter"],
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "tenderSearch": {
      "properties": {
        "title": {
          "type": "text",
          "analyzer": "standard",
          "fields": {
            "raw": {
              "type": "keyword",
              "normalizer": "title_clean"
            }
          }
        },
        "description": {
          "type": "text",
          "analyzer": "standard"
        },
        "procuringEntityName": {
          "type": "text",
          "analyzer": "standard",
          "fields": {
            "raw": {
              "type": "keyword"
            }
          }
        },
        "amount": {
          "type": "double"
        },
        "lots": {
          "properties": {
            "amount": {
              "type": "double"
            },
            "title": {
              "type": "text",
              "analyzer": "standard"
            }
          }
        }
      }
    }
  }
}

I run a _termvectors query for a newly created document:

GET tender-search/tenderSearch/d9669b3b22dd4a11ae581b31a5802a13/_termvectors
{
  "fields" : ["title.raw"],
  "offsets" : true,
  "payloads" : true,
  "positions" : true,
  "term_statistics" : true,
  "field_statistics" : true
}

And I'm getting the response:

{
  "_index": "tender-search",
  "_type": "tenderSearch",
  "_id": "d9669b3b22dd4a11ae581b31a5802a13",
  "_version": 4,
  "found": true,
  "took": 0,
  "term_vectors": {
    "title.raw": {
      "field_statistics": {
        "sum_doc_freq": 30201,
        "doc_count": 30201,
        "sum_ttf": -1
      },
      "terms": {
        "[ТЕСТУВАННЯ] Займати біда з неціновими показниками викот робак.": {
          "term_freq": 1,
          "tokens": [
            {
              "position": 0,
              "start_offset": 0,
              "end_offset": 63
            }
          ]
        }
      }
    }
  }
}

But when I test the title_clean normalizer with:

GET tender-search/_analyze
{
  "field": "title.raw",
  "text": "[ТЕСТУВАННЯ] Займати біда з неціновими показниками викот робак."
}

I get the response that I expected to see in the terms above:

{
  "tokens": [
    {
      "token": "тестування] займати бияда з нецияновими показниками викот робак.",
      "start_offset": 0,
      "end_offset": 63,
      "type": "word",
      "position": 0
    }
  ]
}

But as I mentioned, the sorting behaviour is correct, as if the normalizer modifications had been applied.
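To make the expected value easier to reason about, the title_clean chain can be approximated outside Elasticsearch. This is only a rough sketch in Python, not Elasticsearch's implementation: the function names are made up for illustration, the leading-garbage step imitates the pattern_replace char filter with a loop instead of the regex, and Python's str.lower() is assumed to behave like the lowercase filter for this text.

```python
# Rough approximation of the title_clean normalizer from the settings above.
# Assumption: this mimics the configured char filters and lowercase filter;
# it is not how Elasticsearch implements them.

# Same pairs as the "ua_sort_filter" mapping char filter.
UA_SORT_MAP = str.maketrans({
    "і": "ия", "І": "ИЯ",
    "є": "ея", "Є": "ЕЯ",
    "ґ": "гя", "Ґ": "ГЯ",
})


def garbage_filter(text):
    """Drop leading characters that are neither letters nor ASCII digits."""
    i = 0
    while i < len(text) and not (text[i].isalpha() or text[i] in "0123456789"):
        i += 1
    return text[i:]


def title_clean(text):
    """Apply the char filters first, then lowercase, as a normalizer does."""
    return garbage_filter(text).translate(UA_SORT_MAP).lower()


title = "[ТЕСТУВАННЯ] Займати біда з неціновими показниками викот робак."
print(title_clean(title))

# Sorting by the normalized key rather than by the raw title.
# These three titles are hypothetical examples.
titles = ["Ялина", "[Ірис]", "їжак"]
print(sorted(titles, key=title_clean))
```

The first print should give the same string as the token in the _analyze response, which is the value I expected _termvectors to report for title.raw.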

Right, my bad. I thought that you were using stored term vectors:
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-termvectors.html#_example_returning_stored_term_vectors

When we generate the term vectors for a field that does not store them, we re-analyze the field with the analyzer configured in the mapping for this field. For keyword fields, though, the normalizer is ignored.
I think this can be considered a bug. Can you open an issue on GitHub: https://github.com/elastic/elasticsearch ?
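In the meantime, one way to check what was actually indexed for title.raw without going through _termvectors is to look at the sort values that a search returns with each hit. A minimal sketch in Python, assuming the index and field names from the settings above; build_sort_query and sort_values are hypothetical helpers that only build the request body and read a response that has already been parsed into a dict, so sending the request is left to whatever client you use.

```python
import json


def build_sort_query(field, size=5):
    """Body for a _search request that sorts ascending on `field`."""
    return {
        "size": size,
        "_source": ["title"],
        "sort": [{field: {"order": "asc"}}],
    }


def sort_values(response):
    """Pair each hit's original title with the value it was sorted by."""
    return [
        (hit["_source"]["title"], hit["sort"][0])
        for hit in response["hits"]["hits"]
    ]


# Body to send to tender-search/_search.
print(json.dumps(build_sort_query("title.raw"), indent=2))
```

Each hit in a sorted search response carries a sort array with the value used for ordering. If the normalizer was applied at index time, that value for title.raw should be the normalized string rather than the original title.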

Done - https://github.com/elastic/elasticsearch/issues/27320
