Using normalizer for sorting

ArturKLB · November 7, 2017, 2:35pm

Hi. I used a custom normalizer with some char_filters and the lowercase filter. When I run a search with sorting, I can see that it actually works. But I don't see any changes in the terms returned by a _termvectors query on my normalized field. Why?

jimczi · November 8, 2017, 10:43am

The terms in the response of the term vectors query are created at indexing time with the analyzer that you set on the field. Can you add an example of the changes that you see (or don't see) in the response?

ArturKLB · November 8, 2017, 11:44am

OK. Here are my current settings for the index:

PUT tender-search
{
  "settings": {
    "number_of_replicas": 0,
    "index.mapping.single_type": true,
    "analysis": {
      "char_filter": {
        "garbage_filter": {
          "type": "pattern_replace",
          "pattern": "^([^\\p{L}\\d]+)(.*)",
          "replacement": "$2"
        },
        "ua_sort_filter": {
          "type": "mapping",
          "mappings": [
            "і => ия",
            "І => ИЯ",
            "є => ея",
            "Є => ЕЯ",
            "ґ => гя",
            "Ґ => ГЯ"
          ]
        }
      },
      "analyzer": {
        "default": {
          "type": "keyword"
        },
        "default_search": {
          "type": "standard"
        }
      },
      "normalizer": {
        "title_clean": {
          "type": "custom",
          "char_filter": ["garbage_filter", "ua_sort_filter"],
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "tenderSearch": {
      "properties": {
        "title": {
          "type": "text",
          "analyzer": "standard",
          "fields": {
            "raw": {
              "type": "keyword",
              "normalizer": "title_clean"
            }
          }
        },
        "description": {
          "type": "text",
          "analyzer": "standard"
        },
        "procuringEntityName": {
          "type": "text",
          "analyzer": "standard",
          "fields": {
            "raw": {
              "type": "keyword"
            }
          }
        },
        "amount": {
          "type": "double"
        },
        "lots": {
          "properties": {
            "amount": {
              "type": "double"
            },
            "title": {
              "type": "text",
              "analyzer": "standard"
            }
          }
        }
      }
    }
  }
}

I run a _termvectors query for a newly created document:

GET tender-search/tenderSearch/d9669b3b22dd4a11ae581b31a5802a13/_termvectors
{
  "fields" : ["title.raw"],
  "offsets" : true,
  "payloads" : true,
  "positions" : true,
  "term_statistics" : true,
  "field_statistics" : true
}

And I get this response:

{
  "_index": "tender-search",
  "_type": "tenderSearch",
  "_id": "d9669b3b22dd4a11ae581b31a5802a13",
  "_version": 4,
  "found": true,
  "took": 0,
  "term_vectors": {
    "title.raw": {
      "field_statistics": {
        "sum_doc_freq": 30201,
        "doc_count": 30201,
        "sum_ttf": -1
      },
      "terms": {
        "[ТЕСТУВАННЯ] Займати біда з неціновими показниками викот робак.": {
          "term_freq": 1,
          "tokens": [
            {
              "position": 0,
              "start_offset": 0,
              "end_offset": 63
            }
          ]
        }
      }
    }
  }
}

But when I test the title_clean normalizer with:

GET tender-search/_analyze
{
  "field": "title.raw",
  "text":  "[ТЕСТУВАННЯ] Займати біда з неціновими показниками викот робак."
}

I get the response that I expected to see in the terms above:

{
  "tokens": [
    {
      "token": "тестування] займати бияда з нецияновими показниками викот робак.",
      "start_offset": 0,
      "end_offset": 63,
      "type": "word",
      "position": 0
    }
  ]
}

But as I mentioned, the sorting behaviour is correct, as if the normalizer modifications were applied.
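
For readers who want to follow the three steps of title_clean without a cluster, here is a rough Python sketch of what the normalizer appears to do to a title: strip leading characters that are neither letters nor digits, apply the character mapping, then lowercase. The function names and the UA_SORT_MAP constant are hypothetical, and this only approximates the Java regex and Lucene filters; it is not Elasticsearch code.

```python
# Hypothetical stand-in for the "title_clean" normalizer from the settings above.
# It approximates the behaviour in plain Python and is not Elasticsearch code.

ASCII_DIGITS = "0123456789"  # assumption: \d in the pattern means ASCII digits only

# Same pairs as the "ua_sort_filter" mappings, written as code points
# so the Cyrillic letters cannot be confused with Latin look-alikes.
UA_SORT_MAP = str.maketrans({
    "\u0456": "\u0438\u044f",  # і => ия
    "\u0406": "\u0418\u042f",  # І => ИЯ
    "\u0454": "\u0435\u044f",  # є => ея
    "\u0404": "\u0415\u042f",  # Є => ЕЯ
    "\u0491": "\u0433\u044f",  # ґ => гя
    "\u0490": "\u0413\u042f",  # Ґ => ГЯ
})


def garbage_filter(text: str) -> str:
    """Drop leading characters that are neither letters nor ASCII digits."""
    i = 0
    while i < len(text) and not (text[i].isalpha() or text[i] in ASCII_DIGITS):
        i += 1
    return text[i:]


def title_clean(text: str) -> str:
    """Apply the char filters in their configured order, then lowercase."""
    return garbage_filter(text).translate(UA_SORT_MAP).lower()


def sort_titles(titles: list[str]) -> list[str]:
    """Order the original titles by their normalized form."""
    return sorted(titles, key=title_clean)
```

Calling title_clean on the title from the document above should give the same string as the token in the _analyze response, and sort_titles mimics the observed effect on sorting: the order follows the normalized values while the original titles stay unchanged.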

jimczi · November 8, 2017, 11:57am

Right, my bad, I thought that you were using stored term vectors:
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-termvectors.html#_example_returning_stored_term_vectors

When we generate the term vectors for a field that does not store them, we re-analyze the field with the analyzer configured in the mapping for this field. For keyword fields, though, the normalizer is ignored.
I think this can be considered a bug. Can you open an issue on GitHub: https://github.com/elastic/elasticsearch ?

ArturKLB · November 8, 2017, 2:16pm

Done - https://github.com/elastic/elasticsearch/issues/27320

