Help with ASCIIfolding


(Basiclaser) #1

I'm trying to add an 'asciifolding' sub-field to my title field. I've tried various examples, signatures and syntaxes, but they all fail. Is asciifolding deprecated or something? Thanks.

PUT /my_index
{
    "settings": { "number_of_shards": 1 }, 
    "mappings": {
        "my_type": {
            "properties": {
                "title": { 
                    "type":     "string",
                    "analyzer": "english",
                    "fields": {
                        "std": {
                            "type":     "string",
                            "analyzer": "standard"
                        },
                        "fold": {
                            "type":      "string",
                            "tokenizer": "standard",
                            "filter":    [ "lowercase", "asciifolding" ]
                        }
                    }
                }
            }
        }
    }
}

Error message:

{
   "error": {
      "root_cause": [
         {
            "type": "mapper_parsing_exception",
            "reason": "Mapping definition for [fields] has unsupported parameters:  [filter : [lowercase, asciifolding]] [tokenizer : standard]"
         }
      ],
      "type": "mapper_parsing_exception",
      "reason": "Failed to parse mapping [my_type]: Mapping definition for [fields] has unsupported parameters:  [filter : [lowercase, asciifolding]] [tokenizer : standard]",
      "caused_by": {
         "type": "mapper_parsing_exception",
         "reason": "Mapping definition for [fields] has unsupported parameters:  [filter : [lowercase, asciifolding]] [tokenizer : standard]"
      }
   },
   "status": 400
}

(Basiclaser) #2

Ah! Interestingly, there seems to be no issue in this case, where the 'mappings' wrapper is omitted:

PUT /my_index/
{
  "my_type": {
    "properties": {
      "title": { 
        "type":           "string",
        "analyzer":       "english",
        "fields": {
          "folded": { 
            "type":       "string",
            "analyzer":   "asciifolding"
          },
          "std":   { 
              "type":     "string",
              "analyzer": "standard"
          }
        }
      }
    }
  }
}

(Basiclaser) #3

I'm still having trouble applying asciifolding to types.
Take the following simple code:

DELETE /my_index/
PUT /my_index/
PUT /my_index/_mapping/my_type
{
  "settings": { "number_of_shards": 1 }, 
  "properties": {
    "title": { 
      "type":           "string",
      "analyzer":       "standard",
      "fields": {
        "folded": { 
          "type":       "string",
          "analyzer":   "asciifolding"
        },
        "std": {
          "type":     "string",
          "analyzer": "standard"
        }
      }
    }
  }
}

PUT /my_index/my_type/5
{ "title": "Esta loca!" }
PUT /my_index/my_type/6
{ "title": "Está loca!" }

GET /my_index/_validate/query?explain
{
  "query": {
    "multi_match": {
      "type":     "most_fields",
      "query":    "está loca",
      "fields": [ "title", "title.folded" ]
    }
  }
}

Is there a reason I can't apply asciifolding to a field? Here is the error (it failed on line 3):

{
   "error": {
      "root_cause": [
         {
            "type": "mapper_parsing_exception",
            "reason": "analyzer [asciifolding] not found for field [folded]"
         }
      ],
      "type": "mapper_parsing_exception",
      "reason": "analyzer [asciifolding] not found for field [folded]"
   },
   "status": 400
}

(David Pilato) #4

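asciifolding is not an analyzer but a token filter, so a field can't use it directly as its analyzer; it has to be part of a custom analyzer defined in the index settings.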

What you want to do is something like this:

DELETE index
PUT index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "filter": [ "lowercase", "asciifolding" ]
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "properties": {
        "foo": {
          "type": "string",
          "analyzer": "my_analyzer"
        }
      }
    }
  }
}
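To make the analyzer/token-filter distinction concrete, here is a rough JavaScript approximation of what an analyzer like my_analyzer does: a tokenizer splits the text into tokens, then each token filter transforms the tokens in the listed order. This is a sketch under assumptions, not Elasticsearch code. The real standard tokenizer follows Unicode text segmentation rules and the real asciifolding filter uses its own mapping table; here splitting is a simple regex and folding is approximated with Unicode NFD decomposition plus removal of combining marks. The function names are hypothetical.

```javascript
// Rough approximation of a custom analyzer chain:
//   tokenizer "standard" -> filter "lowercase" -> filter "asciifolding"
// NOT Elasticsearch's implementation; all function names are hypothetical.

// Approximates the standard tokenizer: split on anything that is not a
// Unicode letter, combining mark or digit, and drop empty strings.
function standardTokenize(text) {
  return text.split(/[^\p{L}\p{M}\p{N}]+/u).filter((t) => t.length > 0);
}

// Token filter: lowercase every token.
function lowercase(tokens) {
  return tokens.map((t) => t.toLowerCase());
}

// Token filter approximating asciifolding: decompose accented characters
// (NFD), then strip the combining diacritical marks that result.
function asciifold(tokens) {
  return tokens.map((t) => t.normalize("NFD").replace(/\p{M}/gu, ""));
}

// An analyzer = one tokenizer followed by token filters applied in order.
function analyze(text, filters = [lowercase, asciifold]) {
  return filters.reduce((tokens, filter) => filter(tokens), standardTokenize(text));
}

console.log(analyze("Está loca!")); // [ 'esta', 'loca' ]
console.log(analyze("Esta loca!")); // [ 'esta', 'loca' ]
```

Because a field's analyzer is also used at search time unless a separate search_analyzer is set, "Está" and "Esta" end up as the same term on both sides.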

(Basiclaser) #5

Hi there Dadoonet,
Thanks for your help. It got me going in the right direction, and things are now indexing successfully, but the asciifolding does not seem to take effect. If you don't mind, please take a look at my settings, mapping and query.

      {
        settings: {
          analysis: {
            filter: {
              min_eng: {
                type: 'stemmer',
                name: 'english',
                // name: 'minimal_english',
              },
            },
            analyzer: {
              sensible_analyzer: {
                tokenizer: 'standard',
                filter: [
                  'min_eng',
                  'asciifolding',
                  'lowercase',
                ],
              },
            },
          },
        },
      }

Mapping:

      type: String,
      required: true,
      es_indexed: true,
      analyzer: 'sensible_analyzer',
      search_analyzer: 'sensible_analyzer',
    },
And finally, my simple query:
  multi_match: {
    query: this.query.q,
    fields: ['title', 'authors.name'],
    cutoff_frequency: 0.0007, // https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-common-terms-query.html#_examples_3
  },

So if I query for 'andré' with this setup, I get results containing 'André', but if I leave out the accent and search for 'andre' or 'Andre', I get no results.
Any ideas? Thanks a lot.
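The symptom above can be reasoned about with a toy model: a term only matches if the query, after analysis, produces exactly a token that was stored at index time. The sketch below (hypothetical names, no Elasticsearch involved, folding approximated with NFD as an assumption, and the stemmer in sensible_analyzer ignored) indexes one document under two chains: lowercase plus folding, and lowercase only. The lowercase-only chain is roughly what a field would get if it ended up without the custom analyzer, since the built-in standard analyzer lowercases but does not fold. Only that second chain reproduces 'andré' matching while 'andre' does not; this is one possibility consistent with the report, not a confirmed diagnosis.

```javascript
// Toy term-matching model: NOT Elasticsearch. All names are hypothetical.

function tokenize(text) {
  return text.split(/[^\p{L}\p{M}\p{N}]+/u).filter((t) => t.length > 0);
}

const lowercase = (t) => t.toLowerCase();
// Approximation of asciifolding: NFD-decompose, then drop combining marks.
const fold = (t) => t.normalize("NFD").replace(/\p{M}/gu, "");

// Build an analyzer from a list of per-token filter functions, applied in order.
function makeAnalyzer(filters) {
  return (text) => tokenize(text).map((t) => filters.reduce((acc, f) => f(acc), t));
}

// A document matches if any analyzed query term is among its analyzed terms
// (same analyzer at index and search time, as in the posted mapping).
function matches(analyzer, docText, queryText) {
  const indexed = new Set(analyzer(docText));
  return analyzer(queryText).some((term) => indexed.has(term));
}

const folding = makeAnalyzer([lowercase, fold]); // lowercase + asciifolding
const plain = makeAnalyzer([lowercase]);         // lowercase only, no folding

for (const q of ["andré", "andre", "Andre"]) {
  console.log(q, matches(folding, "André", q), matches(plain, "André", q));
}
// andré true true
// andre true false
// Andre true false
```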
