When assigning an analyzer to an integer field, the analyzer definition is removed from the mapping.
In the example below, I would like to strip non-numeric characters from a house number and store the result as an integer type so that I can use a range filter (e.g. so that ["1A"] => [1] or ["apt 4"] => [4]).
When the mapping is posted to ES, the analyzer is removed from the integer field but not from the string field.
This results in an unexpected error when indexing, such as MapperParsingException[failed to parse [myInteger]]; nested: NumberFormatException[For input string: "apartment 1A"].
I looked through the docs and couldn't find mention of this behaviour.
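As a stopgap, the value could be cleaned on the client before it is sent to ES. The following is only a minimal sketch of that idea, not an Elasticsearch feature: `numberify` is a hypothetical helper named after the analyzer in the script below, and it assumes that only the first run of digits in the input matters.

```shell
#!/bin/bash
# Hypothetical client-side stand-in for the "numberify" analyzer.
# Assumption: only the first run of digits is wanted.
numberify() {
  # Replace every non-digit character with a space (same idea as the
  # pattern_replace char_filter), then keep the first remaining token.
  printf '%s' "$1" | tr -c '0-9' ' ' | awk '{print $1; exit}'
}

numberify "apartment 1A"   # prints 1
numberify "apt 4"          # prints 4
```

Input without any digits yields an empty string, and leading zeros are kept as-is, so the caller would still need to handle both cases before building the JSON document.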
#!/bin/bash
################################################
# Analyzer unassigned when using 'integer' type
################################################
ES='localhost:9200';
# drop index
curl -XDELETE "$ES/address?pretty=true";
# create index
curl -XPUT "$ES/address?pretty=true" -d'
{
  "settings": {
    "analysis": {
      "analyzer": {
        "numberify": {
          "type": "custom",
          "tokenizer": "standard",
          "char_filter": ["convert_non_numeric_chars_to_spaces"]
        }
      },
      "char_filter": {
        "convert_non_numeric_chars_to_spaces": {
          "type": "pattern_replace",
          "pattern": "[^0-9]",
          "replacement": " "
        }
      }
    }
  },
  "mappings": {
    "housenumber": {
      "properties": {
        "myString": {
          "type": "string",
          "analyzer": "numberify"
        },
        "myInteger": {
          "type": "integer",
          "analyzer": "numberify"
        }
      }
    }
  }
}';
# retrieve index mapping
curl -XGET "$ES/address/_mapping?pretty=true"
# !!! analyzer has been removed from 'integer' field but not string field !!!
#   "myInteger" : {
#     "type" : "integer"
#   },
#   "myString" : {
#     "type" : "string",
#     "analyzer" : "numberify"
#   }
# index a doc
curl -XPOST "$ES/address/housenumber/1?pretty=true" -d'
{
  "myInteger": "apartment 1A",
  "myString": "apartment 1A"
}';
# {
#   "error" : "MapperParsingException[failed to parse [myInteger]]; nested: NumberFormatException[For input string: \"apartment 1A\"]; ",
#   "status" : 400
# }
# expected for input 'apartment 1A'
#   "myInteger": 1,
#   "myString": "1"
Elasticsearch version (response from the root endpoint):
{
  "status" : 200,
  "name" : "Mahkizmo",
  "cluster_name" : "elasticsearch",
  "version" : {
    "number" : "1.7.2",
    "build_hash" : "e43676b1385b8125d647f593f7202acbd816e8ec",
    "build_timestamp" : "2015-09-14T09:49:53Z",
    "build_snapshot" : false,
    "lucene_version" : "4.10.4"
  },
  "tagline" : "You Know, for Search"
}