I thought the default analyzer was the "standard" analyzer, but the following experiment suggests otherwise.
- Create an index with a customized standard analyzer that includes a pattern_capture filter to split words on "." or "_"
POST /myindex
{
"settings" : {
"analysis" : {
"filter" : {
"customsplit" : {
"type" : "pattern_capture",
"preserve_original" : 1,
"patterns" : [
"([^_.]+)"
]
}
},
"analyzer" : {
"standard" : {
"tokenizer" : "standard",
"filter" : [
"lowercase",
"customsplit"
]
}
}
}
},
"mappings" : {
"docs" : {
"properties" : {
"Url" : {
"type" : "string"
}
}
}
}
}
- Insert one doc into myindex
POST /myindex/docs/1
{
"Url": "www.xyz.com"
}
Per the _analyze API, the standard analyzer used by myindex DOES split the word on ".":
GET /myindex/_analyze?analyzer=standard&text=www.xyz.com
output:
{
"tokens": [
{
"token": "www.xyz.com",
"start_offset": 0,
"end_offset": 11,
"type": "<ALPHANUM>",
"position": 1
},
{
"token": "www",
"start_offset": 0,
"end_offset": 11,
"type": "<ALPHANUM>",
"position": 1
},
{
"token": "xyz",
"start_offset": 0,
"end_offset": 11,
"type": "<ALPHANUM>",
"position": 1
},
{
"token": "com",
"start_offset": 0,
"end_offset": 11,
"type": "<ALPHANUM>",
"position": 1
}
]
}
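For reference, the same token list can be reproduced outside Elasticsearch. This is only a rough Python approximation of what I expect lowercase plus customsplit to do to a single token. It is not Elasticsearch code, the function name capture_tokens is made up for illustration, and skipping a capture that equals the whole token is my assumption about how preserve_original avoids duplicates.

```python
import re

# Same pattern as in the "customsplit" filter: runs of characters
# that are neither "_" nor ".".
PATTERN = re.compile(r"([^_.]+)")


def capture_tokens(token):
    """Approximate lowercase + pattern_capture (preserve_original on)
    for one token. Hypothetical helper, not an Elasticsearch API."""
    lowered = token.lower()
    # preserve_original: the untouched token is emitted first.
    tokens = [lowered]
    for match in PATTERN.findall(lowered):
        # Assumption: a capture identical to the original is not repeated.
        if match != lowered:
            tokens.append(match)
    return tokens


if __name__ == "__main__":
    print(capture_tokens("www.xyz.com"))
```

So the analysis chain itself looks fine to me; "xyz" is among the tokens it produces.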
BUT the problem is that if I search for "xyz" in myindex, nothing is returned:
POST /myindex/_search
{
"query": {
"match": {
"Url": "xyz"
}
}
}
BUT if I explicitly set the analyzer to "standard" in the index mapping:
"mappings": {
"docs": {
"properties": {
"Url": {
"type": "string",
"analyzer": "standard"
}
}
}
}
Then searching for "xyz" does return the document.
SO my question is: is "standard" really the default analyzer of an ES index? If NOT, how do I set the default analyzer?
Or, if standard is indeed the default analyzer, is there anything wrong in my testing steps above?