What is the default index analyzer?

I thought the default analyzer was the "standard" analyzer, but the following experiment suggests otherwise.

  1. Create an index with a customized "standard" analyzer that includes a pattern_capture filter to split words on "." or "_":
POST /myindex
{
      "settings" : {
        "analysis" : {
          "filter" : {
            "customsplit" : {
              "type" : "pattern_capture",
              "preserve_original" : 1,
              "patterns" : [
                "([^_.]+)"
              ]
            }
          },
          "analyzer" : {
            "standard" : {
              "tokenizer" : "standard",
              "filter" : [
                "lowercase",
                "customsplit"
              ]
            }
          }
        }
      },
      "mappings" : {
        "docs" : {
          "properties" : {
            "Url" : {
              "type" : "string"
            }
          }
        }
      }
    }
  2. Insert one document into myindex:
POST /myindex/docs/1
{
	"Url": "www.xyz.com"
}

Per the _analyze API, the "standard" analyzer used by myindex DOES split the word on ".":

GET /myindex/_analyze?analyzer=standard&text=www.xyz.com

output:

{
      "tokens": [
        {
          "token": "www.xyz.com",
          "start_offset": 0,
          "end_offset": 11,
          "type": "<ALPHANUM>",
          "position": 1
        },
        {
          "token": "www",
          "start_offset": 0,
          "end_offset": 11,
          "type": "<ALPHANUM>",
          "position": 1
        },
        {
          "token": "xyz",
          "start_offset": 0,
          "end_offset": 11,
          "type": "<ALPHANUM>",
          "position": 1
        },
        {
          "token": "com",
          "start_offset": 0,
          "end_offset": 11,
          "type": "<ALPHANUM>",
          "position": 1
        }
      ]
    }
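
To make the output above easier to read, here is a rough Python emulation of what the customsplit filter appears to do to a single token. It is only a sketch: the function name `pattern_capture` is made up for illustration, Python's `re` is not Lucene's regex engine, and offsets and positions are ignored. It assumes the filter emits the original token first and then every non-empty capture that differs from it.

```python
import re

def pattern_capture(token, patterns, preserve_original=True):
    """Rough emulation of a pattern_capture token filter on one token."""
    # Keep the untouched token first when preserve_original is set.
    out = [token] if preserve_original else []
    for pattern in patterns:
        for match in re.finditer(pattern, token):
            for group in match.groups():
                # Skip empty captures and a capture identical to the preserved original.
                if group and not (preserve_original and group == token):
                    out.append(group)
    return out

# Same pattern as in the index settings of this thread.
print(pattern_capture("www.xyz.com", [r"([^_.]+)"]))
```

The token list it prints lines up with the four tokens returned by _analyze.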

BUT the problem is that if I search for "xyz" in myindex, nothing is returned:

POST /myindex/_search
{
 "query": {
  "match": {
    "Url": "xyz"
  }
 }
}

BUT, if I explicitly set the analyzer to "standard" in the index mapping:

"mappings": {
  "docs": {
    "properties": {
      "Url": {
        "type": "string",
        "analyzer": "standard"
      }
    }
  }
}

Then searching for "xyz" does return the document.

SO my question is: is "standard" really the default analyzer of an ES index? If NOT, how do I set the default analyzer?

Or, if "standard" is indeed the default analyzer, is there anything wrong in my testing steps above?

Please format your code using the </> icon as explained in this guide. It will make your post more readable.

The standard analyzer is explained here: https://www.elastic.co/guide/en/elasticsearch/reference/5.1/analysis-standard-analyzer.html

As per https://www.elastic.co/guide/en/elasticsearch/reference/5.1/analysis.html#_specifying_an_index_time_analyzer, the default analyzer is the standard one.

Thanks David.
Yes, per the ES reference, the "standard" analyzer should be the default analyzer.

But then, is anything wrong with my testing? Does the default "standard" mean only the built-in standard analyzer, and not the customized "standard" defined in my settings?

That's interesting.

Indeed, it seems you can't "overwrite" the built-in standard analyzer of Elasticsearch here.
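
To see which combination of analyzers reproduces the empty result, here is a toy Python model of term matching. It is a sketch under assumptions, not a statement about what Elasticsearch does internally: the custom token list is copied from the _analyze output earlier in this thread, the built-in list assumes the stock standard analyzer leaves "www.xyz.com" as a single token, and the `matches` helper is hypothetical.

```python
# Tokens each analyzer would produce for the two strings used in this thread.
# CUSTOM is copied from the _analyze output; BUILTIN is an assumption.
CUSTOM = {"www.xyz.com": ["www.xyz.com", "www", "xyz", "com"], "xyz": ["xyz"]}
BUILTIN = {"www.xyz.com": ["www.xyz.com"], "xyz": ["xyz"]}

def matches(index_analyzer, search_analyzer, doc_text, query_text):
    """True if any query token equals a token stored for the document."""
    indexed = set(index_analyzer[doc_text])
    return any(term in indexed for term in search_analyzer[query_text])

# Try all four index-time / search-time combinations.
for index_name, index_an in (("built-in", BUILTIN), ("custom", CUSTOM)):
    for search_name, search_an in (("built-in", BUILTIN), ("custom", CUSTOM)):
        hit = matches(index_an, search_an, "www.xyz.com", "xyz")
        print(f"index={index_name:8} search={search_name:8} hit={hit}")
```

In this model the query "xyz" yields the same single token under either analyzer, so a miss only shows up when the document itself was indexed without the customsplit tokens. That may help narrow down where the wrong analyzer is picked.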

The proper way to solve your issue for now is to do something like:

DELETE myindex
PUT myindex
{
  "settings": {
    "analysis": {
      "filter": {
        "customsplit": {
          "type": "pattern_capture",
          "preserve_original": 1,
          "patterns": [
            "([^_.]+)"
          ]
        }
      },
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "customsplit"
          ]
        }
      }
    }
  },
  "mappings": {
    "docs": {
      "properties": {
        "Url": {
          "type": "string",
          "analyzer": "my_analyzer"
        }
      }
    }
  }
}
PUT /myindex/docs/1
{
	"Url": "www.xyz.com"
}
GET /myindex/_analyze?analyzer=my_analyzer&text=www.xyz.com
POST /myindex/_search
{
 "query": {
  "match": {
    "Url": "xyz"
  }
 }
}
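
On the "how to set the default analyzer" part of the question: the analysis reference linked earlier in this thread mentions that an analyzer named "default" in the index settings is used for fields that do not specify one. A minimal sketch of such a request body, built in Python so it can be printed as JSON, might look like the following. The index name myindex and the customsplit filter are taken from this thread; the rest is an untested assumption, so please verify it against your version.

```python
import json

# Hypothetical request body: same filter as in this thread, but the analyzer is
# registered under the reserved name "default" instead of "standard".
body = {
    "settings": {
        "analysis": {
            "filter": {
                "customsplit": {
                    "type": "pattern_capture",
                    "preserve_original": True,
                    "patterns": ["([^_.]+)"],
                }
            },
            "analyzer": {
                "default": {
                    "tokenizer": "standard",
                    "filter": ["lowercase", "customsplit"],
                }
            },
        }
    }
}

# Print the JSON to send with PUT /myindex.
print(json.dumps(body, indent=2))
```

With this, the "Url" field would not need an explicit "analyzer" entry in the mapping, although setting it per field as shown above keeps the intent more visible.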

Maybe open an issue on GitHub and refer to this thread? I think we should either reject "standard" as a custom analyzer name or pick the right one when running _search. Here I think we are using the built-in one at search time instead of the one defined in your index.

Thanks for reporting!

Thanks David.
Yes, I can resolve this issue by explicitly setting the "standard" analyzer on the "Url" field.

This looks like an Elasticsearch bug to me. As you said, ES should either reject "standard" as a name in custom analyzer settings or pick the right one when running _search.

I will open an issue on GitHub if one has not been opened yet.
