Elasticsearch custom analyzer not working

Hi,

thanks for the question. I think your problem stems from a slight misunderstanding of the pattern analyzer you use in the example: its pattern parameter specifies the regex used to split the input into tokens, i.e. the token separator, not the tokens themselves (docs).

In your example you specify a pattern that matches whole lines, but the address strings in the indexed document are not real lines (there is no line break at the end), so the pattern consumes the entire input as a separator and not a single token is produced.
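The effect can be simulated with a plain regex split. The exact pattern from your mapping isn't shown here, so `^.*$` below is an assumed stand-in for "a pattern that matches a whole line"; the pattern analyzer splits on regex matches, drops empty strings, and lowercases what remains:

import re

# Assumed example pattern: matches one whole line. Without MULTILINE,
# ^ and $ only anchor at the start/end of the entire input, mirroring
# the default Java regex behavior Elasticsearch uses.
PATTERN = r'^.*$'

def pattern_analyze(text):
    # The regex is the token *separator*: split on it, drop empty
    # pieces, lowercase the surviving tokens.
    return [t.lower() for t in re.split(PATTERN, text) if t]

print(pattern_analyze('Beijing China'))      # whole input eaten as separator
print(pattern_analyze('\nBeijing China\n'))  # pattern never matches, one token

The single-line input is matched in full by the separator pattern, leaving zero tokens, while the input wrapped in line breaks is never matched, so the whole string survives as one token.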

Note that in your analyzer test example you have line breaks:

curl -XGET 'localhost:9200/test/_analyze?analyzer=test_lowercase&pretty' -d '
Beijing China
'
{
  "tokens" : [ {
    "token" : "\nbeijing china\n",
    "start_offset" : 0,
    "end_offset" : 15,
    "type" : "word",
    "position" : 0
  } ]
}

You can see here that the token contains the line breaks you entered in your request on the command line.

If, however, you send the request all on one line, you can see that no tokens are produced (there are no line breaks for your token-separation pattern to match):

curl -XGET 'localhost:9200/test/_analyze?analyzer=test_lowercase&pretty' -d 'Beijing China'
{
  "tokens" : [ ]
}

This, however, is what the analyzer sees when you index document fields: a single string with no trailing line break.

If, instead of the pattern analyzer, you use a custom analyzer with the keyword tokenizer followed by a lowercase token filter, the example document you indexed is returned by your search:

{
  "analysis": {
    "analyzer": {
      "test_lowercase": {
        "type": "custom",
        "tokenizer": "keyword",
        "filter": ["lowercase"]
      }
    }
  }
}
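As a minimal sketch of what this analyzer does (no regex involved at all): the keyword tokenizer emits the entire input as a single token, and the lowercase filter then lowercases it:

# Sketch of the custom analyzer above: keyword tokenizer + lowercase filter.
def keyword_lowercase_analyze(text):
    token = text            # keyword tokenizer: one token, the whole input
    return [token.lower()]  # lowercase token filter

print(keyword_lowercase_analyze('Beijing China'))  # ['beijing china']

Since the analyzer is applied both at index time and at search time, a lowercase query string like "beijing china" now matches the indexed field exactly.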