Custom Analyzer doesn't work

Hello guys, I'm using Elasticsearch 2.3.1 with Lucene 5.5.0. Here is the issue: I created a custom analyzer. It works fine when I test it, but it doesn't seem to work when indexing.

PUT test
{

"analysis": {
  "analyzer": {
    "myanalyzer": {
      "type" : "custom",
      "tokenizer": "standard",
      "char_filter": ["mycharfilter"]
    }
  },
  "char_filter": {
    "mycharfilter": {
      "type": "pattern_replace",
      "pattern": "(\\d{4})(\\d{4})(\\d{4})(\\d{4})",
      "replacement": "$1$2xxx$4"
    }
  }
}

}

PUT /test/_mapping/test
{

"test" : {
  "properties" : {
    "texto" : {
      "type": "string",
      "analyzer": "myanalyzer"
    }
  }
}

}

GET test/_mapping
{
  "test": {
    "mappings": {
      "test": {
        "properties": {
          "texto": {
            "type": "string",
            "analyzer": "myanalyzer"
          }
        }
      }
    }
  }
}

Looks nice!! :smile:

GET /test/_analyze?analyzer=myanalyzer&text="1236852499998521"
{
  "tokens": [
    {
      "token": "12368524xxx8521",
      "start_offset": 1,
      "end_offset": 17,
      "type": "",
      "position": 0
    }
  ]
}

PUT /test/test/1
{
"texto": "1234567812345678"
}

Doesn't work :sob:

GET /test/_search?pretty

"hits": [
  {
    "_index": "test",
    "_type": "test",
    "_id": "1",
    "_score": 1,
    "_source": {
      "texto": "1234567812345678"
    }
  }
]

}
}

What is wrong?
Thanks in advance for your help.

Hi @Jorge,

_source is the original JSON you indexed; it does not represent the analyzed string.

You can see how the analyzer defined in the mapping behaves by calling the _analyze API with the field param instead of the analyzer param.

Example:

GET /test/_analyze?field=texto&text="1236852499998521"

Thanks for the reply. I understand what you are telling me. But then, when the field is indexed, is it not saved in the format produced by the analyzer?

What can I do so that it is indexed in the analyzer's format, and displayed in that format when I look at it?

Thanks

The field is already indexed using the analyzer.
Elasticsearch does not return the indexed terms in its response.

See: https://www.elastic.co/guide/en/elasticsearch/guide/current/inverted-index.html

And also see: https://www.elastic.co/guide/en/elasticsearch/guide/current/analysis-intro.html#_when_analyzers_are_used

Thanks Johtani, but I still don't understand. The 2nd link says: "The token is the actual term that will be stored in the index."

When I test with

GET /test/_analyze?analyzer=myanalyzer&text="1236852499998521"

this is the result:

{
"tokens": [
{
"token": "12368524xxx8521", <--- this is token (Good!!!!)
"start_offset": 1,
"end_offset": 17,
"type": "",
"position": 0
}
]
}

But if I index a document

PUT /test/test/1
{
"texto": "1234567812345678"
}

and then query

GET /test/_search?pretty

why is the field texto displayed unformatted?

"hits": [
{
"_index": "test",
"_type": "test",
"_id": "1",
"_score": 1,
"_source": {
"texto": "1234567812345678" <-- not formatted ( :weary:)
}

Sorry if my question is foolish... but what is happening? What am I doing wrong?

An analyzer is not a formatter.
The _source in a _search response shows the original JSON you indexed.

The _analyze API only shows you how an analyzer tokenizes text.
Elasticsearch uses only the terms produced by the analyzer as the entries of the inverted index.
Elasticsearch also stores the input JSON as _source,
but it does not analyze the _source data.
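
To make the distinction concrete, here is a toy Python model of the idea. It is only a sketch, not how Elasticsearch is implemented: ToyIndex, analyze and search are hypothetical names, the tokenizer is a crude stand-in for the standard tokenizer, and search does a raw term lookup without analyzing the query.

```python
import re

# Same pattern and replacement as the mycharfilter char filter in this thread.
PATTERN = re.compile(r"(\d{4})(\d{4})(\d{4})(\d{4})")


def analyze(text):
    """Toy stand-in for myanalyzer: apply the char filter, then split into word tokens."""
    filtered = PATTERN.sub(r"\1\2xxx\4", text)
    return re.findall(r"\w+", filtered)


class ToyIndex:
    """Hypothetical model: analyzed terms and the original document are kept separately."""

    def __init__(self):
        self.sources = {}   # doc id -> original JSON, returned untouched as _source
        self.inverted = {}  # analyzed term -> set of doc ids, used only for matching

    def index(self, doc_id, doc):
        self.sources[doc_id] = doc
        for term in analyze(doc["texto"]):
            self.inverted.setdefault(term, set()).add(doc_id)

    def search(self, term):
        # Raw term lookup: match on the inverted index, answer with the stored source.
        return [self.sources[i] for i in sorted(self.inverted.get(term, ()))]


idx = ToyIndex()
idx.index("1", {"texto": "1234567812345678"})
print(idx.search("12345678xxx5678"))   # found through the masked term
print(idx.search("1234567812345678"))  # the unmasked value is not a term in the index
```

The first lookup finds the document through the masked term, yet what comes back is still the original, unmasked JSON. That is the same thing you are seeing in _source.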

If you want _source to contain the formatted texto instead of the original, you should format the value before indexing it into Elasticsearch, or use the transform feature.
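
Formatting before indexing could look roughly like the Python sketch below. It assumes the client builds the document itself and reuses the pattern from the char filter; mask_texto is a hypothetical helper name, and sending the resulting body to PUT /test/test/1 is left to whatever HTTP client you use.

```python
import json
import re

# Same regular expression as the pattern_replace char filter in this thread.
CARD_PATTERN = re.compile(r"(\d{4})(\d{4})(\d{4})(\d{4})")


def mask_texto(doc):
    """Return a copy of doc whose texto has the third 4-digit group replaced by xxx."""
    masked = dict(doc)
    masked["texto"] = CARD_PATTERN.sub(r"\1\2xxx\4", doc["texto"])
    return masked


# Body to send as the document, so that _source already holds the masked value.
body = json.dumps(mask_texto({"texto": "1234567812345678"}))
print(body)
```

With this approach the masked value is what gets stored, so _source shows it as well. The original number never reaches Elasticsearch.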