Custom Analyzer doesn't work

Jorge · October 25, 2016, 4:22pm

Hello guys, I'm using Elasticsearch 2.3.1 with Lucene 5.5.0. Here is the issue: I created a custom analyzer, and when I test it, it works fine, but when I index a document it doesn't seem to work.

PUT test
{

"analysis": {
  "analyzer": {
    "myanalyzer": {
      "type" : "custom",
      "tokenizer": "standard",
      "char_filter": ["mycharfilter"]
    }
  },
  "char_filter": {
    "mycharfilter": {
      "type": "pattern_replace",
      "pattern": "(\\d{4})(\\d{4})(\\d{4})(\\d{4})",
      "replacement": "$1$2xxx$4"
    }
  }
}

}

PUT /test/_mapping/test
{

"test" : {
  "properties" : {
    "texto" : {
      "type": "string",
      "analyzer": "myanalyzer"
    }
  }
}

}

GET test/_mapping
{
"test": {
"mappings": {
"test": {
"properties": {
"texto": {
"type": "string",
"analyzer": "myanalyzer"
}
}
}
}
}
}

Looks nice!

GET /test/_analyze?analyzer=myanalyzer&text="1236852499998521"
{
"tokens": [
{
"token": "12368524xxx8521",
"start_offset": 1,
"end_offset": 17,
"type": "",
"position": 0
}
]
}

PUT /test/test/1
{
"texto": "1234567812345678"
}

Doesn't work

GET /test/_search?pretty

"hits": [
  {
    "_index": "test",
    "_type": "test",
    "_id": "1",
    "_score": 1,
    "_source": {
      "texto": "1234567812345678"
    }
  }
]

}
}

What is wrong?
Thanks in advance for your help.

johtani · October 26, 2016, 7:38am

Hi @Jorge ,

_source is the original JSON you sent and does not represent the analyzed string.

You can see how the analyzer defined on a field behaves by calling the _analyze API with the field param instead of the analyzer param.

Example :

GET /test/_analyze?field=texto&text="1236852499998521"

Jorge · October 26, 2016, 3:34pm

Thanks for the reply. I understand what you are telling me. But then, when the field is indexed, is it not saved in the format produced by the analyzer?

How can I get it indexed in the format produced by the analyzer, so that it is also displayed in that format when I retrieve it?

Thx

johtani · October 26, 2016, 5:45pm

Your field is already indexed using the analyzer.
Elasticsearch does not return the indexed (analyzed) terms in the response.

See : https://www.elastic.co/guide/en/elasticsearch/guide/current/inverted-index.html

And also see: https://www.elastic.co/guide/en/elasticsearch/guide/current/analysis-intro.html#_when_analyzers_are_used
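To illustrate the point, here is a rough toy model in Python, not Elasticsearch's actual implementation. The `ToyIndex` class and the `analyze` function are hypothetical names invented for this sketch. `analyze` mimics the `mycharfilter` pattern from the thread followed by a crude tokenizer (not the real standard tokenizer). The idea it tries to show: the analyzed terms go into the inverted index, while the original document is kept untouched and is what comes back in hits.

```python
import re

# Same pattern and replacement as "mycharfilter" in the thread
# (Java's $1 is written \1 in Python's re module).
PATTERN = re.compile(r"(\d{4})(\d{4})(\d{4})(\d{4})")
REPLACEMENT = r"\1\2xxx\4"


def analyze(text):
    """Toy stand-in for 'myanalyzer': char filter, then a crude tokenizer."""
    filtered = PATTERN.sub(REPLACEMENT, text)
    return re.findall(r"\w+", filtered)


class ToyIndex:
    """Hypothetical in-memory model, NOT how Elasticsearch is implemented."""

    def __init__(self):
        self.sources = {}   # doc id -> original document (plays the role of _source)
        self.inverted = {}  # term -> set of doc ids (plays the role of the inverted index)

    def index(self, doc_id, doc):
        self.sources[doc_id] = doc  # stored exactly as received
        for term in analyze(doc["texto"]):
            self.inverted.setdefault(term, set()).add(doc_id)

    def search_term(self, term):
        # Look the term up in the inverted index, return the original documents.
        return [self.sources[i] for i in sorted(self.inverted.get(term, ()))]


idx = ToyIndex()
idx.index("1", {"texto": "1234567812345678"})
print(idx.search_term("12345678xxx5678"))   # [{'texto': '1234567812345678'}]
print(idx.search_term("1234567812345678"))  # []
```

In this sketch the exact-term lookup only matches the masked term, yet the hit still shows the unmasked original, which is the behaviour seen in the thread. Note that `search_term` models an unanalyzed term lookup; a match query in Elasticsearch would run the query text through the analyzer first.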

Jorge · October 26, 2016, 6:23pm

Thanks Johtani, but I still don't understand. The second link says: "The token is the actual term that will be stored in the index."

When I test

GET /test/_analyze?analyzer=myanalyzer&text="1236852499998521"

this is the result:

{
"tokens": [
{
"token": "12368524xxx8521", <--- this is the token (good!)
"start_offset": 1,
"end_offset": 17,
"type": "",
"position": 0
}
]
}

But if I index

PUT /test/test/1
{
"texto": "1234567812345678"
}

and then query

GET /test/_search?pretty

why is the field texto displayed unformatted?

"hits": [
{
"_index": "test",
"_type": "test",
"_id": "1",
"_score": 1,
"_source": {
"texto": "1234567812345678" <-- not formatted
}

Sorry if my question is a foolish one, but what is happening? What am I doing wrong?

johtani · October 27, 2016, 2:09am

An analyzer is not a formatter.
The _source in a _search response shows the original JSON you indexed.

The _analyze API only shows you how an analyzer tokenizes text.
Elasticsearch uses only the terms produced by the analyzer as the entries of the inverted index.
Elasticsearch also stores the input JSON as _source,
but it does not analyze the _source data.

If you want the texto value in _source to be the formatted one, you should format it before indexing it into Elasticsearch, or use the transform feature.
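As a minimal sketch of the "format before indexing" option, assuming the client application is written in Python: the `mask_texto` helper below is a hypothetical name, and it reuses the pattern from the `mycharfilter` definition in the thread. It only builds the JSON body; sending it to `PUT /test/test/1` is left to whatever HTTP client you already use.

```python
import json
import re

# Same regular expression as "mycharfilter"; Java's $1 is \1 in Python.
PATTERN = re.compile(r"(\d{4})(\d{4})(\d{4})(\d{4})")


def mask_texto(doc):
    """Return a copy of doc with 16-digit runs in 'texto' masked (hypothetical helper)."""
    masked = dict(doc)  # shallow copy so the caller's dict is not modified
    masked["texto"] = PATTERN.sub(r"\1\2xxx\4", doc["texto"])
    return masked


doc = {"texto": "1234567812345678"}
body = json.dumps(mask_texto(doc))  # body to send as the document to index
print(body)  # {"texto": "12345678xxx5678"}
```

With this approach the masked value is what ends up in _source, so it is also what a _search response displays.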
