Custom Analyzer doesn't work


(Jorge Ramirez) #1

Hello guys, I'm using Elasticsearch 2.3.1 with Lucene 5.5.0. Here is the issue: I created a custom analyzer that works fine when I test it, but it doesn't seem to work when indexing.

PUT test
{

"analysis": {
  "analyzer": {
    "myanalyzer": {
      "type" : "custom",
      "tokenizer": "standard",
      "char_filter": ["mycharfilter"]
    }
  },
  "char_filter": {
    "mycharfilter": {
      "type": "pattern_replace",
      "pattern": "(\\d{4})(\\d{4})(\\d{4})(\\d{4})",
      "replacement": "$1$2xxx$4"
    }
  }
}

}
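
For readers who want to check the char filter outside Elasticsearch, here is a minimal sketch of what `mycharfilter` appears to do, written with Python's `re` module. The function name `mask` is hypothetical, and Python's regex engine only approximates the Java regex engine that `pattern_replace` uses, so treat this as an illustration rather than the filter's actual implementation.

```python
import re

# Same pattern as "mycharfilter": four groups of four digits.
PATTERN = re.compile(r"(\d{4})(\d{4})(\d{4})(\d{4})")


def mask(text: str) -> str:
    """Hypothetical helper that mimics the pattern_replace char filter."""
    # Java's "$1$2xxx$4" is written "\g<1>\g<2>xxx\g<4>" in Python.
    return PATTERN.sub(r"\g<1>\g<2>xxx\g<4>", text)


print(mask("1236852499998521"))  # 12368524xxx8521
print(mask("1234567812345678"))  # 12345678xxx5678
```

The third group of digits is dropped and replaced by `xxx`, which matches the token returned by the `_analyze` call in this post.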

PUT /test/_mapping/test
{

"test" : {
  "properties" : {
    "texto" : {
      "type": "string",
      "analyzer": "myanalyzer"
    }
  }
}

}

GET test/_mapping
{
  "test": {
    "mappings": {
      "test": {
        "properties": {
          "texto": {
            "type": "string",
            "analyzer": "myanalyzer"
          }
        }
      }
    }
  }
}

Looks nice! :smile:

GET /test/_analyze?analyzer=myanalyzer&text="1236852499998521"
{
  "tokens": [
    {
      "token": "12368524xxx8521",
      "start_offset": 1,
      "end_offset": 17,
      "type": "",
      "position": 0
    }
  ]
}

PUT /test/test/1
{
"texto": "1234567812345678"
}

Doesn't work :sob:

GET /test/_search?pretty

"hits": [
  {
    "_index": "test",
    "_type": "test",
    "_id": "1",
    "_score": 1,
    "_source": {
      "texto": "1234567812345678"
    }
  }
]

}
}

What is wrong?
Thanks in advance for your help.


(Jun Ohtani) #2

Hi @Jorge ,

_source is the original JSON you indexed; it does not contain the analyzed string.

You can check the behavior of the analyzer defined on a field by using the _analyze API with the field parameter instead of the analyzer parameter.

Example:

GET /test/_analyze?field=texto&text="1236852499998521"

(Jorge Ramirez) #3

Thanks for the reply. I understand what you are telling me. But then, when the field is indexed, is it not saved in the format the analyzer produces?

What can I do so that it is indexed in the analyzer's format and displayed in that format when I look at it?

Thx


(Jun Ohtani) #4

Your field is already indexed in the analyzer's format.
Elasticsearch does not return the indexed (analyzed) string data in its responses.

See : https://www.elastic.co/guide/en/elasticsearch/guide/current/inverted-index.html

And also see: https://www.elastic.co/guide/en/elasticsearch/guide/current/analysis-intro.html#_when_analyzers_are_used
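
To make the distinction concrete, here is a toy model in Python of the two things that get stored for a document: the original JSON and the analyzed terms. It is only a sketch under simplifying assumptions. `ToyIndex` and `analyze` are hypothetical names, the tokenizer is a crude stand-in for the standard tokenizer, and real Elasticsearch internals are far more involved.

```python
import re
from collections import defaultdict

PATTERN = re.compile(r"(\d{4})(\d{4})(\d{4})(\d{4})")


def analyze(text):
    """Stand-in for myanalyzer: char filter first, then a crude tokenizer."""
    filtered = PATTERN.sub(r"\g<1>\g<2>xxx\g<4>", text)
    return re.findall(r"\w+", filtered)


class ToyIndex:
    """Hypothetical, heavily simplified model of an index."""

    def __init__(self):
        self.sources = {}                 # doc id -> original JSON (_source)
        self.postings = defaultdict(set)  # term -> doc ids (inverted index)

    def index(self, doc_id, doc):
        self.sources[doc_id] = doc        # stored untouched
        for term in analyze(doc["texto"]):
            self.postings[term].add(doc_id)

    def search(self, query_text):
        # The query text is assumed to go through the same analyzer.
        ids = set()
        for term in analyze(query_text):
            ids |= self.postings.get(term, set())
        # Hits are built from the stored source, not from the terms.
        return [self.sources[i] for i in sorted(ids)]


idx = ToyIndex()
idx.index("1", {"texto": "1234567812345678"})
print(sorted(idx.postings))            # ['12345678xxx5678']
print(idx.search("1234567812345678"))  # [{'texto': '1234567812345678'}]
```

The masked term is what ends up in the inverted index, while the hit returned by the search is the untouched original document.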


(Jorge Ramirez) #5

Thanks Johtani, but then I don't understand. The second link says: "The token is the actual term that will be stored in the index."

When I test

GET /test/_analyze?analyzer=myanalyzer&text="1236852499998521"

this is the result:

{
  "tokens": [
    {
      "token": "12368524xxx8521", <--- this is the token (good!)
      "start_offset": 1,
      "end_offset": 17,
      "type": "",
      "position": 0
    }
  ]
}

But if I index

PUT /test/test/1
{
"texto": "1234567812345678"
}

and then query

GET /test/_search?pretty

why is the field texto displayed unformatted?

"hits": [
{
"_index": "test",
"_type": "test",
"_id": "1",
"_score": 1,
"_source": {
"texto": "1234567812345678" <-- not formatted ( :weary:)
}

Sorry if my question is foolish, but what is happening? What am I doing wrong?


(Jun Ohtani) #6

An analyzer is not a formatter.
The _source in a _search response shows the original JSON you indexed.

The _analyze API only shows you how the analyzer tokenizes text.
Elasticsearch uses only the terms produced by the analyzer as the entries of the inverted index.
Elasticsearch also stores the input JSON as _source,
but it does not analyze the _source data.

If you want _source to contain the formatted texto instead of the original, you should format the value before indexing it into Elasticsearch, or use the transform feature.
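
As a rough illustration of the "format before indexing" option, the client can mask the value itself and send the already formatted document. This is a minimal sketch in Python: `prepare_document` is a hypothetical helper, the regex is a Python approximation of the one in `mycharfilter`, and the actual HTTP call to Elasticsearch is left out.

```python
import json
import re

PATTERN = re.compile(r"(\d{4})(\d{4})(\d{4})(\d{4})")


def prepare_document(doc):
    """Return a copy of doc with texto masked before it is sent for indexing."""
    masked = dict(doc)  # shallow copy, so the caller's dict is not modified
    masked["texto"] = PATTERN.sub(r"\g<1>\g<2>xxx\g<4>", doc["texto"])
    return masked


# This string would be the request body of PUT /test/test/1.
body = json.dumps(prepare_document({"texto": "1234567812345678"}))
print(body)  # {"texto": "12345678xxx5678"}
```

Because the masked value is what gets sent, it is also what ends up in _source and what a search response displays.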

