Analyzer on indexed documents


(Zahra Aminolroaya) #1

I am new to Elasticsearch. I want to test phonetic analysis on my documents. I used the link below to test it:
phonetic

I tested the analyzer and got the same result as in the example. However, the example requires you to pass the text explicitly. I want the analyzer to be applied automatically to the documents in my index. What should I do?


(Mark Walkom) #2

Ideally, analysers are applied to documents during indexing, so you should reindex.
Alternatively you can try https://www.elastic.co/guide/en/elasticsearch/reference/current/search-analyzer.html

Also, you linked to 2.1. Are you running that older version?


(Zahra Aminolroaya) #3

Thanks, Mark. I am using the latest version. Based on the link I provided, I search like this:

GET phonetic_sample/_search
{
  "query": {
    "match": {
      "text": {
        "query": "BLKS"
      }
    }
  }
}

I expect Elasticsearch to return:

"hits": {
  "total": 1,
  "max_score": 1,
  "hits": [
    {
      "_index": "phonetic_sample",
      "_type": "_doc",
      "_id": "1",
      "_score": 1,
      "_source": {
        "text": "Joe Bloggs"
      }
    }
  ]
}
but it returns nothing.


(Christian Dahlqvist) #4

Please show your mapping for the index, as a lot will depend on what this looks like.


(Zahra Aminolroaya) #5

{
  "mapping": {
    "_doc": {
      "properties": {
        "text": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "user": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      }
    }
  }
}


(Christian Dahlqvist) #6

It looks like you are using standard dynamic mappings and have not specified any custom analysers. That means that your query will match tokens using the standard analyzer, and that does not include any phonetic analysis, which is why your query does not return any results.
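
The mismatch can be illustrated with a toy model in Python. This is only a sketch and does not talk to a cluster: `standard_tokens` roughly approximates the standard analyzer, and `TOY_CODES` is a hypothetical lookup table holding the two phonetic codes quoted in this thread. Neither name is part of Elasticsearch.

```python
import re

# Hypothetical lookup table standing in for a real phonetic encoder.
# It holds only the two codes quoted in this thread.
TOY_CODES = {"joe": "J", "bloggs": "BLKS"}


def standard_tokens(text):
    """Approximate the standard analyzer: split on non-word characters, lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def phonetic_tokens(text):
    """Approximate a phonetic analyzer: replace each token with its toy code.

    Tokens missing from the table are upper-cased and passed through, which is
    a simplification made only for this sketch.
    """
    return [TOY_CODES.get(token, token.upper()) for token in standard_tokens(text)]


def matches(index_analyzer, search_analyzer, document, query):
    """Model a match query: it hits when any query term equals an indexed term."""
    indexed_terms = set(index_analyzer(document))
    return any(term in indexed_terms for term in search_analyzer(query))


# Default mapping: both sides use the standard analyzer, so the query term
# "blks" is compared with the indexed terms {"joe", "bloggs"}.
print(matches(standard_tokens, standard_tokens, "Joe Bloggs", "BLKS"))  # False

# Phonetic analysis on both sides: "BLKS" is now an indexed term.
print(matches(phonetic_tokens, phonetic_tokens, "Joe Bloggs", "BLKS"))  # True
```

In this model the query only succeeds once the phonetic code is among the terms produced at index time.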


(Zahra Aminolroaya) #7

But when I use this:

GET phonetic_sample/_analyze
{
  "analyzer": "my_analyzer",
  "text": "Joe Bloggs"
}

I get the result below:

{
  "tokens": [
    {
      "token": "J",
      "start_offset": 0,
      "end_offset": 3,
      "type": "",
      "position": 0
    },
    {
      "token": "joe",
      "start_offset": 0,
      "end_offset": 3,
      "type": "",
      "position": 0
    },
    {
      "token": "BLKS",
      "start_offset": 4,
      "end_offset": 10,
      "type": "",
      "position": 1
    },
    {
      "token": "bloggs",
      "start_offset": 4,
      "end_offset": 10,
      "type": "",
      "position": 1
    },
    {
      "token": "1",
      "start_offset": 11,
      "end_offset": 12,
      "type": "",
      "position": 2
    }
  ]
}


(Christian Dahlqvist) #8

Yes, but where are you using this analyzer? It is not in your index mapping, so is not applied at index time.


(Zahra Aminolroaya) #9

Thanks, Christian. I followed the link I provided, and based on what Mark said, the analyzer should be applied to documents during indexing. I will try his alternative option and report the result.


(Christian Dahlqvist) #10

Here I suspect you will need to reindex and apply the analyser at index time. You can phonetically get Bloggs tokenised as BLKS, but not the other way around.
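
A small Python sketch of why a search-time analyzer alone is not enough. It is a toy model under stated assumptions, not Elasticsearch code: `TOY_CODES` and `phonetic_terms` are hypothetical stand-ins for a phonetic encoder, and the term sets are written out by hand.

```python
# Terms stored at index time under the default (standard) analyzer.
indexed_terms = {"joe", "bloggs"}

# Hypothetical one-way lookup standing in for a phonetic encoder.
TOY_CODES = {"joe": "J", "bloggs": "BLKS"}


def phonetic_terms(text):
    """Encode each word; unknown words are upper-cased (sketch-only simplification)."""
    return {TOY_CODES.get(word.lower(), word.upper()) for word in text.split()}


# A phonetic analyzer applied only at search time turns "Bloggs" into "BLKS",
# but "BLKS" was never stored, so there is no overlap with the indexed terms.
query_terms = phonetic_terms("Bloggs")
print(query_terms & indexed_terms)  # set()

# After reindexing with the phonetic analyzer, the stored terms are the codes,
# and the same query finds an overlap.
reindexed_terms = phonetic_terms("Joe Bloggs")
print(query_terms & reindexed_terms)  # {'BLKS'}
```

The encoding only goes one way in this model: there is no step that recovers "bloggs" from "BLKS", so the codes have to be produced when the documents are indexed.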


(Zahra Aminolroaya) #11

Finally, I used the following, which solved my problem:

PUT my_index1
{
  "settings": {
    "analysis": {
      "filter": {
        "my_phonetic": {
          "type": "phonetic"
        }
      },
      "analyzer": {
        "my_phonetic": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["phonetic"]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "text": {
          "type": "text",
          "analyzer": "my_phonetic",
          "search_analyzer": "my_phonetic"
        }
      }
    }
  }
}