I am using the ingest attachment processor to index files such as PDF, XLS, etc.
This is my pipeline:
curl -XPUT 'ES_HOST:ES_PORT/_ingest/pipeline/attachment?pretty' -H 'Content-Type: application/json' -d '{
"description" : "Extract attachment information encoded in Base64 with UTF-8 charset",
"processors" : [
{
"attachment" : {
"field" : "data",
"properties": [ "content", "content_type" ]
}
},
{"remove": {
"field": "data"
}
}
]
}'
Then I define the doc_poc_syn index and add a custom analyzer to try the synonym feature:
PUT /doc_poc_syn
{
"mappings": {
"properties": {
"attachment.content": {
"type": "text"
}
}
},
"settings": {
"index": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "standard",
"filter": ["my_synonym" ]
}
},
"search_analyzer": {
"my_analyzer": {
"tokenizer": "standard",
"filter": ["my_synonym" ]
}
},
"filter": {
"my_synonym": {
"type": "synonym",
"synonyms": [
"pandemic, covid"
]
}
}
}
}
}
}
(The search analyzer should be redundant, if I am not mistaken.)
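For context, a document would go through the pipeline roughly as sketched below. This is only an illustration under assumptions: the sample file, its text, and the document id 1 are made-up placeholders, and the curl call is left commented out because it needs a running cluster.

```shell
# Sketch only: the sample file and its content are hypothetical placeholders.
SAMPLE=$(mktemp)
printf 'pandemic report' > "$SAMPLE"

# The attachment processor reads Base64 from the "data" field of the pipeline,
# so the file is encoded and wrapped in a small JSON body.
DATA=$(base64 < "$SAMPLE" | tr -d '\n')
BODY=$(printf '{"data": "%s"}' "$DATA")
printf '%s\n' "$BODY"

# Against a running cluster the request would look like this (id 1 is an assumption):
# curl -XPUT 'ES_HOST:ES_PORT/doc_poc_syn/_doc/1?pipeline=attachment&pretty' \
#   -H 'Content-Type: application/json' -d "$BODY"
```

The `pipeline=attachment` query parameter is what routes the document through the processor, so the extracted text ends up in `attachment.content`.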
POST doc_poc_syn/_analyze
{
"analyzer": "my_analyzer",
"text": "pandemic"
}
returns the expected result, but if I run
GET /_search
{
"query": {
"match": {
"attachment.content": {
"query": "pandemic"
}
}
}}
I don't get any results. If I search for "covid" instead, a document that I added comes up. Any clue?
Thanks!