Synonym using a file is not working: malformed_input_exception


(valerio) #1

Hi, I'm trying to use a synonym file in ES 2.3.4.

Following the documentation, I used this command in Sense:

POST /mise/
{
  "index" : {
    "analysis" : {
      "analyzer" : {
        "synonym" : {
          "tokenizer" : "whitespace",
          "filter" : ["synonym"]
        }
      },
      "filter" : {
        "synonym" : {
          "type" : "synonym",
          "synonyms_path" : "sinonimi/synonym.txt"
        }
      }
    }
  }
}

I want all the synonyms to be equivalent. Below is the format of my synonym.txt file:

"abate,priore,superiore",
"abbacchiare,avvilire,deprimere",
"abbacchiarsi,abbattersi,abbiosciarsi,accasciarsi,avvilirsi,deprimersi,disperarsi,scoraggiarsi,sgomentarsi",
"abbacchiato,abbattuto,accasciato,afflitto,affranto,annientato,costernato,demoralizzato,depresso,in crisi,infelice,malinconico,mogio,prostrato,sconfortato,scoraggiato,scorato,sfiduciato,triste",
"abbacchio,agnello"

I get the following error message:

{
  "error": {
    "root_cause": [
      {
        "type": "index_creation_exception",
        "reason": "failed to create index"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "failed to build synonyms",
    "caused_by": {
      "type": "malformed_input_exception",
      "reason": "Input length = 1"
    }
  },
  "status": 400
}

Could you please help me out?

Cannot get around this issue.

Thanx valerio


(David Pilato) #2

You did not read the doc?

https://www.elastic.co/guide/en/elasticsearch/reference/2.4/analysis-synonym-tokenfilter.html#_solr_synonyms


(valerio) #3

Yes, I did!

It seems the error is related to the synonym itself.
I mean that if I use two words for a synonym, it doesn't work.

For example the synonym row:

"abbattere,ammainare,peggiorare,tirare giu"

The term "tirare giu" causes the error. It wants just one token.

Is that right?

valerio


(David Pilato) #4

Try:

abbattere, ammainare, peggiorare, tirare giu

Or:

abbattere, ammainare, peggiorare, tirare giu => abbattere

(valerio) #5

Hi David and thanx.

I must correct myself. Actually, it's not the two-token synonym that causes the error but the diacritic.

The original row was:

abbattere, ammainare, peggiorare, tirare giù

This causes the malformed_input_exception because of the ù. If I replace ù with u' it works:

abbattere, ammainare, peggiorare, tirare giu'

Therefore I think I'm going to replace all the diacritics.

Thanx valerio


(valerio) #6

I'm wondering whether it is possible to define an asciifolding filter (to remove diacritics) for the synonym filter?

Thanx valerio


(David Pilato) #7

You can apply the asciifolding token filter before the synonym filter in the filter chain when you define an analyzer.
So tirare giù will become tirare giu before reaching the synonym token filter.
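
As a rough sketch of that suggestion (not a tested configuration), the request body for POST /mise/ from the first post could list the built-in asciifolding filter ahead of the synonym filter. The analyzer name, filter name and file path are simply reused from the original request. One assumption to verify on your version: the folding here applies to the analyzed text, so the entries in synonym.txt would presumably also need to be written in folded form ("tirare giu") to match.

```json
{
  "index": {
    "analysis": {
      "analyzer": {
        "synonym": {
          "tokenizer": "whitespace",
          "filter": ["asciifolding", "synonym"]
        }
      },
      "filter": {
        "synonym": {
          "type": "synonym",
          "synonyms_path": "sinonimi/synonym.txt"
        }
      }
    }
  }
}
```

The order of the "filter" array matters: token filters run left to right, so asciifolding sees each token before the synonym filter does.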

