Problem with synonym search


(clotildeh) #1

Hello,

I defined synonyms in my txt file: canapé, banquette, poof
When I run this command:

curl -XGET 'http://127.0.0.1:9200/my_index/_analyze?pretty=1&text=poof&analyzer=default'

I get my synonyms:
{
  "tokens" : [ {
    "token" : "canapé",
    "start_offset" : 0,
    "end_offset" : 4,
    "type" : "SYNONYM",
    "position" : 1
  }, {
    "token" : "banquette",
    "start_offset" : 0,
    "end_offset" : 4,
    "type" : "SYNONYM",
    "position" : 1
  }, {
    "token" : "poof",
    "start_offset" : 0,
    "end_offset" : 4,
    "type" : "SYNONYM",
    "position" : 1
  } ]
}

But when I run a search, I only get the documents that contain the word
"canapé". I don't get the documents that contain the words "poof" and
"banquette" (3 hits out of 5).

curl -XGET 'http://127.0.0.1:9200/testavecparam/products/_search?pretty=1' -d '
{
  "query" : {
    "text" : {
      "products.designation" : {
        "query" : "canape",
        "analyzer" : "default"
      }
    }
  }
}
'

My configuration:
"settings" : {
  "analysis" : {
    "analyzer" : {
      "default" : {
        "type" : "custom",
        "tokenizer" : "standard",
        "filter" : [
          "standard",
          "lowercase",
          "myStemmer",
          "asciifolding",
          "mySynonym"
        ]
      }
    },
    "filter" : {
      "myStemmer" : {
        "type" : "stemmer",
        "language" : "light_french"
      },
      "mySynonym" : {
        "type" : "synonym",
        "synonyms_path" : "analysis/synonym.txt"
      }
    }
  }
}

Is there any problem with my configuration?

Thanks for your answers


(Lukáš Vlček) #2

Hi,

Would you mind preparing a complete curl recreation? See
http://www.elasticsearch.org/help/ (note that you can use a gist instead of
pasting it into the mail body, which is more practical).
This typically helps a lot.

Regards,
Lukas

On Wed, Jul 4, 2012 at 3:38 PM, clotildeh c.hebert@novactive.com wrote:


(Andreas W) #3

Hi clotildeh,

One challenge with synonyms is how to combine them with the stemming filter.

If you apply the stemming filter first and the synonym filter afterwards,
you need to have the stemmed version of each word in your synonym.txt.

I did it the other way around: I apply the synonym filter first and the
stemming filter afterwards. This implies that I have to list all variants of
a word in synonym.txt. (Which obviously is easier for English than for French.)
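
To make the ordering issue concrete, here is a minimal Python sketch. It is a toy model only, not Elasticsearch code: `toy_stem` is a made-up stand-in for a stemmer (it is not light_french), and `expand_synonyms`, `stem_then_synonyms` and `synonyms_then_stem` are hypothetical helper names chosen for the illustration.

```python
# Toy model of two filter orders. NOT Elasticsearch code.

def toy_stem(token):
    """Hypothetical stemmer stand-in: drops a trailing 's', then a trailing 'e'."""
    if len(token) > 3 and token.endswith("s"):
        token = token[:-1]
    if len(token) > 3 and token.endswith("e"):
        token = token[:-1]
    return token


def expand_synonyms(tokens, group):
    """If a token is listed in the synonym group, emit every member of the group."""
    out = []
    for token in tokens:
        out.extend(sorted(group) if token in group else [token])
    return out


def stem_then_synonyms(token, group):
    """Stemmer first: the synonym lookup only ever sees the stemmed form."""
    return set(expand_synonyms([toy_stem(token)], group))


def synonyms_then_stem(token, group):
    """Synonym filter first: the lookup sees the raw form, stemming happens after."""
    return {toy_stem(t) for t in expand_synonyms([token], group)}


# Same entries as the synonym file in this thread.
GROUP = {"canapé", "banquette", "poof"}
# What the file would have to contain if the stemmer runs first.
STEMMED_GROUP = {toy_stem(t) for t in GROUP}

if __name__ == "__main__":
    print(sorted(stem_then_synonyms("banquette", GROUP)))
    print(sorted(stem_then_synonyms("banquette", STEMMED_GROUP)))
    print(sorted(synonyms_then_stem("banquette", GROUP)))
    print(sorted(synonyms_then_stem("banquettes", GROUP)))
```

In this toy model, stemming first turns "banquette" into a form that is not in the synonym list, so nothing is expanded unless the list holds the stemmed forms. With synonyms first, "banquette" expands fine, but the plural "banquettes" is not expanded because that variant is not listed.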

Hope this helps,
Andreas


(clotildeh) #4

Hi Andreas,

Thank you for your help.
Indeed, we found that putting the synonym filter before the stemming filter
gave better results.

We still need to understand how the filters interact to make sure that we
match the expected documents.
Thank you for your answer.

On Thursday, July 5, 2012 at 11:52:28 UTC+2, Andreas W wrote:


