Hi! I indexed a document that contains the words "contrat à durée déterminée" (French for "fixed-term contract"). I want every document containing that exact phrase to match when I search for the acronym "cdd".
Here are my index settings:
'analysis' => array(
    'analyzer' => array(
        'indexAnalyzer' => array(
            'type' => 'custom',
            'tokenizer' => 'nGram',
            'filter' => array('asciifolding', 'lowercase', 'synonym', 'snowball', 'elision', 'worddelimiter', 'stopwords'),
        ),
        'searchAnalyzer' => array(
            'type' => 'custom',
            'tokenizer' => 'standard',
            'filter' => array('asciifolding', 'lowercase', 'elision', 'worddelimiter', 'synonym', 'stopwords'),
        ),
        ...
    ),
    'tokenizer' => array(
        'nGram' => array(
            'type' => 'nGram',
            'min_gram' => 3,
            'max_gram' => 20,
            'token_chars' => array('letter', 'digit'),
        ),
    ),
    'filter' => array(
        'synonym' => array(
            'tokenizer' => 'keyword',
            'type' => 'synonym',
            'synonyms_path' => sfConfig::get('app_elasticsearch_path_synonym'),
            'ignore_case' => true,
        ),
        ...
    ),
),
The synonyms file contains the line "CDD,Contrat à Durée Déterminée".
And here is part of my index mapping:
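A minimal Python sketch of how an equivalent-synonym rule like this one can be pictured, assuming a simplified model rather than Lucene's actual implementation. Because the filter is configured with 'tokenizer' => 'keyword', each comma-separated entry is kept as one whole token, so the multi-word side becomes a single term. The helpers `parse_synonym_rule` and `expand` are hypothetical names used only for illustration.

```python
def parse_synonym_rule(rule: str, ignore_case: bool = True) -> list[str]:
    # With a keyword tokenizer, each comma-separated entry stays one whole token.
    entries = [entry.strip() for entry in rule.split(",")]
    return [e.lower() for e in entries] if ignore_case else entries


def expand(token: str, rules: list[list[str]]) -> list[str]:
    # Equivalent synonyms: a token matching any entry expands to every entry
    # of its group (emitted at the same position as the original token).
    for group in rules:
        if token in group:
            return group
    return [token]


rules = [parse_synonym_rule("CDD,Contrat à Durée Déterminée")]
print(expand("cdd", rules))  # ['cdd', 'contrat à durée déterminée']
```

In this model the expansion contains one token with spaces and accents, not four separate words.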
"idea": {
    "properties": {
        "initial_situation": {
            "properties": {
                "search": {
                    "type": "string",
                    "analyzer": "searchAnalyzer",
                    "include_in_all": true
                }
                ...
            }
        },
        "proposed_solution": {
            "properties": {
                "search": {
                    "type": "string",
                    "analyzer": "searchAnalyzer",
                    "include_in_all": true
                }
                ...
            }
        },
        ...
    }
}
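Note that a field mapped with "analyzer": "searchAnalyzer" uses that analyzer at index time as well as at search time (unless a separate search analyzer is configured), so the text of these fields is presumably indexed with the standard tokenizer rather than with indexAnalyzer. The rough Python sketch below approximates only standard tokenizer, asciifolding and lowercase. It is an assumption-laden model that deliberately omits elision, worddelimiter, synonym and stopwords, and the helper names are hypothetical.

```python
import re
import unicodedata


def ascii_fold(text: str) -> str:
    # Approximates asciifolding: decompose accented characters and drop
    # the combining marks (é -> e, à -> a).
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def simplified_search_analyzer(text: str) -> list[str]:
    # Rough stand-in for: standard tokenizer -> asciifolding -> lowercase.
    tokens = re.findall(r"\w+", text)
    return [ascii_fold(t).lower() for t in tokens]


print(simplified_search_analyzer("Lorem ... contrat à durée déterminée, Lorem ..."))
# ['lorem', 'contrat', 'a', 'duree', 'determinee', 'lorem']
```

Under this model, the phrase ends up in the index as four separate, accent-free terms.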
A sample document:
{
    "_index": "clic",
    "_type": "idea",
    "_id": "3863",
    "_score": 0.030160192,
    "_source": {
        "id": "3863",
        "title": {
            "name": "Lorem ipsum ...",
            ...
        },
        "proposed_solution": {
            "search": "Lorem ... contrat à durée déterminée, Lorem ...",
            ...
        },
        ...
    }
}
When I call the analyze API like this: GET /clic/_analyze?analyzer=searchAnalyzer&text=cdd
it outputs the expected synonyms:
{
    "tokens": [
        {
            "token": "cdd",
            "start_offset": 0,
            "end_offset": 3,
            "type": "SYNONYM",
            "position": 1
        },
        {
            "token": "contrat à durée déterminée",
            "start_offset": 0,
            "end_offset": 3,
            "type": "SYNONYM",
            "position": 1
        }
    ]
}
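Both tokens share position 1, and the second one is a single token that contains spaces. A small Python sketch that groups the tokens of this _analyze response by position and picks out multi-word tokens (the response is copied from above, and `tokens_by_position` is a hypothetical helper):

```python
import json
from collections import defaultdict

response = json.loads("""
{"tokens": [
  {"token": "cdd", "start_offset": 0, "end_offset": 3, "type": "SYNONYM", "position": 1},
  {"token": "contrat à durée déterminée", "start_offset": 0, "end_offset": 3, "type": "SYNONYM", "position": 1}
]}
""")


def tokens_by_position(resp: dict) -> dict[int, list[str]]:
    # Tokens emitted at the same position are alternatives for the same slot.
    grouped = defaultdict(list)
    for tok in resp["tokens"]:
        grouped[tok["position"]].append(tok["token"])
    return dict(grouped)


# A token containing a space was not split into separate words by the analyzer.
multi_word = [t["token"] for t in response["tokens"] if " " in t["token"]]

print(tokens_by_position(response))  # {1: ['cdd', 'contrat à durée déterminée']}
print(multi_word)  # ['contrat à durée déterminée']
```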
So far, this looks correct to me. In addition, when I use the validate API to explain my query like this:
GET clic/idea/_validate/query?explain
{
    "query": {
        "filtered": {
            "query": {
                "bool": {
                    "should": [
                        {
                            "multi_match": {
                                "query": "cdd",
                                "type": "cross_fields",
                                "fields": [
                                    "title.name^3",
                                    "initial_situation.search^3",
                                    "proposed_solution.search^3",
                                    "expected_benefits.search^3"
                                ],
                                "operator": "and",
                                "analyzer": "searchAnalyzer"
                            }
                        }
                    ]
                }
            }
        }
    }
}
it outputs:
"explanations": [
    {
        "index": "clic",
        "valid": true,
        "explanation": "filtered((
            blended(terms: [proposed_solution.search:cdd, title.name:cdd, expected_benefits.search:cdd, initial_situation.search:cdd])
            blended(terms: [proposed_solution.search:contrat à durée déterminée, title.name:contrat à durée déterminée, expected_benefits.search:contrat à durée déterminée, initial_situation.search:contrat à durée déterminée])
        ))
        ->cache(_type:idea)"
    }
]
From what I understand, ES searches for both "cdd" and "contrat à durée déterminée" in all the fields I listed, so it should find documents containing either "cdd" or "contrat à durée déterminée". But that's not the case: when I run a POST search with the same query, it returns 0 hits.
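One way to picture this, under a simplified model that is not Lucene's actual query execution: each entry inside blended(terms: [...]) is an exact term lookup, so "contrat à durée déterminée" (one term with spaces and accents) can only match a document in which exactly that term was indexed. The indexed terms below are an assumption (the sample text split into words, accent-folded and lowercased), and `matching_terms` is a hypothetical helper.

```python
# Hypothetical indexed terms for proposed_solution.search, assuming the text
# was split into words, accent-folded and lowercased.
indexed_terms = {"lorem", "contrat", "a", "duree", "determinee"}

# Terms taken from the explain output: the synonym is a single term.
query_terms = ["cdd", "contrat à durée déterminée"]


def matching_terms(query_terms: list[str], indexed_terms: set[str]) -> list[str]:
    # A term clause matches only if an identical term exists in the field.
    return [t for t in query_terms if t in indexed_terms]


print(matching_terms(query_terms, indexed_terms))  # []
```

If the model is right, neither term finds an exact match, which would be consistent with the 0 hits.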
I hope my explanation is clear. Any help would be appreciated. Thanks!