I have seen the "tokenizer" field inside the synonym filter, which specifies a tokenizer for the synonyms and taxonomy. However, the same documentation says that this parameter is deprecated in versions after 6.0.
I'm currently using 7.5, so I assume I can't do this:
"synonym" : {
"tokenizer" : "keyword",
"filter" : ["my_stop", "this is my synonym"]
}
EDIT:
The keywords, synonyms and taxonomy are stored in files, so I'll paste my full index settings plus the contents of the synonyms, taxonomy and keywords files for clarification.
Index:
{
"settings" : {
"analysis": {
"filter": {
"spanish_stop": {
"type": "stop",
"stopwords": "_spanish_"
},
"spanish_stemmer": {
"type": "stemmer",
"language": "spanish"
},
"spanish_keywords":{
"type": "keyword_marker",
"keywords_path": "prueba/animales/keywords_animal.txt",
"ignore_case": true
},
"seres_vivos_syn":{
"type": "synonym",
"synonyms_path": "prueba/animales/sinonimos_animal.txt"
},
"tax_seres_vivos": {
"type": "synonym",
"synonyms_path": "prueba/animales/taxonomia_animal.txt"
}
},
"analyzer": {
"analyzer_español": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"spanish_stop",
"spanish_keywords",
"spanish_stemmer"
]
},
"search_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"spanish_stop",
"spanish_keywords",
"tax_seres_vivos",
"spanish_stemmer",
"seres_vivos_syn"
]
}
}
}
},
"mappings" : {
"properties" : {
"text" : {
"type" : "text",
"analyzer": "analyzer_español",
"search_analyzer": "search_analyzer"}
}
}
}
Keywords:
"Felis Silvestris","F.Silvestris","F. Silvestris","Mola Mola","M.Mola","M. Mola","C.Lupus","C. Lupus","Canis Lupus"
Synonyms:
"Lobo,Feroz,Caperucita",
"Luna,Pez,Grande",
"Felis Silvestris,F.Silvestris,F. Silvestris,Felis_Silvestris",
"Mola_Mola,Mola Mola,M.Mola,M. Mola",
"Canis_Lupus,C.Lupus,C. Lupus,Canis Lupus"
Taxonomy:
"Animalia=>Animalia,Chordata,Vertebrata,Mammalia,Theria,Placentalia,Carnivora,Feliformia,Felidae,Felinae,Felis,Felis_Silvestris,Gato",
"Chordata=>Chordata,Vertebrata,Mammalia,Theria,Placentalia,Carnivora,Feliformia,Felidae,Felinae,Felis,Felis_Silvestris,Gato",
"Vertebrata=>Vertebrata,Mammalia,Theria,Placentalia,Carnivora,Feliformia,Felidae,Felinae,Felis,Felis_Silvestris,Gato",
"Mammalia=>Mammalia,Theria,Placentalia,Carnivora,Feliformia,Felidae,Felinae,Felis,Felis_Silvestris,Gato",
"Theria=>Theria,Placentalia,Carnivora,Feliformia,Felidae,Felinae,Felis,Felis_Silvestris,Gato",
"Placentalia=>Placentalia,Carnivora,Feliformia,Felidae,Felinae,Felis,Felis_Silvestris,Gato",
"Carnivora=>Carnivora,Feliformia,Felidae,Felinae,Felis,Felis_Silvestris,Gato",
"Feliformia=>Feliformia,Felidae,Felinae,Felis,Felis_Silvestris,Gato",
"Felidae=>Felidae,Felinae,Felis,Felis_Silvestris,Gato",
"Felinae=>Felinae,Felis,Felis_Silvestris,Gato",
"Felis=>Felis,Felis_Silvestris,Gato",
"Felis_Silvestris=>Felis_Silvestris,Gato,Felis Silvestris,F.Silvestris,F. Silvestris",
"Animalia=>Animalia,Chordata,Vertebrata,Osteichthyes,Actinopterygii,Neopterygii,Teleostei,Acanthopterygii,Tetraodontiformes,Molidae,Mola_Mola,Luna",
"Chordata=>Chordata,Vertebrata,Osteichthyes,Actinopterygii,Neopterygii,Teleostei,Acanthopterygii,Tetraodontiformes,Molidae,Mola_Mola,Luna",
"Vertebrata=>Vertebrata,Osteichthyes,Actinopterygii,Neopterygii,Teleostei,Acanthopterygii,Tetraodontiformes,Molidae,Mola_Mola,Luna",
"Osteichthyes=>Osteichthyes,Actinopterygii,Neopterygii,Teleostei,Acanthopterygii,Tetraodontiformes,Molidae,Mola_Mola,Luna",
"Actinopterygii=>Actinopterygii,Neopterygii,Teleostei,Acanthopterygii,Tetraodontiformes,Molidae,Mola_Mola,Luna",
"Neopterygii=>Neopterygii,Teleostei,Acanthopterygii,Tetraodontiformes,Molidae,Mola_Mola,Luna",
"Teleostei=>Teleostei,Acanthopterygii,Tetraodontiformes,Molidae,Mola_Mola,Luna",
"Acanthopterygii=>Acanthopterygii,Tetraodontiformes,Molidae,Mola_Mola,Luna",
"Tetraodontiformes=>Tetraodontiformes,Molidae,Mola_Mola,Luna",
"Molidae=>Molidae,Mola_Mola,Luna,Mola Mola,M.Mola,M. Mola",
"Mola_Mola=>Mola_Mola,Luna",
"Animalia=>Animalia,Chordata,Vertebrata,Mammalia,Carnivora,Canidae,Canis,Canis_Lupus,Lobo",
"Chordata=>Chordata,Vertebrata,Mammalia,Carnivora,Canidae,Canis,Canis_Lupus,Lobo",
"Vertebrata=>Vertebrata,Mammalia,Carnivora,Canidae,Canis,Canis_Lupus,Lobo",
"Mammalia=>Mammalia,Carnivora,Canidae,Canis,Canis_Lupus,Lobo",
"Carnivora=>Carnivora,Canidae,Canis,Canis_Lupus,Lobo",
"Canidae=>Canidae,Canis,Canis_Lupus,Lobo",
"Canis=>Canis,Canis_Lupus,Lobo",
"Canis_Lupus=>Canis_Lupus,Lobo,C.Lupus,C. Lupus,Canis Lupus"
EDIT2:
Using the following _analyze request:
GET /prueba_tax/_analyze
{
"analyzer" : "search_analyzer",
"text" : "canis lupus"
}
I get this:
{
"tokens" : [
{
"token" : "canis",
"start_offset" : 0,
"end_offset" : 5,
"type" : "SYNONYM",
"position" : 0
},
{
"token" : "canis_lupus",
"start_offset" : 0,
"end_offset" : 5,
"type" : "SYNONYM",
"position" : 0
},
{
"token" : "c.lupus",
"start_offset" : 0,
"end_offset" : 5,
"type" : "SYNONYM",
"position" : 0
},
{
"token" : "c",
"start_offset" : 0,
"end_offset" : 5,
"type" : "SYNONYM",
"position" : 0
},
{
"token" : "canis",
"start_offset" : 0,
"end_offset" : 5,
"type" : "SYNONYM",
"position" : 0
},
{
"token" : "lob",
"start_offset" : 0,
"end_offset" : 5,
"type" : "SYNONYM",
"position" : 0
},
{
"token" : "lupus",
"start_offset" : 0,
"end_offset" : 5,
"type" : "SYNONYM",
"position" : 0
},
{
"token" : "lupus",
"start_offset" : 0,
"end_offset" : 5,
"type" : "SYNONYM",
"position" : 0
},
{
"token" : "feroz",
"start_offset" : 0,
"end_offset" : 5,
"type" : "SYNONYM",
"position" : 0
},
{
"token" : "caperucit",
"start_offset" : 0,
"end_offset" : 5,
"type" : "SYNONYM",
"position" : 0
},
{
"token" : "lupus",
"start_offset" : 6,
"end_offset" : 11,
"type" : "<ALPHANUM>",
"position" : 1
}
]
}
As you can see, I get a "c" and a "canis" separated from "lupus", which appears several times. I believe these tokens should be combined and appear as "c. lupus" and "canis lupus".
If I analyze "c. lupus", the relation between all of them shows up in the same way as before, but since the "c" and "canis" terms are separated, I'm not sure this is the right way to do it.
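To narrow down which stage of the chain emits each of these tokens, the same request could be run with "explain" enabled. This is only a diagnostic sketch: it reuses the index name and analyzer name from above, and relies on the "explain" flag of the _analyze API, which reports the output of the tokenizer and of each token filter separately instead of only the final token list.

```json
GET /prueba_tax/_analyze
{
  "analyzer" : "search_analyzer",
  "text" : "canis lupus",
  "explain" : true
}
```

Comparing the tokenizer section of the response with the sections for "tax_seres_vivos" and "seres_vivos_syn" should show whether the separate "c" and "canis" tokens come from the standard tokenizer splitting the input or from the way the multi-word entries in the synonym files are expanded.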