Keywords not behaving as I need

Hello there.

I'm trying to build a search that combines synonyms and taxonomies with some keywords, because several of my terms are composite (multi-word).

These are the two analyzers I am trying to create:

"analysis": {
    "filter": {
        "spanish_stop": {
            "type": "stop",
            "stopwords": "_spanish_"
        },
        "spanish_stemmer": {
            "type": "stemmer",
            "language": "spanish"
        },
        "spanish_keywords": {
            "type": "keyword_marker",
            "keywords_path": "prueba/animales/keywords_animal.txt",
            "ignore_case": true
        },
        "seres_vivos_syn": {
            "type": "synonym",
            "synonyms_path": "prueba/animales/sinonimos_animal.txt"
        },
        "tax_seres_vivos": {
            "type": "synonym",
            "synonyms_path": "prueba/animales/taxonomia_animal.txt"
        }
    },
    "analyzer": {
        "analyzer_español": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": [
                "lowercase",
                "spanish_stop",
                "spanish_keywords",
                "spanish_stemmer"
            ]
        },
        "search_analyzer": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": [
                "lowercase",
                "spanish_stop",
                "spanish_keywords",
                "tax_seres_vivos",
                "spanish_stemmer",
                "seres_vivos_syn"
            ]
        }
    }
}

The problem comes when I try to link multi-word terms such as "Canis Lupus". In my taxonomy it appears as "Canis_Lupus" and is linked to "Canis Lupus", "C.Lupus" and "C. Lupus". I have included all of these in the keywords file; however, since I'm using the standard tokenizer, my keywords are split by that tokenizer as well.
If I search for "End of the world", for example, and I have it as a keyword, I want it to stay whole instead of being tokenized, and I want it to be linked through the taxonomy and synonyms as a full term that can be found in my tokenized text.

I don't want to lose the ability to split the texts I index into tokens; I want to add the ability to keep the keywords as whole terms: no tokenizing, no stopword removal, no stemming.
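To make the ordering problem concrete, here is a minimal sketch of a request body for `GET /_analyze`. It assumes an inline `keywords` list standing in for my keywords file, and it uses `explain` with the `keyword` attribute so that each token reports whether the marker flagged it. Since the tokenizer runs before any token filter, I would expect `canis` and `lupus` to come out as separate tokens, so the marker never sees the two-word term as a single unit.

```json
{
  "tokenizer": "standard",
  "filter": [
    "lowercase",
    {
      "type": "keyword_marker",
      "keywords": ["canis lupus"],
      "ignore_case": true
    }
  ],
  "text": "Canis Lupus",
  "explain": true,
  "attributes": ["keyword"]
}
```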

This is my mapping:

"mappings": {
    "properties": {
        "text": {
            "type": "text",
            "analyzer": "analyzer_español",
            "search_analyzer": "search_analyzer"
        }
    }
}

Should I use the standard tokenizer for indexing and the keyword tokenizer for searching? Is there any way to exempt the keywords from the tokenizer, stopword filter and stemmer, or to tell the tokenizer not to split the keywords?
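For reference, this is roughly what the keyword-tokenizer option would look like as a `GET /_analyze` request body (a sketch only, not something taken from my index). The `keyword` tokenizer emits the whole input as a single token, so a longer query with other words around "Canis Lupus" would also become one token, which is my worry with this approach.

```json
{
  "tokenizer": "keyword",
  "filter": ["lowercase"],
  "text": "Canis Lupus"
}
```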

Thank you in advance

Have a look at whether the synonym token filter suits your needs.

I have seen the tokenizer setting of the synonym filter, which specifies a tokenizer for the synonyms and taxonomy. However, the same documentation says this setting is deprecated for versions after 6.0.

I'm currently using 7.5, so I assume I can't do this:

"synonym" : {
                        "tokenizer" : "keyword",
                        "filter" : ["my_stop", "this is my synonym"]
                    }
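
What I understand from the 7.x documentation is that synonym rules are parsed with the tokenizer and token filters that come before the synonym filter in the analyzer. If that is right, one thing I could try is placing a synonym filter directly after `lowercase`, so that multi-word entries are matched as token sequences before the stopword filter and stemmer touch them. The sketch below is an untested assumption: `species_syn` and `search_analyzer_alt` are hypothetical names, and the inline rule stands in for my synonyms file.

```json
{
  "settings": {
    "analysis": {
      "filter": {
        "species_syn": {
          "type": "synonym",
          "synonyms": ["canis lupus, c. lupus, c.lupus => canis_lupus"]
        },
        "spanish_stop": {
          "type": "stop",
          "stopwords": "_spanish_"
        },
        "spanish_stemmer": {
          "type": "stemmer",
          "language": "spanish"
        }
      },
      "analyzer": {
        "search_analyzer_alt": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "species_syn", "spanish_stop", "spanish_stemmer"]
        }
      }
    }
  }
}
```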

EDIT:

Keywords, synonyms and taxonomy are in files, so I'll paste my full index settings, synonyms, taxonomy and keywords for clarification.

Index:

{
    "settings": {
        "analysis": {
            "filter": {
                "spanish_stop": {
                    "type": "stop",
                    "stopwords": "_spanish_"
                },
                "spanish_stemmer": {
                    "type": "stemmer",
                    "language": "spanish"
                },
                "spanish_keywords": {
                    "type": "keyword_marker",
                    "keywords_path": "prueba/animales/keywords_animal.txt",
                    "ignore_case": true
                },
                "seres_vivos_syn": {
                    "type": "synonym",
                    "synonyms_path": "prueba/animales/sinonimos_animal.txt"
                },
                "tax_seres_vivos": {
                    "type": "synonym",
                    "synonyms_path": "prueba/animales/taxonomia_animal.txt"
                }
            },
            "analyzer": {
                "analyzer_español": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase",
                        "spanish_stop",
                        "spanish_keywords",
                        "spanish_stemmer"
                    ]
                },
                "search_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase",
                        "spanish_stop",
                        "spanish_keywords",
                        "tax_seres_vivos",
                        "spanish_stemmer",
                        "seres_vivos_syn"
                    ]
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "text": {
                "type": "text",
                "analyzer": "analyzer_español",
                "search_analyzer": "search_analyzer"
            }
        }
    }
}

Keywords:

"Felis Silvestris","F.Silvestris","F. Silvestris","Mola Mola","M.Mola","M. Mola","C.Lupus","C. Lupus","Canis Lupus"

Synonyms:

"Lobo,Feroz,Caperucita",
"Luna,Pez,Grande",
"Felis Silvestris,F.Silvestris,F. Silvestris,Felis_Silvestris",
"Mola_Mola,Mola Mola,M.Mola,M. Mola",
"Canis_Lupus,C.Lupus,C. Lupus,Canis Lupus"

Taxonomy:

"Animalia=>Animalia,Chordata,Vertebrata,Mammalia,Theria,Placentalia,Carnivora,Feliformia,Felidae,Felinae,Felis,Felis_Silvestris,Gato",
"Chordata=>Chordata,Vertebrata,Mammalia,Theria,Placentalia,Carnivora,Feliformia,Felidae,Felinae,Felis,Felis_Silvestris,Gato",
"Vertebrata=>Vertebrata,Mammalia,Theria,Placentalia,Carnivora,Feliformia,Felidae,Felinae,Felis,Felis_Silvestris,Gato",
"Mammalia=>Mammalia,Theria,Placentalia,Carnivora,Feliformia,Felidae,Felinae,Felis,Felis_Silvestris,Gato",
"Theria=>Theria,Placentalia,Carnivora,Feliformia,Felidae,Felinae,Felis,Felis_Silvestris,Gato",
"Placentalia=>Placentalia,Carnivora,Feliformia,Felidae,Felinae,Felis,Felis_Silvestris,Gato",
"Carnivora=>Carnivora,Feliformia,Felidae,Felinae,Felis,Felis_Silvestris,Gato",
"Feliformia=>Feliformia,Felidae,Felinae,Felis,Felis_Silvestris,Gato",
"Felidae=>Felidae,Felinae,Felis,Felis_Silvestris,Gato",
"Felinae=>Felinae,Felis,Felis_Silvestris,Gato",
"Felis=>Felis,Felis_Silvestris,Gato",
"Felis_Silvestris=>Felis_Silvestris,Gato,Felis Silvestris,F.Silvestris,F. Silvestris",
"Animalia=>Animalia,Chordata,Vertebrata,Osteichthyes,Actinopterygii,Neopterygii,Teleostei,Acanthopterygii,Tetraodontiformes,Molidae,Mola_Mola,Luna",
"Chordata=>Chordata,Vertebrata,Osteichthyes,Actinopterygii,Neopterygii,Teleostei,Acanthopterygii,Tetraodontiformes,Molidae,Mola_Mola,Luna",
"Vertebrata=>Vertebrata,Osteichthyes,Actinopterygii,Neopterygii,Teleostei,Acanthopterygii,Tetraodontiformes,Molidae,Mola_Mola,Luna",
"Osteichthyes=>Osteichthyes,Actinopterygii,Neopterygii,Teleostei,Acanthopterygii,Tetraodontiformes,Molidae,Mola_Mola,Luna",
"Actinopterygii=>Actinopterygii,Neopterygii,Teleostei,Acanthopterygii,Tetraodontiformes,Molidae,Mola_Mola,Luna",
"Neopterygii=>Neopterygii,Teleostei,Acanthopterygii,Tetraodontiformes,Molidae,Mola_Mola,Luna",
"Teleostei=>Teleostei,Acanthopterygii,Tetraodontiformes,Molidae,Mola_Mola,Luna",
"Acanthopterygii=>Acanthopterygii,Tetraodontiformes,Molidae,Mola_Mola,Luna",
"Tetraodontiformes=>Tetraodontiformes,Molidae,Mola_Mola,Luna",
"Molidae=>Molidae,Mola_Mola,Luna,Mola Mola,M.Mola,M. Mola",
"Mola_Mola=>Mola_Mola,Luna",
"Animalia=>Animalia,Chordata,Vertebrata,Mammalia,Carnivora,Canidae,Canis,Canis_Lupus,Lobo",
"Chordata=>Chordata,Vertebrata,Mammalia,Carnivora,Canidae,Canis,Canis_Lupus,Lobo",
"Vertebrata=>Vertebrata,Mammalia,Carnivora,Canidae,Canis,Canis_Lupus,Lobo",
"Mammalia=>Mammalia,Carnivora,Canidae,Canis,Canis_Lupus,Lobo",
"Carnivora=>Carnivora,Canidae,Canis,Canis_Lupus,Lobo",
"Canidae=>Canidae,Canis,Canis_Lupus,Lobo",
"Canis=>Canis,Canis_Lupus,Lobo",
"Canis_Lupus=>Canis_Lupus,Lobo,C.Lupus,C. Lupus,Canis Lupus"

EDIT2:

Using the following _analyze request:

GET /prueba_tax/_analyze
{
  "analyzer" : "search_analyzer",
  "text" : "canis lupus"
}

I get this:

{
  "tokens" : [
    {
      "token" : "canis",
      "start_offset" : 0,
      "end_offset" : 5,
      "type" : "SYNONYM",
      "position" : 0
    },
    {
      "token" : "canis_lupus",
      "start_offset" : 0,
      "end_offset" : 5,
      "type" : "SYNONYM",
      "position" : 0
    },
    {
      "token" : "c.lupus",
      "start_offset" : 0,
      "end_offset" : 5,
      "type" : "SYNONYM",
      "position" : 0
    },
    {
      "token" : "c",
      "start_offset" : 0,
      "end_offset" : 5,
      "type" : "SYNONYM",
      "position" : 0
    },
    {
      "token" : "canis",
      "start_offset" : 0,
      "end_offset" : 5,
      "type" : "SYNONYM",
      "position" : 0
    },
    {
      "token" : "lob",
      "start_offset" : 0,
      "end_offset" : 5,
      "type" : "SYNONYM",
      "position" : 0
    },
    {
      "token" : "lupus",
      "start_offset" : 0,
      "end_offset" : 5,
      "type" : "SYNONYM",
      "position" : 0
    },
    {
      "token" : "lupus",
      "start_offset" : 0,
      "end_offset" : 5,
      "type" : "SYNONYM",
      "position" : 0
    },
    {
      "token" : "feroz",
      "start_offset" : 0,
      "end_offset" : 5,
      "type" : "SYNONYM",
      "position" : 0
    },
    {
      "token" : "caperucit",
      "start_offset" : 0,
      "end_offset" : 5,
      "type" : "SYNONYM",
      "position" : 0
    },
    {
      "token" : "lupus",
      "start_offset" : 6,
      "end_offset" : 11,
      "type" : "<ALPHANUM>",
      "position" : 1
    }
  ]
}

As you can see, there is a "c" and a "canis" separated from "lupus", which appears several times. I believe these tokens should be combined and appear as "c. lupus" and "canis lupus".

If I analyze "c. lupus", the relation between all of them is there in the same fashion as before, but since the "c" and "canis" terms are separated I'm not sure this is the right way to do it.
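
To see which filter introduces each of these tokens, the same request can be sent with `explain` enabled. As far as I know, for a custom analyzer this reports the token stream after the tokenizer and after each token filter in turn. A sketch of the body for `GET /prueba_tax/_analyze`:

```json
{
  "analyzer": "search_analyzer",
  "text": "canis lupus",
  "explain": true
}
```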
