Keep names in native cases


(FAGIM SADYKOV) #1

I have the following task.
The index we provide must meet the following requirements:

  1. It works with Russian morphology (so we install the elastic_search_morphology plugin)
  2. It must supply a special analyzer that detects personal names and brings them to a well-formed form
  3. To detect names we have some heuristics based on case, abbreviations and punctuation
  4. In the end we want to see something close to the following:
    GET /index/_analyze?analyzer=coolanalyzer
    {"text":"J. Fire light the fire in Mary. Trust me."}
    tokens -> 1: "J. Fire" type=name, 2: "fire" 3:"light" 4:"fire", 5:"Mary" type=name, 6: "trust"

So we want to have both ordinary tokens and name tokens in the analyzer result.
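
To make the desired result concrete, here is a minimal Python sketch of that kind of heuristic, run outside Elasticsearch. The function name `tag_tokens`, the regular expression and the sentence-start rule are assumptions made for illustration only; they are not the real heuristics and not part of any plugin API, and stop words are not removed.

```python
import re

# Assumed heuristic: initials such as "J." followed by a capitalized word form a name.
TOKEN_RE = re.compile(r"(?P<name>(?:[A-Z]\.\s?)+[A-Z][a-z]+)|(?P<word>[A-Za-z]+)")


def tag_tokens(text):
    """Return (token, type) pairs, keeping names in their original case."""
    tokens = []
    for m in TOKEN_RE.finditer(text):
        if m.group("name"):
            tokens.append((m.group("name"), "name"))
            continue
        word = m.group("word")
        # A capitalized word right after ".", "!" or "?" is treated as a sentence start.
        before = text[:m.start()].rstrip()
        sentence_start = not before or before[-1] in ".!?"
        if word[0].isupper() and not sentence_start:
            tokens.append((word, "name"))
        else:
            tokens.append((word.lower(), "word"))
    return tokens


if __name__ == "__main__":
    for token, kind in tag_tokens("J. Fire light the fire in Mary. Trust me."):
        print(kind, token)
```

For the sample text this tags "J. Fire" and "Mary" as names and lowercases "Trust", because it opens a sentence.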

The problems: the usual tokenizer strips punctuation, and the usual morphology filter requires a lowercase filter before it.

Any suggestions on how to configure this? For now we just extract names into a separate field before putting documents into Elasticsearch.
That allows us to provide name-based search, but it does not allow highlighting or simple search with one full-text query that uses the same analyzer.
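
The current workaround looks roughly like the sketch below. The field names `text` and `names`, the helper `build_document` and the simplified initials-only pattern are hypothetical; the real extraction uses more heuristics.

```python
import re

# Simplified, assumed pattern: one or more initials followed by a capitalized word.
INITIALS_NAME = re.compile(r"(?:[A-Z]\.\s?)+[A-Z][a-z]+")


def build_document(text):
    """Build the document to index: full text plus a separate field with extracted names."""
    return {"text": text, "names": INITIALS_NAME.findall(text)}
```

Because the names live in their own field, a match on `names` carries no offsets into `text`, which is why highlighting does not work with this approach.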

Can I use two independent tokenizer+analyzer chains and then merge their results?
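
To illustrate what "merging" means here, a minimal sketch in plain Python, assuming each chain yields tokens with character offsets. The `Token` type and `merge_streams` are hypothetical names; this is not how Elasticsearch or Lucene combine token streams, only the result we are after.

```python
from typing import NamedTuple


class Token(NamedTuple):
    term: str
    start: int  # character offset where the token begins
    end: int    # character offset just past the token
    type: str   # "name" or "word"


def merge_streams(words, names):
    """Merge two token lists by start offset; a name sorts before a word at the same offset."""
    return sorted([*names, *words], key=lambda t: (t.start, t.type != "name"))
```

Keeping offsets on both kinds of token is what would make highlighting possible with a single field.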


(Jörg Prante) #2

Analyzers are just for producing tokens for Lucene indexing. They do not perform natural language processing (NLP) or named entity recognition (NER).

If you want to experiment with NLP or NER, additional software is required, provided by plugins. See for example


