Keep names in native cases

comdiv · May 8, 2015, 8:41am

We have the following task.
The index we provide must meet the following requirements:

It works with Russian morphology (so we install the elastic_search_morphology plugin)
It must supply a special analyzer that detects personal names and emits them in a well-formed form
To detect names we have some heuristics based on case, abbreviations and punctuation
In the end we want to see something close to the following:
GET /index/_analyze?analyzer=coolanalyzer
{"text":"J. Fire light the fire in Mary. Trust me."}
tokens -> 1: "J. Fire" type=name, 2: "fire" 3:"light" 4:"fire", 5:"Mary" type=name, 6: "trust"

So we want both ordinary tokens and name tokens in the analyzer result.
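To make the target output concrete, here is a minimal sketch in plain Python, outside Elasticsearch, of the kind of case/abbreviation/punctuation heuristic described above. Everything in it is an assumption for illustration: the regular expression, the tiny stop-word list and the analyze function are hypothetical, it handles only Latin letters (Russian text would need Cyrillic character classes), and it is not how any Elasticsearch analyzer works.

```python
import re

# Hypothetical stop-word list, just large enough for the example sentence.
STOP_WORDS = {"the", "in", "me"}

TOKEN_RE = re.compile(
    r"(?P<abbr>(?:[A-Z]\.\s*)+[A-Z][a-z]+)"  # initials + surname, e.g. "J. Fire"
    r"|(?P<word>[A-Za-z]+)"                  # any other word
    r"|(?P<end>[.!?])"                       # sentence-ending punctuation
)

def analyze(text):
    """Return (token, type) pairs, mixing name tokens and ordinary tokens."""
    tokens = []
    sentence_start = True
    for match in TOKEN_RE.finditer(text):
        if match.group("end"):
            sentence_start = True
            continue
        if match.group("abbr"):
            # Initials followed by a capitalized word: keep case and punctuation.
            tokens.append((match.group("abbr"), "name"))
        else:
            word = match.group("word")
            if word[0].isupper() and not sentence_start:
                # Capitalized in mid-sentence: treat as a name, keep its case.
                tokens.append((word, "name"))
            elif word.lower() not in STOP_WORDS:
                # Ordinary token: lowercase it, as a morphology filter would expect.
                tokens.append((word.lower(), "word"))
        sentence_start = False
    return tokens

result = analyze("J. Fire light the fire in Mary. Trust me.")
```

The point of the sketch is only that name detection has to see the original case and punctuation, while the ordinary tokens are lowercased afterwards; that is the ordering conflict with the usual tokenizer and lowercase filter.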

Problems: the usual tokenizer strips punctuation, and the usual morphology filter requires a lowercase filter before it.

Any suggestions on how to configure this? For now we just extract names before putting the document into Elasticsearch and store them in a separate field. This allows us to provide name-based search, but it does not allow highlighting or a simple search using one full-text query with the same analyzer applied.

Can I use two independent tokenizer+analyzer chains whose results are then merged?
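What "merging two chains" would mean can be sketched in plain Python: each chain emits tokens with character offsets, and the two streams are interleaved by offset so that a name token sits alongside the ordinary tokens it covers. This is only an illustration of the idea under stated assumptions; name_chain, word_chain and merged are hypothetical helpers, the patterns cover Latin letters only, and nothing here is an Elasticsearch API.

```python
import re
from heapq import merge

# Hypothetical patterns: initials + capitalized word for names, plain words otherwise.
NAME_RE = re.compile(r"(?:[A-Z]\.\s*)+[A-Z][a-z]+")
WORD_RE = re.compile(r"[A-Za-z]+")

def name_chain(text):
    """Chain 1: keeps case and punctuation, emits only name tokens."""
    return [(m.start(), m.end(), m.group(), "name") for m in NAME_RE.finditer(text)]

def word_chain(text):
    """Chain 2: drops punctuation and lowercases, emits ordinary tokens."""
    return [(m.start(), m.end(), m.group().lower(), "word") for m in WORD_RE.finditer(text)]

def merged(text):
    """Interleave both (already offset-sorted) streams by start offset."""
    return list(merge(name_chain(text), word_chain(text)))

tokens = merged("J. Fire lit")
```

Because both kinds of token carry offsets into the same text, one field could in principle serve both name search and highlighting, which the separate-field workaround cannot do.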

jprante · May 8, 2015, 9:05am

Analyzers are just for producing tokens for Lucene indexing. They do not perform natural language processing (NLP) or named entity recognition (NER).

If you want to experiment with NLP or NER, additional software is required, provided by plugins. See for example
