Hi everyone,
In several Elasticsearch projects we’ve seen stemming introduce noise early in the analysis pipeline, especially in multilingual setups.
For example:
- “organization” → “organ”
- “news” → “new”
- “united” → “unit”
These transformations can collapse unrelated terms into a single indexed form ("organ" the instrument becomes indistinguishable from "organization"), degrading matching quality and precision.
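(Outputs like these are what the classic Porter stemmer produces, e.g. Elasticsearch's `porter_stem` token filter; they're easy to reproduce with the `_analyze` API. A rough sketch with the Python client, assuming a local cluster on localhost:9200:)

```python
# Sketch: reproduce the stemming collapses above via the _analyze API.
# Assumes a local cluster and the elasticsearch-py 8.x client.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

resp = es.indices.analyze(
    tokenizer="standard",
    filter=["lowercase", "porter_stem"],  # classic Porter stemmer
    text="organization news united",
)
print([t["token"] for t in resp["tokens"]])
# -> ['organ', 'new', 'unit']
# Note that "organ" (the instrument) also indexes as 'organ',
# so the two terms become indistinguishable at query time.
```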
In practice, this often results in more complex query logic (n-grams, fuzzy matching, etc.) or a heavier reliance on semantic search to compensate.
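To make "more complex query logic" concrete, here's a rough sketch of the usual pattern: an exact match boosted over a fuzzy fallback. The index and field names ("articles", "title") are made up for illustration:

```python
# Sketch of the compensating query logic stemming noise tends to force:
# boost the exact match, fall back to fuzzy matching for the rest.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

resp = es.search(
    index="articles",  # hypothetical index
    query={
        "bool": {
            "should": [
                {"match": {"title": {"query": "united", "boost": 2.0}}},
                {"match": {"title": {"query": "united", "fuzziness": "AUTO"}}},
            ]
        }
    },
)
```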
We’ve been exploring an alternative approach based on proper linguistic normalization (lemmatization + decompounding) before indexing, and testing how this impacts both lexical and semantic search performance.
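As a minimal sketch of what "normalizing upstream" looks like, here's lemmatization with spaCy applied before indexing (the model, field names, and spaCy itself are our choices here, not the only option; decompounding for languages like German would plug in at the same stage):

```python
# Sketch: lemmatize client-side before indexing, so the lemma field
# only needs lowercasing in Elasticsearch. Assumes spaCy with the
# en_core_web_sm model and a hypothetical "articles" index.
import spacy
from elasticsearch import Elasticsearch

nlp = spacy.load("en_core_web_sm")
es = Elasticsearch("http://localhost:9200")

def normalize(text: str) -> str:
    """Replace each token with its lemma."""
    return " ".join(token.lemma_ for token in nlp(text))

print(normalize("organizations in the news"))
# -> "organization in the news"  (plural reduced; "news" stays intact,
#    unlike the Porter output above)

doc = {"title": "organizations in the news"}
doc["title_lemma"] = normalize(doc["title"])
es.index(index="articles", document=doc)
```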
Shared a short write-up with examples here:
https://www.linkedin.com/pulse/how-increase-search-relevance-elasticsearch-better-text-tony-chac%C3%B3n-arkic
Curious how others here are handling this:
- sticking with stemming / custom analyzers?
- moving fully to semantic search?
- or improving normalization upstream?