Is a value stored twice when using a tolowercase normalizer on keywords?

DanglingNode · October 13, 2020, 10:01am

Hi all,
I'm using a lowercase normalizer at index (mapping) time so that documents with the value 'John Doe' are found when searching for 'john doe'.

When creating the index:

    client.Indices.Create(
        "some_index",
        c => c
            .Settings(s => s
                .Analysis(a => a
                    .Normalizers(n => n
                        .Custom("lowercase", cn => cn.Filters("lowercase")))))
            .Map<Person>(m => m.AutoMap())
    );

The Person class:

    internal class Person
    {
        [Keyword(Normalizer = "lowercase")]
        public string Name { get; set; }
    }
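As a rough mental model, here is a toy sketch in plain C# of what a keyword field with this normalizer does. It is not NEST or Elasticsearch code. The names source, terms, Normalize, IndexDoc and TermQuery are hypothetical, and ToLowerInvariant is assumed as a stand-in for the "lowercase" token filter. It reflects the behavior described in the Elasticsearch normalizer documentation: the normalizer runs before the keyword is indexed, and it is also applied to the query value of term-level and match queries on that field. The original JSON stays unchanged in _source (assuming the default, _source enabled).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical toy model of a keyword field with a lowercase normalizer.
// Not NEST or Elasticsearch code; ToLowerInvariant stands in for the "lowercase" filter.
string Normalize(string value) => value.ToLowerInvariant();

var source = new Dictionary<int, string>();       // stand-in for _source: docId -> value as sent
var terms = new Dictionary<string, List<int>>();  // stand-in for the keyword terms: term -> docIds

void IndexDoc(int docId, string name)
{
    source[docId] = name;              // the original value is kept as-is
    string term = Normalize(name);     // the whole value becomes ONE lowercased term
    if (!terms.TryGetValue(term, out var postings))
    {
        postings = new List<int>();
        terms[term] = postings;
    }
    postings.Add(docId);
}

// Search side: the query value goes through the same normalizer before lookup.
List<int> TermQuery(string value) =>
    terms.TryGetValue(Normalize(value), out var hits) ? hits : new List<int>();

IndexDoc(1, "John Doe");
IndexDoc(2, "JOHN DOE");

Console.WriteLine(string.Join(",", TermQuery("john doe")));  // 1,2
Console.WriteLine(source[1]);                                // John Doe
```

In this model 'John Doe' is kept once as sent (the _source stand-in), while the keyword index holds only the single term 'john doe'. The two copies serve different purposes, so the index does not hold both casings as separate terms.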

Does this mean that fields with this normalizer use more storage? For a document whose name is 'John Doe', will the value be stored both as 'John Doe' and as 'john doe'? Or does the normalization happen at search time?

If the value is stored twice, are there other (better) ways to avoid storing the extra tokens?

Mark_Harwood · October 13, 2020, 10:43am

No, normalising typically reduces disk space, because the search index doesn't contain all the case variations that might exist across different documents. Think of it like the word index at the back of a book: it would be bigger if it listed page 10's "Aardvark" and page 33's "aardvark" as separate entries.
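The book-index analogy can be checked with a few lines of plain C#. This toy sketch uses hypothetical names (occurrences, BuildIndex) and assumes ToLowerInvariant as a stand-in for the lowercase filter. It builds an entry-to-pages index from the two occurrences in the analogy, once without and once with lowercasing, and counts the entries.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical toy data: the two occurrences from the analogy, (word as written, page).
var occurrences = new List<(string Word, int Page)>
{
    ("Aardvark", 10),
    ("aardvark", 33),
};

// Build a book-style index: entry -> sorted list of pages where it appears.
// The normalize function decides which spellings collapse into one entry.
Dictionary<string, List<int>> BuildIndex(Func<string, string> normalize) =>
    occurrences
        .GroupBy(o => normalize(o.Word))
        .ToDictionary(g => g.Key, g => g.Select(o => o.Page).OrderBy(p => p).ToList());

var raw = BuildIndex(w => w);                         // no normalizer
var lowered = BuildIndex(w => w.ToLowerInvariant());  // lowercase stand-in

Console.WriteLine(raw.Count);                              // 2
Console.WriteLine(lowered.Count);                          // 1
Console.WriteLine(string.Join(",", lowered["aardvark"]));  // 10,33
```

Without normalization each casing gets its own entry. With it, both page numbers hang off a single "aardvark" entry, which is why the term dictionary shrinks rather than grows.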

system · November 10, 2020, 10:43am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Related topics

Case Insensitive search using match query for keyword (Elasticsearch) · 6 replies · 10664 views · June 8, 2017
One word fields (keyword vs text types) (Elasticsearch) · 10 replies · 3914 views · December 6, 2020
Term request with normalized keyword behave strangely (Elasticsearch) · 1 reply · 655 views · July 26, 2017
Lowercase normalizer not working (Elasticsearch) · 6 replies · 1320 views · September 15, 2020
Keyword type to lowercase (Elasticsearch) · 2 replies · 990 views · April 20, 2017