Sorting with case insensitive

nataly · July 8, 2025, 10:02am

hello,

Currently I have an index, let's call it index1, and I'm sorting on a specific property there, let's call it prop1, which is a keyword. For example, if the values are A, B, C, a, b, c, it sorts them as ABCabc, but I want AaBbCc.
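To make the two orderings concrete, here is a small plain-Python sketch (not Elasticsearch itself). Sorting raw strings compares code points, so all uppercase letters come first. Sorting on a lowercased key gives the interleaved order. The secondary tiebreak on the original string is an assumption added here only to make the output deterministic.

```python
values = ["b", "C", "A", "c", "a", "B"]

# Default string sort compares code points: every uppercase letter sorts before any lowercase one.
codepoint_order = sorted(values)

# Case-insensitive sort: compare on the lowercased value first,
# then fall back to the original string so ties ("A" vs "a") are stable.
case_insensitive_order = sorted(values, key=lambda s: (s.lower(), s))

print("".join(codepoint_order))         # ABCabc
print("".join(case_insensitive_order))  # AaBbCc
```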

I read that one option is to add a normalizer, but that requires reindexing. The problem with that solution is that I don't have enough disk space to create a new index and copy the data from the old one into it.
Another option I saw is update by query, but I have a very large number of documents (more than 100M), so it would take a long time.
Adding a script that performs another sort on the ASCII-sorted data is also not an option, because the extra time would hurt performance.
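For reference, the normalizer route might look roughly like the sketch below. It only builds the create-index request for a new index whose prop1 keyword uses a lowercase normalizer. The URL, the index name `index1_v2` and the normalizer name `lowercase_normalizer` are placeholders chosen for illustration, not names from this thread.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local cluster without authentication

# Hypothetical names: "index1_v2" and "lowercase_normalizer" are placeholders.
create_index_body = {
    "settings": {
        "analysis": {
            "normalizer": {
                # Custom normalizer that lowercases the keyword value at index time.
                "lowercase_normalizer": {"type": "custom", "filter": ["lowercase"]}
            }
        }
    },
    "mappings": {
        "properties": {
            "prop1": {"type": "keyword", "normalizer": "lowercase_normalizer"}
        }
    },
}

create_index_request = urllib.request.Request(
    f"{ES_URL}/index1_v2",
    data=json.dumps(create_index_body).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="PUT",
)
# urllib.request.urlopen(create_index_request) would send it; it is left out on purpose.
```

With a mapping like this, the lowercased form is what gets indexed, so sorting on prop1 no longer separates uppercase from lowercase values.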

is there any other option that can help me to fix this issue?

thanks

dadoonet · July 8, 2025, 11:08am

Welcome!

That's correct.

Update by query will also consume a LOT of disk space, so you might end up in a similar position as with your initial proposal.

Correct again.

I don't think there's a fast solution where you don't reindex your data.

nataly · July 8, 2025, 11:30am

Thank you for your answer!
Why will it take the same amount of space?

When I reindex, I create a new index and copy the data into it with the new settings and mapping, so at some point I need twice the space of that index.

With update by query, I go over each document and reindex it in place, so why would that also take a lot of space?

thank you!

dadoonet · July 8, 2025, 12:23pm

Why will it take the same amount of space?

That's how Lucene works behind the scenes. New segments are created whenever you update or delete a document. The new segments contain the new values, but the old segments stay on disk until the segments are merged, which eventually happens at some point.
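The effect on disk usage can be sketched with a toy model in Python. This is a deliberately simplified illustration of append-only segments, not how Lucene is actually implemented, and in a real cluster background merges usually reclaim some space while the update is still running.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    docs: dict                                 # doc_id -> value, never rewritten in place
    deleted: set = field(default_factory=set)  # ids marked deleted but still stored

    def size_on_disk(self):
        return len(self.docs)                  # deleted docs still take up space

    def live_docs(self):
        return {k: v for k, v in self.docs.items() if k not in self.deleted}


def update_all(segments, transform):
    """Write updated copies into a new segment; old copies are only marked deleted."""
    new_docs = {}
    for seg in segments:
        for doc_id, value in seg.live_docs().items():
            new_docs[doc_id] = transform(value)
            seg.deleted.add(doc_id)
    segments.append(Segment(new_docs))


def merge(segments):
    """Copy live docs into one segment and drop the old segments."""
    merged = {}
    for seg in segments:
        merged.update(seg.live_docs())
    return [Segment(merged)]


def disk_usage(segments):
    return sum(seg.size_on_disk() for seg in segments)


segments = [Segment({1: "Abc", 2: "abC"})]
before = disk_usage(segments)
update_all(segments, str.lower)
during = disk_usage(segments)   # old and new copies coexist
segments = merge(segments)
after = disk_usage(segments)    # space is reclaimed only after the merge
print(before, during, after)    # 2 4 2
```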

nataly · July 8, 2025, 1:00pm

I see, thank you.
So is there a way to delete the old segments after each document, so that the operation uses as little extra disk space as possible?

dadoonet · July 8, 2025, 1:54pm

You can call the _forcemerge API, but don't do that for every document.

What is the current size of the index? In number of documents and size on disk?
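As a rough sketch of both points above, the requests below check the document counts and on-disk size of the index, and ask for a force merge that only expunges deleted documents. The cluster URL is an assumption, and `index1` is the placeholder name used earlier in the thread. Note that a merge writes its new segment before the old ones are removed, so it needs some free space too.

```python
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local cluster without authentication
INDEX = "index1"                  # placeholder index name from the thread

# Live docs, deleted docs and size on disk for the index.
stats_request = urllib.request.Request(
    f"{ES_URL}/_cat/indices/{INDEX}?v&h=index,docs.count,docs.deleted,store.size",
    method="GET",
)

# Force merge limited to segments that contain deleted documents.
# Meant to be run occasionally (for example between batches), not per document.
merge_request = urllib.request.Request(
    f"{ES_URL}/{INDEX}/_forcemerge?only_expunge_deletes=true",
    method="POST",
)

# urllib.request.urlopen(stats_request) / urlopen(merge_request) would send them.
```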

nataly · July 13, 2025, 12:37pm

Thank you,
I have a follow-up question before getting to the force merge.
Is it possible to change the mapping of an existing property without adding a new subfield?
For example, I added a normalizer to the settings.
Then in the mapping, instead of:
"mappings": {
  "properties": {
    "name1": {
      "type": "text",
      "fields": {
        "subfield1": {
          "type": "keyword"
        }
      }
    }
  }
}

I want to change the mapping of subfield1 like this and then run update_by_query:
"mappings": {
  "properties": {
    "name1": {
      "type": "text",
      "fields": {
        "subfield1": {
          "type": "keyword",
          "normalizer": "normalizer1"
        }
      }
    }
  }
}

I saw that I need to add a subfield, but is there an option to just change subfield1 like this, without adding a subfield2 for example? Adding a subfield to every document takes more space, and I'm trying to use as little space as possible for this change.
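For comparison, the sub-field route mentioned above could be sketched as follows: a put-mapping request that adds a second keyword sub-field with the normalizer, followed by an update by query so existing documents pick it up. This assumes the normalizer is already defined in the index settings. The URL, the index name and the normalizer name are placeholders.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local cluster without authentication
INDEX = "index1"                  # placeholder index name
NORMALIZER_NAME = "normalizer1"   # placeholder: whatever name the index settings define

# Add a new sub-field next to the existing one; subfield1 is left unchanged.
add_subfield_body = {
    "properties": {
        "name1": {
            "type": "text",
            "fields": {
                "subfield1": {"type": "keyword"},
                "subfield2": {"type": "keyword", "normalizer": NORMALIZER_NAME},
            },
        }
    }
}

mapping_request = urllib.request.Request(
    f"{ES_URL}/{INDEX}/_mapping",
    data=json.dumps(add_subfield_body).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="PUT",
)

# Re-index every document in place so the new sub-field gets populated.
update_request = urllib.request.Request(
    f"{ES_URL}/{INDEX}/_update_by_query?conflicts=proceed",
    method="POST",
)
```

Sorting would then target name1.subfield2 instead of name1.subfield1.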
