Lowercase Token Filter with preserve_original

zdeseb · July 21, 2015, 3:17pm

Is there a way to apply the lowercase token filter and still preserve the original tokens?

Our goal is to combine case-insensitive search (a Span Term on lowercased text) with case-sensitive search (a Span Term on the text with its original capitalization, which is useful for abbreviations, company names, etc.) within a single query, for example one Span Near Query.

Thanks,
Zdenek
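
To make the request concrete, here is a toy Python model of the behaviour being asked for. It is not Elasticsearch or Lucene code: `analyze_preserving_case`, `build_postings` and `span_near` are hypothetical helpers, the tokenizer is a plain whitespace split, and the slop rule is a simplified stand-in for the real one. It only shows that if the original token and its lowercased form share a position, one ordered near-query can mix case-sensitive and case-insensitive terms.

```python
from typing import Dict, List, Tuple


def analyze_preserving_case(text: str) -> List[Tuple[int, str]]:
    """Whitespace-tokenize and emit (position, token) pairs.

    Every token is emitted unchanged; if lowercasing changes it, the
    lowercased form is emitted as well, at the same position.
    """
    out: List[Tuple[int, str]] = []
    for pos, tok in enumerate(text.split()):
        out.append((pos, tok))
        low = tok.lower()
        if low != tok:
            out.append((pos, low))
    return out


def build_postings(tokens: List[Tuple[int, str]]) -> Dict[str, List[int]]:
    """Map each term to the sorted list of positions where it occurs."""
    postings: Dict[str, List[int]] = {}
    for pos, tok in tokens:
        postings.setdefault(tok, []).append(pos)
    return postings


def span_near(postings: Dict[str, List[int]], terms: List[str], slop: int) -> bool:
    """Ordered near-match with a simplified notion of slop.

    The terms must occur in increasing position order, and the number of
    positions skipped between the first and last term must not exceed slop.
    """
    def extend(idx: int, start: int, prev: int) -> bool:
        if idx == len(terms):
            return (prev - start) - (len(terms) - 1) <= slop
        return any(
            extend(idx + 1, start, p)
            for p in postings.get(terms[idx], [])
            if p > prev
        )

    return any(extend(1, p, p) for p in postings.get(terms[0], []))


postings = build_postings(
    analyze_preserving_case("Shares of IBM rose after Apple reported earnings")
)
# Case-sensitive "IBM" next to case-insensitive "rose" in one near-query.
mixed_case_hit = span_near(postings, ["IBM", "rose"], slop=0)
```

In this model, searching for `"Ibm"` finds nothing, because only the original form and the fully lowercased form are indexed.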

jpountz · July 21, 2015, 5:33pm

The lowercase filter does not allow that. Besides, such an approach would raise issues with term statistics. There is no other way to do what you want right now, but it might become possible in the future: you would index two fields (using multi-fields) and then use Lucene's FieldMaskingSpanQuery to build a SpanNearQuery across the two fields. For that to work, we would first need to expose FieldMaskingSpanQuery in Elasticsearch.
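
The two-field idea can be sketched with a toy Python model. This is an illustration only, not the Lucene API: `index_document` and `masked_span_near` are hypothetical helpers, and the field names `body` (lowercased) and `body.exact` (original case) are assumptions. It also assumes both fields are tokenized identically, so that position N refers to the same word in each field; that alignment is what makes comparing positions across fields meaningful.

```python
from typing import Dict, List, Tuple

# field name -> term -> positions
Index = Dict[str, Dict[str, List[int]]]


def index_document(text: str) -> Index:
    """Index one text into two fields that share token positions.

    "body" holds lowercased terms, "body.exact" holds the original terms.
    Both use the same whitespace split, so positions line up.
    """
    index: Index = {"body": {}, "body.exact": {}}
    for pos, tok in enumerate(text.split()):
        index["body"].setdefault(tok.lower(), []).append(pos)
        index["body.exact"].setdefault(tok, []).append(pos)
    return index


def masked_span_near(index: Index, clauses: List[Tuple[str, str]], slop: int) -> bool:
    """Ordered near-match over (field, term) clauses from different fields.

    Positions from every field are compared as if they belonged to a single
    field, which is the effect field masking is meant to provide. Slop is
    simplified: positions skipped between the first and last clause.
    """
    def positions(clause: Tuple[str, str]) -> List[int]:
        field, term = clause
        return index.get(field, {}).get(term, [])

    def extend(idx: int, start: int, prev: int) -> bool:
        if idx == len(clauses):
            return (prev - start) - (len(clauses) - 1) <= slop
        return any(
            extend(idx + 1, start, p)
            for p in positions(clauses[idx])
            if p > prev
        )

    return any(extend(1, p, p) for p in positions(clauses[0]))


index = index_document("The US economy helped us recover")
# Exact-case "US" followed by case-insensitive "economy".
exact_then_loose = masked_span_near(
    index, [("body.exact", "US"), ("body", "economy")], slop=0
)
```

Here the exact-case clause `("body.exact", "us")` only matches the pronoun at position 4, so it does not sit next to "economy", while the lowercased clause `("body", "us")` matches both occurrences.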

warkolm · July 23, 2015, 6:59am

You could leverage multifields though?

zdeseb · July 23, 2015, 10:44am

Multi-fields cannot be used in our use case because it is not possible to combine different multi-fields in one Span Query (that is what FieldMaskingSpanQuery would solve, as Adrien wrote above).

zdeseb · July 23, 2015, 10:45am

Thanks for the answer.