Lowercase Token Filter with preserve_original


#1

Is there any way to apply the lowercase token filter and preserve the original tokens too?

Our goal is to support case-insensitive search (Span Term queries on lowercased text) together with case-sensitive search (Span Term queries on the text in its original case, which is useful for abbreviations, company names, etc.) within the same query, for example a single Span Near Query.

Thanks,
Zdenek


(Adrien Grand) #2

The lowercase filter does not allow that. Besides, such an approach would raise issues with term statistics. There is currently no other way to do what you want, but it might become possible in the future: you would index two fields (using multi-fields) and then use Lucene's FieldMaskingSpanQuery to build a SpanNearQuery across the two fields. For that to work, we would first need to expose FieldMaskingSpanQuery in Elasticsearch.
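For illustration only, here is a rough sketch of what such a cross-field span query could look like as a request body. It assumes an Elasticsearch version that exposes the field masking query under the name `field_masking_span` (check the docs for your version, since the name is passed in as a parameter here), and the field names `body` and `body.cased` are hypothetical. The Python code only builds the JSON; it does not talk to a cluster.

```python
import json


def span_term(field, value):
    """Build a span_term clause for a single field."""
    return {"span_term": {field: value}}


def masked(field_as, query, query_name="field_masking_span"):
    """Wrap a span query so it reports `field_as` as its field.

    `query_name` is an assumption: verify it against your Elasticsearch version.
    """
    return {query_name: {"query": query, "field": field_as}}


def mixed_case_span_near(lower_field, cased_field, lower_term, cased_term, slop=3):
    """Combine a case-insensitive term and a case-sensitive term in one span_near."""
    return {
        "span_near": {
            "clauses": [
                # Case-insensitive clause: lowercase the term to match the lowercased field.
                span_term(lower_field, lower_term.lower()),
                # Case-sensitive clause: searched in the cased field, masked as the lowercased one.
                masked(lower_field, span_term(cased_field, cased_term)),
            ],
            "slop": slop,
            "in_order": False,
        }
    }


# Hypothetical field names: "body" is lowercased, "body.cased" keeps the original case.
query = mixed_case_span_near("body", "body.cased", "Acquired", "IBM")
print(json.dumps({"query": query}, indent=2))
```

Note that this only makes sense if both fields produce tokens at the same positions, so they should share the same tokenizer and differ only in the lowercase filter.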


(Mark Walkom) #3

You could leverage multifields though?
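A minimal sketch of what a multi-field setup might look like, assuming a recent Elasticsearch version with the `text` field type. The names `body`, `cased` and `case_sensitive` are made up for the example, and the Python code only builds the index-creation body as a dict. The main field uses the `standard` analyzer (which lowercases), while the sub-field uses a custom analyzer with the same tokenizer but no lowercase filter.

```python
import json


def build_index_body(field="body", subfield="cased"):
    """Build settings and mappings for a lowercased field with a case-preserving sub-field."""
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    # Hypothetical analyzer name: standard tokenizer, no lowercase filter.
                    "case_sensitive": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": [],
                    }
                }
            }
        },
        "mappings": {
            "properties": {
                field: {
                    "type": "text",
                    "analyzer": "standard",  # lowercases tokens
                    "fields": {
                        # Queried as "<field>.<subfield>", e.g. "body.cased".
                        subfield: {"type": "text", "analyzer": "case_sensitive"}
                    },
                }
            }
        },
    }


index_body = build_index_body()
print(json.dumps(index_body, indent=2))
```

With this mapping, a query on `body` is case-insensitive and a query on `body.cased` is case-sensitive, but each query still targets a single field.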


#4

Multifields cannot be used in our use case because it is not possible to combine different multifields in one Span Query (that is what FieldMaskingSpanQuery would solve, as Adrien wrote above).


#5

Thanks for the answer.

