Regexp and case insensitivity


(Rotem Hermon) #1

As far as I know, there is no way to make a regexp query or filter case-insensitive.

Is there a good reason for that?

This means that to support case-insensitive regex searches, you need a multi-field that indexes each value twice: once with the original terms and once lowercased.
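Here is a minimal sketch of that multi-field setup, written as Python dicts in the shape of Elasticsearch request bodies. It assumes a recent Elasticsearch version with keyword normalizers. The names `lowercase_normalizer`, `name`, `lower`, and `case_insensitive_regexp` are hypothetical. Older versions did the same thing with a `string` sub-field analyzed by a keyword tokenizer plus a lowercase filter.

```python
import json

# Hypothetical index body illustrating the multi-field approach.
# Assumes a recent Elasticsearch with keyword normalizers; all names are
# made up for illustration.
INDEX_BODY = {
    "settings": {
        "analysis": {
            "normalizer": {
                "lowercase_normalizer": {
                    "type": "custom",
                    "filter": ["lowercase"],
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "name": {
                "type": "keyword",  # first copy: original terms, case preserved
                "fields": {
                    "lower": {  # second copy: lowercased at index time
                        "type": "keyword",
                        "normalizer": "lowercase_normalizer",
                    }
                },
            }
        }
    },
}


def case_insensitive_regexp(field: str, pattern: str) -> dict:
    """Build a regexp query body against the lowercased sub-field.

    Lowercasing the pattern is a simplification that only holds for patterns
    without case-sensitive escapes or character classes.
    """
    return {"query": {"regexp": {f"{field}.lower": pattern.lower()}}}


print(json.dumps(case_insensitive_regexp("name", "Bla.*"), indent=2))
```

Every value is stored twice, which is the indexing overhead described below. Lowercasing the whole pattern is a shortcut; in versions that support case-sensitive escapes such as `\D`, it would change their meaning.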

That is significant overhead when you have a lot of documents or fields. If case-insensitive regex searches are rare, it may be better to pay the CPU cost at query time than to always pay the indexing cost of keeping a second indexed copy of the field.
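To illustrate the query-time side of that trade-off, here is a minimal sketch in plain Python. It is not an Elasticsearch feature. Nothing extra is stored, and every candidate value is tested against a case-insensitive pattern when the search runs. Python's `re` dialect differs from Lucene's, so this only shows the cost model. The function name `regex_scan_ignorecase` is hypothetical.

```python
import re
from typing import Iterable, List


def regex_scan_ignorecase(values: Iterable[str], pattern: str) -> List[str]:
    """Return the values that fully match `pattern`, ignoring case.

    No second copy is indexed; instead each value is checked at query time,
    trading CPU per search for storage. Python's regex dialect is used here,
    not Lucene's.
    """
    compiled = re.compile(pattern, re.IGNORECASE)
    # Lucene regexps are anchored to the whole term, so mimic that with fullmatch.
    return [v for v in values if compiled.fullmatch(v)]


docs = ["Blabla", "BLABLA", "blab", "other"]
print(regex_scan_ignorecase(docs, "bla.*"))  # ['Blabla', 'BLABLA', 'blab']
```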


(Adi Gabaie) #2

Hi Rotem,

Did you find any solution other than a multi-field or a pattern like [Bb][Ll][Aa][Bb][Ll][Aa]?
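The `[Bb][Ll][Aa]...` workaround can be generated mechanically for literal text. Here is a minimal sketch that expands each letter into a two-case character class and backslash-escapes characters that are reserved in Lucene regexp syntax (including the optional operators). The helper name `case_insensitive_literal` is hypothetical. It only handles literal input. Expanding an arbitrary existing regex correctly is much harder, because letters inside character classes and ranges need different treatment.

```python
# Characters with special meaning in Lucene regexp syntax, including the
# optional operators. Lucene treats a backslash as escaping the next character.
LUCENE_RESERVED = set('.?+*|{}[]()"\\#@&<>~')


def case_insensitive_literal(text: str) -> str:
    """Turn literal text into a Lucene-style regexp that matches it in any case."""
    parts = []
    for ch in text:
        upper, lower = ch.upper(), ch.lower()
        if upper != lower and len(upper) == 1 and len(lower) == 1:
            # A letter with distinct one-character cases: accept either.
            parts.append(f"[{upper}{lower}]")
        elif ch in LUCENE_RESERVED:
            parts.append("\\" + ch)
        else:
            parts.append(ch)
    return "".join(parts)


print(case_insensitive_literal("blabla"))  # [Bb][Ll][Aa][Bb][Ll][Aa]
```

The result can be combined with ordinary regexp operators, for example `case_insensitive_literal("bla") + ".*"` for a case-insensitive prefix match. Each letter becomes a two-way branch in the compiled automaton, which adds some query-time cost.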


(Nik Everett) #3

The trouble is that Lucene regexes have no option for case-insensitive matching. I cobbled together something that mostly works in wikimedia-extra's source_regex filter. It's by no means perfect or efficient, and it isn't even correct in some cases. It also doesn't work like the regular regex search, so it's not a stand-in for what you are doing. So I can't really suggest that you use it; it's more of a case study in why this is hard.

