Support for Anchoring in Elasticsearch Regex


(vaidik) #1

Hi Folks,

I see that Elasticsearch supports regular expressions, but only through
Lucene's regex engine, which does not support anchoring operators: the
pattern is always anchored to the entire string. This works as long as you
have fixed regular expressions to run, but when the regex comes from the
user, it becomes very limiting.

Is there an alternative regex engine for Elasticsearch that at least
supports $ and ^ for anchoring? A quick Google and GitHub search did not
turn up anything. If not, is anybody doing something similar, or does
anyone have a workaround? One possible solution I can think of is
converting the user's regex to a Lucene-compatible regex, but that gets
really complex to do correctly with all the grouping and alternation in
regex.

I don't need full Perl-style regex support; just the anchoring bit is
important. Has anybody tried to solve this problem before?

Thanks,
Vaidik Kapoor
vaidikkapoor.info

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CACWtv5%3DvoJ1B7K9CN3M0O9hvLTyV0cJVq3qiy%2BJiy0crTfPRjg%40mail.gmail.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Lee Gee) #2

Lucene and Elasticsearch both anchor regular expressions by default.

"Lucene’s patterns are always anchored. The pattern provided must match the
entire string."


http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-regexp-query.html#regexp-syntax
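
To illustrate the quoted behaviour, here is a small Python sketch. `re.fullmatch` is used only as a rough analogy for "the pattern must match the entire string" (Python's regex syntax is not Lucene's), and the helper names and the `title` field are hypothetical. The query body follows the short form of the regexp query, built here as a plain dict.

```python
import re


def matches_whole_string(pattern: str, term: str) -> bool:
    """Analogy for always-anchored matching: the pattern must cover the whole term."""
    return re.fullmatch(pattern, term) is not None


def regexp_query(field: str, pattern: str) -> dict:
    """Build a regexp query body as a plain dict (field name supplied by the caller)."""
    return {"query": {"regexp": {field: pattern}}}


# "ab" alone does not match the term "xabc", because the whole term must match.
print(matches_whole_string("ab", "xabc"))
# Padding with ".*" on both sides gives "contains" semantics.
print(matches_whole_string(".*ab.*", "xabc"))
print(regexp_query("title", ".*ab.*"))
```

So an unanchored "contains" search has to be written explicitly as `.*ab.*` under always-anchored semantics.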

On Wednesday, December 18, 2013 7:19:48 AM UTC, Vaidik Kapoor wrote:


