I have an analyzed field containing various phrases. I'm running query_string queries against that field and would like to find records where the string begins with a certain pattern. I was attempting to do this with a regular expression,
/^BEGINNING.*/
but the Lucene engine does not support anchors, so this does not work. The Elasticsearch documentation on regular expressions indicates that patterns are already anchored, but when I try
/BEGINNING.*/
I get results where BEGINNING is anywhere in the phrase.
In the mapping definition for the field you are searching against, you would either want to set it to not_analyzed or, if you need some analysis such as lowercasing, you could add something like this to your index settings:
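The settings snippet itself did not survive in this thread, so what follows is only a guess at the kind of thing meant: a custom analyzer built from the keyword tokenizer and the lowercase token filter, written here as a Python dict. The analyzer name "lowercase_keyword" is made up for illustration. The reasoning behind it is that a regexp is matched against each indexed term as a whole, so on an analyzed field /BEGINNING.*/ can hit a single word anywhere in the phrase, whereas with the keyword tokenizer the entire value is one term. The two helper functions only simulate that behaviour with Python's re module, whose syntax is not identical to Lucene's.

```python
import re

# Hypothetical analyzer name; "keyword" and "lowercase" are built-in
# Elasticsearch tokenizer / token filter names.
INDEX_SETTINGS = {
    "settings": {
        "analysis": {
            "analyzer": {
                "lowercase_keyword": {
                    "type": "custom",
                    "tokenizer": "keyword",
                    "filter": ["lowercase"],
                }
            }
        }
    }
}


def keyword_lowercase(text):
    """Simulate the analyzer above: the whole value becomes one lowercased term."""
    return [text.lower()]


def regexp_hits(pattern, terms):
    """A regexp has to match an entire term (Python re stands in for Lucene here)."""
    return any(re.fullmatch(pattern, term) for term in terms)
```

With the whole value indexed as one term, the pattern behaves as if it were anchored to the start of the phrase. Because of the lowercase filter, the pattern has to be written in lowercase as well.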
Hmm... I'm currently using the snowball analyzer. If I want to keep that but also be able to use anchored regexps, would I need to index the field a second time as not_analyzed and, when someone enters a regular expression wrapped in "/ /", run the query against the not_analyzed version?
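To make the idea concrete, here is a rough sketch of how that could look, assuming the older string / not_analyzed mapping syntax used in this thread (newer Elasticsearch versions use a keyword type instead). The field name "phrase", the sub-field name "raw" and the build_query helper are all hypothetical.

```python
# Hypothetical names: field "phrase" with a not_analyzed sub-field "raw".
MAPPING = {
    "properties": {
        "phrase": {
            "type": "string",
            "analyzer": "snowball",
            "fields": {
                "raw": {"type": "string", "index": "not_analyzed"},
            },
        }
    }
}


def build_query(user_input, analyzed_field="phrase", raw_field="phrase.raw"):
    """Send /regexp/ input to the raw sub-field, everything else to the analyzed field."""
    text = user_input.strip()
    is_regexp = len(text) >= 2 and text.startswith("/") and text.endswith("/")
    return {
        "query_string": {
            "default_field": raw_field if is_regexp else analyzed_field,
            "query": text,
        }
    }
```

For example, build_query("/BEGINNING.*/") would target phrase.raw, while a plain search such as build_query("beginning of") would still go through the snowball-analyzed field.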
Another trick that might work is to insert a marker string at the beginning of each phrase at index time and include that marker at the beginning of your phrase queries. Given that character filters act before the text is tokenised, you might be able to use a pattern replace character filter to insert the marker string for you, i.e., replace /^(.*)/ with "aaa $1". https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-pattern-replace-charfilter.html
I haven't tried it, so I can't give any guarantees.
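As an untested sketch of the marker idea: the dict below is what the pattern_replace character filter definition might look like, and add_marker only imitates the same replacement locally with Python's re module so the effect can be seen. The marker "aaa" comes from the suggestion above; the helper names are made up, and whether the marker survives the rest of the analysis chain unchanged is an assumption that would need checking.

```python
import re

MARKER = "aaa"

# Possible char_filter definition; "pattern_replace", "pattern" and
# "replacement" are the documented keys, the values follow the suggestion above.
CHAR_FILTER = {
    "type": "pattern_replace",
    "pattern": "^(.*)",
    "replacement": MARKER + " $1",
}


def add_marker(text):
    """Local imitation of the char filter: prefix the text with the marker."""
    return re.sub(r"^(.*)", MARKER + r" \1", text, count=1)


def anchored_phrase(phrase):
    """Build a quoted phrase query that starts with the marker (hypothetical helper)."""
    return '"{} {}"'.format(MARKER, phrase)
```

A phrase query built with anchored_phrase("beginning") would then only be expected to match documents where "beginning" directly follows the marker, that is, at the start of the original text.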