The character " has special meaning in the Lucene regexp engine. Outside a literal expression it means roughly "treat everything up to the next " as literal characters, not as a pattern expression"; inside a literal expression it means "this is the end of the literal expression" (docs).
Let's start with the expression in your JSON query:
".*\"\"\".*"
After being parsed into the string it represents, we get:
.*""".*
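As a quick check of that decoding step, here is a minimal Python sketch. It only uses the standard json module and shows that the JSON layer strips one level of backslash escaping before the regexp engine sees anything.

```python
import json

# The regexp value exactly as it is written in the JSON query body.
json_text = r'".*\"\"\".*"'

# JSON decoding removes one layer of backslash escaping.
pattern = json.loads(json_text)

print(pattern)  # .*""".*
```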
This is given to the Lucene regexp engine, which parses it to mean:
.* any sequence, followed by
" a literal sequence (everything until the next ")
" literal sequence ends
" a literal sequence (everything until the next ")
. a literal dot
* a literal asterisk
UNEXPECTED END: no matching close ".
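The walk-through above can be reproduced with a toy scanner. This is not Lucene's parser, only a sketch of the quoting rule as described here: the function name scan and the token labels are made up for illustration, and it deliberately ignores backslash escapes and every other regexp operator.

```python
def scan(pattern):
    """Toy model of the quote rule: split a pattern into regex and literal runs."""
    tokens = []
    buf = []
    in_literal = False
    for ch in pattern:
        if ch == '"':
            # A double quote ends the current run and flips the mode.
            if buf or in_literal:
                kind = "literal" if in_literal else "regex"
                tokens.append((kind, "".join(buf)))
            buf = []
            in_literal = not in_literal
        else:
            buf.append(ch)
    if in_literal:
        # Input ended while still inside a quoted literal.
        raise ValueError("unexpected end: no closing quote after %r" % "".join(buf))
    if buf:
        tokens.append(("regex", "".join(buf)))
    return tokens


print(scan('.*"abc".*'))  # [('regex', '.*'), ('literal', 'abc'), ('regex', '.*')]

try:
    scan('.*""".*')
except ValueError as err:
    print(err)  # unexpected end: no closing quote after '.*'
```

With three consecutive quotes, the first two form an empty literal and the third opens a literal that is never closed, which is the same failure the engine reports.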
You may be able to escape the literal double quote inside the literal sequence by prefixing it with a backslash (e.g., .*"\"".*, which must itself be escaped again when written as JSON, giving ".*\"\\\"\".*"). However, the escaping of double quotes inside a double-quoted sequence isn't clearly documented, so that may or may not work.
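To avoid hand-counting backslashes, the JSON form of that candidate pattern can be generated rather than typed. A small sketch with Python's standard json module follows; it only demonstrates the JSON escaping layer and says nothing about whether Lucene accepts the pattern.

```python
import json

# Candidate regexp: the inner double quote is backslash-escaped inside the literal.
# Whether the engine accepts this is the open question discussed above.
pattern = r'.*"\"".*'

# json.dumps adds the second layer of escaping needed inside a JSON document.
encoded = json.dumps(pattern)

print(encoded)  # ".*\"\\\"\".*"
```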
My guess is that you're taking arbitrary input and simply concatenating it (.*" + input + ".*). You may be able to avoid the double quote entirely by concatenating (.* + quote(input) + .*) instead, where quote is some function that escapes every character with special meaning by prefixing it with a backslash (\).
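A minimal sketch of such a quote function in Python is shown below. The set of reserved characters is an assumption based on the characters the Elasticsearch regexp syntax documentation lists as reserved (including the optional operators), so check it against your version. The names quote, contains_pattern, and the field name my_field are hypothetical.

```python
import json

# Assumed reserved characters for the Lucene/Elasticsearch regexp syntax;
# verify this list against the documentation for the version in use.
RESERVED = set('.?+*|{}[]()"\\#@&<>~')


def quote(text):
    """Prefix every reserved character with a backslash."""
    return "".join("\\" + ch if ch in RESERVED else ch for ch in text)


def contains_pattern(user_input):
    """Build a 'contains' regexp without relying on double-quoted literals."""
    return ".*" + quote(user_input) + ".*"


# Hypothetical query body; json.dumps handles the JSON-level escaping.
query = {"regexp": {"my_field": {"value": contains_pattern('"""')}}}
print(json.dumps(query))
```

Building the query as a data structure and serializing it at the end keeps the two escaping layers (regexp and JSON) separate, so neither has to be done by hand.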