Hello everybody.
I'm trying to create a search query that matches only documents containing strings like these:
"some kind of text 1000 other text 200 new text 3"
"some kind of text 2000 other text 320 new text 30"
My documents are just plain text.
I've tried phrase matching like this: "query": { "query_string": { "default_field": "textdata", "query": "\"some kind of text 1000 other text 200 new text 3\"" } }
This works perfectly, but obviously it matches only the first string exactly.
If I try: "query": { "query_string": { "default_field": "textdata", "query": "some kind of text <1000-3000> other text <200-300> new text <0-100>" } }
it also seems to work, but in this case all information about the phrase and token proximity is lost.
If I try: "query": { "query_string": { "default_field": "textdata", "query": "\"some kind of text <1000-3000> other text <200-300> new text <0-100>\"" } }
it doesn't find anything.
Is there a way to do this? Basically, I want to use a regex for some tokens and enforce proximity rules on all tokens in the query.
Hi Adrien
Thank you. I'm not too worried about performance at this point. My index is relatively small and I can afford longer search times (obviously there is a limit, but that's something I'll worry about later).
I'll read up on span queries, but if by any chance you have an example that would fit my needs, I'd really appreciate it.
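For reference, a span-based query for the strings above might be assembled along these lines. This is only a sketch under several assumptions: the field is `textdata` as in the earlier queries, the field is analyzed so that every word and number becomes its own lowercase token, and the helper names (`term`, `number_between`, `build_query`) are made up for illustration. Python is used here just to build the JSON body. `span_near` with `slop` 0 and `in_order` true keeps the tokens adjacent and in order, and each numeric slot is a `regexp` interval wrapped in `span_multi`.

```python
import json

FIELD = "textdata"  # field name taken from the queries above


def term(word):
    # Exact match on a single analyzed token.
    return {"span_term": {FIELD: word}}


def number_between(low, high):
    # Regexp interval syntax such as <1000-3000>, wrapped in span_multi
    # so it can be used as a clause inside span_near.
    pattern = "<{}-{}>".format(low, high)
    return {"span_multi": {"match": {"regexp": {FIELD: pattern}}}}


def build_query():
    # One clause per token of "some kind of text N other text N new text N".
    clauses = [term(w) for w in ["some", "kind", "of", "text"]]
    clauses.append(number_between(1000, 3000))
    clauses += [term(w) for w in ["other", "text"]]
    clauses.append(number_between(200, 300))
    clauses += [term(w) for w in ["new", "text"]]
    clauses.append(number_between(0, 100))
    # slop 0 + in_order true: tokens must be adjacent and in this order.
    return {"query": {"span_near": {"clauses": clauses, "slop": 0, "in_order": True}}}


if __name__ == "__main__":
    print(json.dumps(build_query(), indent=2))
```

The printed body can be sent to the `_search` endpoint as-is. Raising `slop` would allow extra tokens between the clauses if strict adjacency turns out to be too tight.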
{"error":{"root_cause":[{"type":"parsing_exception","reason":"[span_multi] query does not support [regexp]","line":1,"col":385}],"type":"parsing_exception","reason":"[span_multi] query does not support [regexp]","line":1,"col":385},"status":400}
While the documentation says:
The span_multi query allows you to wrap a multi term query (one of wildcard, fuzzy, prefix, range or regexp query) as a span query, so it can be nested
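The query that produced this error is not shown, so the following is a guess rather than a diagnosis. As far as I know, `span_multi` expects the multi term query to sit under a `match` key, and placing `regexp` directly inside `span_multi` is one way to get a "does not support [regexp]" message. The sketch below contrasts the two shapes; `wrap_span_multi` is a hypothetical helper name and `textdata` is the field from the earlier queries.

```python
import json


def wrap_span_multi(multi_term_query):
    # The wrapped wildcard/fuzzy/prefix/range/regexp query goes under "match".
    return {"span_multi": {"match": multi_term_query}}


regexp_query = {"regexp": {"textdata": "<1000-3000>"}}

# Shape assumed to be rejected: "regexp" placed directly under "span_multi".
suspect = {"span_multi": regexp_query}

# Shape with the "match" wrapper.
expected = wrap_span_multi(regexp_query)

if __name__ == "__main__":
    print(json.dumps(suspect))
    print(json.dumps(expected))
```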