Regex + phrase search

Hello everybody.
I'm trying to create a search query that finds only documents containing strings like these:
"some kind of text 1000 other text 200 new text 3"
"some kind of text 2000 other text 320 new text 30"

My documents are just plain text.

I've tried phrase matching like this:
"query": { "query_string": { "default_field": "textdata", "query": "\"some kind of text 1000 other text 200 new text 3\"" } }
This works perfectly, but obviously matches only the exact string #1.
If I try:
"query": { "query_string": { "default_field": "textdata", "query": "some kind of text <1000-3000> other text <200-300> new text <0-100>" } }
it also seems to work, but in this case all information about phrase and proximity is lost.

If I try:
"query": { "query_string": { "default_field": "textdata", "query": "\"some kind of text <1000-3000> other text <200-300> new text <0-100>\"" } }
it doesn't find anything.

Is there a way to do this? Basically, I want to use regex for some tokens and enforce proximity rules on all tokens in the query.

Thank you!

There is no way to run such a query. Some people might argue you could use span queries, but this would perform terribly.

Hi Adrien
Thank you. I'm not too worried about performance at this point. My index is relatively small and I can afford longer search times (obviously there is a limit, but that's something I'll worry about later).
I'll read up on span queries, but if by any chance you have an example that would fit my needs, I'd really appreciate it.

Thank you.

I tried span queries, and what seemed to be the obvious approach was a query like this:

```
"query": {
  "span_near": {
    "clauses": [
      {
        "span_multi": {
          "regexp": {
            "textdata": "[0-9]{6}"
          }
        }
      },
      { "span_term": { "textdata": "word1" } },
      { "span_term": { "textdata": "word2" } },
      { "span_term": { "textdata": "word3" } },
      { "span_term": { "textdata": "word4" } },
      { "span_term": { "textdata": "word5" } }
    ],
    "in_order": true,
    "slop": 1
  }
}
```

I get an error like this:

{"error":{"root_cause":[{"type":"parsing_exception","reason":"[span_multi] query does not support [regexp]","line":1,"col":385}],"type":"parsing_exception","reason":"[span_multi] query does not support [regexp]","line":1,"col":385},"status":400}

But the documentation says:
The span_multi query allows you to wrap a multi term query (one of wildcard, fuzzy, prefix, range or regexp query) as a span query, so it can be nested

https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-span-multi-term-query.html

OK, found it.
The regexp query inside span_multi has to be wrapped in a "match" key, so that clause needs to be replaced with:

```
{
  "span_multi": {
    "match": {
      "regexp": {
        "textdata": "[0-9]{6,}"
      }
    }
  }
},
```

Thank you!
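
For anyone landing here later, below is a minimal sketch of how the whole request body for the original question could be assembled once the "match" wrapper is in place. It is written in Python only to build and print the JSON. The helper names `span_clause` and `build_span_near` are made up for illustration and are not part of any Elasticsearch client. It also assumes that the `<1000-3000>` numeric interval syntax is enabled for the regexp query (it is one of the optional Lucene regexp operators), and that the literal words match the tokens exactly as they were indexed, since span_term does not analyze its input. Using `slop: 0` with `in_order: true` is meant to approximate phrase behaviour.

```python
import json


def span_clause(field, token, is_regex=False):
    """Build one span clause for a single token position.

    Literal tokens become span_term; patterns become a regexp query
    wrapped in span_multi via the required "match" key.
    """
    if is_regex:
        return {"span_multi": {"match": {"regexp": {field: token}}}}
    return {"span_term": {field: token}}


def build_span_near(field, tokens, slop=0, in_order=True):
    """Build a span_near request body from (token, is_regex) pairs."""
    clauses = [span_clause(field, tok, is_regex) for tok, is_regex in tokens]
    return {
        "query": {
            "span_near": {
                "clauses": clauses,
                "slop": slop,
                "in_order": in_order,
            }
        }
    }


# Tokens of the phrase from the first post; True marks a regexp token.
tokens = [
    ("some", False), ("kind", False), ("of", False), ("text", False),
    ("<1000-3000>", True),
    ("other", False), ("text", False),
    ("<200-300>", True),
    ("new", False), ("text", False),
    ("<0-100>", True),
]

body = build_span_near("textdata", tokens)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the search request body. As noted earlier in the thread, multi-term clauses inside span queries can be slow, so this is probably only reasonable on a small index.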
