How to handle special characters in span_near query?

Hi Team,
I have a very large boolean query that contains NEAR operators and exceeds the 80k phrase limit, so I'm using span_near to handle NEAR in my boolean query. However, if I pass a keyword with special characters, Elasticsearch simply ignores it. Can anyone help me handle special characters in a span_near query?
Example:


{
  "span_near": {
    "clauses": [
      {
        "span_or": {
          "clauses": [
            {
              "span_term": {
                "body": "covid*"
              }
            }
          ]
        }
      },
      {
        "span_or": {
          "clauses": [
            {
              "span_term": {
                "body": "9-8-8"
              }
            }
          ]
        }
      }
    ],
    "slop": 10,
    "in_order": false
  }
}

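Hey there, the special characters are being stripped during tokenization: text fields use the standard analyzer by default, which splits 9-8-8 on the hyphens and drops the * from covid*. You can change this by choosing a tokenizer that keeps those characters, building a custom analyzer around it, and reindexing your data. Hope that helps!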
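To see this for yourself, you can run the two keywords from the question through the _analyze API, once with the standard analyzer and once with the whitespace analyzer. These requests don't touch any index, so they are safe to try as-is.

```
# standard: the default analyzer for text fields unless another is configured
POST _analyze
{
  "analyzer": "standard",
  "text": "covid* 9-8-8"
}

# whitespace: only splits on whitespace, keeps other characters intact
POST _analyze
{
  "analyzer": "whitespace",
  "text": "covid* 9-8-8"
}
```

The standard analyzer should return the tokens covid, 9, 8 and 8, none of which equals the literal span_term values covid* or 9-8-8. The whitespace analyzer should keep covid* and 9-8-8 as two whole tokens.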

Hey Kathleen,
Thanks for the quick reply, really appreciated!
I already have some custom analyzers applied to my data, and they work fine with a normal query: if I point default_field in a query_string query at the field that uses them, it returns matching results.
Could you share the syntax, or any other way, to use them in a span_near query?
Thanks in advance!

Sure, here's a quick demo script using the whitespace analyzer. It does nothing except split on whitespace, so covid* and 9-8-8 stay intact as tokens and the query you provided returns the matching document. You can experiment with this script using additional data and different analyzers to see what works for your data and use case.

PUT my-span-test
{
  "mappings": {
    "properties": {
      "body": {
        "type": "text",
        "analyzer": "whitespace"
      }
    }
  }
}

PUT my-span-test/_doc/1
{
  "body": "covid* 9-8-8"
}

POST my-span-test/_search
{
  "query": {
    "span_near": {
      "clauses": [
        {
          "span_or": {
            "clauses": [
              {
                "span_term": {
                  "body": "covid*"
                }
              }
            ]
          }
        },
        {
          "span_or": {
            "clauses": [
              {
                "span_term": {
                  "body": "9-8-8"
                }
              }
            ]
          }
        }
      ],
      "slop": 10,
      "in_order": false
    }
  }
}
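One thing to watch in this demo: span_term never expands wildcards, so covid* matches only because the whitespace analyzer indexed the literal token covid*. If the * in your NEAR expressions is meant as a prefix wildcard (any token starting with covid), a sketch that wraps a prefix query in span_multi might look like this, reusing the demo's my-span-test index:

```
# Sketch: treat covid* as a prefix wildcard rather than a literal token.
# Assumes the my-span-test index and body field from the demo above.
POST my-span-test/_search
{
  "query": {
    "span_near": {
      "clauses": [
        {
          "span_multi": {
            "match": {
              "prefix": { "body": { "value": "covid" } }
            }
          }
        },
        {
          "span_term": { "body": "9-8-8" }
        }
      ],
      "slop": 10,
      "in_order": false
    }
  }
}
```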

Thanks for the demo!

My custom analyzer only works when I specify it explicitly. For example, I've created a custom analyzer for special characters called cs_special_characters.

It only works if I specify it like this:

{
  "query": {
    "query_string": {
      "default_field": "body.cs_special_characters",
      "query": "(covid* AND 9-8-8)",
      "analyzer": "whitespace"
    }
  }
}

But span_near gives me nowhere to specify the analyzer, and if I use body.cs_special_characters instead of body inside span_term, it throws an error:

{
  "query": {
    "span_near": {
      "clauses": [
        {
          "span_or": {
            "clauses": [
              {
                "span_term": {
                  "body": "covid*"
                }
              }
            ]
          }
        },
        {
          "span_or": {
            "clauses": [
              {
                "span_term": {
                  "body": "9-8-8"
                }
              }
            ]
          }
        }
      ],
      "slop": 10,
      "in_order": false
    }
  }
}

Is there any syntax or other way to specify a particular analyzer inside a span_near query?
Or is there any alternative for handling NEAR other than span_near?
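On the error with body.cs_special_characters: Lucene requires every clause inside a span_near (and inside each nested span_or) to target the same field, so a likely cause is a query that switches only some span_term clauses to the subfield. A sketch that points every clause at the subfield follows. It assumes body.cs_special_characters exists as a multi-field of body, uses a hypothetical index name my-index, and assumes cs_special_characters keeps covid* and 9-8-8 as single tokens; each span_term value has to be written exactly as that analyzer emitted it at index time.

```
# Sketch: all span clauses target the same subfield.
# my-index is a hypothetical index name; body.cs_special_characters is assumed to exist.
POST my-index/_search
{
  "query": {
    "span_near": {
      "clauses": [
        {
          "span_or": {
            "clauses": [
              { "span_term": { "body.cs_special_characters": "covid*" } }
            ]
          }
        },
        {
          "span_or": {
            "clauses": [
              { "span_term": { "body.cs_special_characters": "9-8-8" } }
            ]
          }
        }
      ],
      "slop": 10,
      "in_order": false
    }
  }
}
```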

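span_term has no analyzer option: it isn't analyzed at search time and only matches the exact tokens produced when the data was indexed. You can try setting your custom analyzer as the index analyzer on the field and reindexing.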
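A sketch of the index-analyzer-and-reindex approach, with hypothetical names: my-index is the current index, my-index-v2 the new one, and special_chars_analyzer stands in for your cs_special_characters analyzer. The idea is to make the custom analyzer the analyzer of body itself, so the plain body field used in span_term already holds the special-character tokens. The whitespace tokenizer plus lowercase filter is only an example definition; substitute whatever cs_special_characters actually does.

```
# Hypothetical names: my-index-v2 (new index), special_chars_analyzer (example analyzer).
PUT my-index-v2
{
  "settings": {
    "analysis": {
      "analyzer": {
        "special_chars_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "body": {
        "type": "text",
        "analyzer": "special_chars_analyzer"
      }
    }
  }
}

# Copy the existing documents (hypothetical source index my-index) so body is re-tokenized
POST _reindex
{
  "source": { "index": "my-index" },
  "dest": { "index": "my-index-v2" }
}
```

Because this example analyzer lowercases at index time, span_term values sent to my-index-v2 would also need to be lowercase.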

Alternatively, you could see whether a match_phrase query with a high slop works for your needs.
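A minimal sketch of the match_phrase alternative against the demo index my-span-test. Unlike span_term, match_phrase analyzes its query text, using the field's search analyzer or the one named in the optional analyzer parameter, so the keywords go through the same tokenization as the indexed data. The slop of 10 is simply carried over from the original query.

```
# Sketch: match_phrase analyzes the query text, unlike span_term.
# "analyzer" is optional; without it the body field's search analyzer is used.
POST my-span-test/_search
{
  "query": {
    "match_phrase": {
      "body": {
        "query": "covid* 9-8-8",
        "slop": 10,
        "analyzer": "whitespace"
      }
    }
  }
}
```

match_phrase does not expand wildcards, so covid* is still a literal token here. With slop, the terms may also match in reverse order, but moving them costs part of the slop, so results may differ slightly from span_near with in_order set to false.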
