How to handle special characters in span_near query?

shubham_gupta4 · July 15, 2024, 6:21am

Hi Team,
I have a very large boolean query that uses the NEAR operator with more than 80k phrases, so I'm using span_near to handle NEAR. However, when I pass a keyword containing special characters, Elasticsearch simply ignores it. Can anyone help me handle special characters in a span_near query?
Example:


{
  "span_near": {
    "clauses": [
      {
        "span_or": {
          "clauses": [
            {
              "span_term": {
                "body": "covid*"
              }
            }
          ]
        }
      },
      {
        "span_or": {
          "clauses": [
            {
              "span_term": {
                "body": "9-8-8"
              }
            }
          ]
        }
      }
    ],
    "slop": "10",
    "in_order": false
  }
}

Kathleen_DeRusso · July 15, 2024, 12:10pm

Hey there, special characters are getting stripped during tokenization. You can customize this by specifying the tokenizer you want to use, creating a custom analyzer and reindexing your data. Hope that helps!
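
To see which tokens actually end up in the index, the _analyze API is handy. A minimal sketch, assuming the body field uses the default standard analyzer (the original mapping is not shown in this thread); this is the request body for GET _analyze:

```json
{
  "analyzer": "standard",
  "text": "covid* 9-8-8"
}
```

The standard analyzer should return the tokens covid, 9, 8 and 8. Because span_term is a term-level query and does not analyze its input, a span_term for 9-8-8 then has no indexed term to match.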

shubham_gupta4 · July 15, 2024, 12:29pm

Hey Kathleen,
Thanks for the quick reply, much appreciated!
I already have some custom analyzers applied to my indexed data, and they work fine with a normal query: if I specify them in the default fields of a query_string query, it returns matching results.
Could you share the syntax, or any other way, to apply them to a span_near query?
Thanks in advance!

Kathleen_DeRusso · July 15, 2024, 2:53pm

Sure, here's a quick demo script using the whitespace analyzer. It does nothing except split on whitespace, so special characters stay inside the tokens, and the query you provided returns the matching document. You can extend the script with more data and different analyzers to see what works for your data and use case.

PUT my-span-test
{
  "mappings": {
    "properties": {
      "body": {
        "type": "text",
        "analyzer": "whitespace"
      }
    }
  }
}

PUT my-span-test/_doc/1
{
  "body": "covid* 9-8-8"
}

POST my-span-test/_search
{
  "query": {
    "span_near": {
      "clauses": [
        {
          "span_or": {
            "clauses": [
              {
                "span_term": {
                  "body": "covid*"
                }
              }
            ]
          }
        },
        {
          "span_or": {
            "clauses": [
              {
                "span_term": {
                  "body": "9-8-8"
                }
              }
            ]
          }
        }
      ],
      "slop": "10",
      "in_order": false
    }
  }
}
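
One caveat with the demo: span_term never expands wildcards, so covid* above only matches the literal token covid*. If the asterisk is meant as a wildcard, one option is to wrap a wildcard (or prefix) query in span_multi. A sketch against the same my-span-test index, as the body for POST my-span-test/_search:

```json
{
  "query": {
    "span_near": {
      "clauses": [
        {
          "span_multi": {
            "match": {
              "wildcard": {
                "body": {
                  "value": "covid*"
                }
              }
            }
          }
        },
        {
          "span_term": {
            "body": "9-8-8"
          }
        }
      ],
      "slop": 10,
      "in_order": false
    }
  }
}
```

A pattern that expands to many terms can hit the boolean clause limit, so it may be worth checking the span_multi documentation on rewrite before using this with broad prefixes.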

shubham_gupta4 · July 16, 2024, 7:46am

Thanks for the demo!!

A custom analyzer only works for me when I reference it explicitly. For example, I created a custom analyzer for special characters, cs_special_characters, and it only works when I specify it like this:

"query": {
    "query_string": {
        "default_field": "body.cs_special_characters",
        "query": "(covid* AND 9-8-8)",
        "analyzer": "whitespace"
    }
}

But in span_near we don't specify an analyzer anywhere, and if I use body.cs_special_characters inside span_term instead of body, it throws an error.

{
  "query": {
    "span_near": {
      "clauses": [
        {
          "span_or": {
            "clauses": [
              {
                "span_term": {
                  "body": "covid*"
                }
              }
            ]
          }
        },
        {
          "span_or": {
            "clauses": [
              {
                "span_term": {
                  "body": "9-8-8"
                }
              }
            ]
          }
        }
      ],
      "slop": "10",
      "in_order": false
    }
  }
}

Is there any syntax or other way to specify a particular analyzer inside a span_near query?
Or is there an alternative to span_near for handling NEAR?

Kathleen_DeRusso · July 16, 2024, 12:16pm

You can try setting an index analyzer on the field and reindexing.
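
For the reindexing route, here is a rough sketch of what the new index could look like. The definition of cs_special_characters was not posted, so the whitespace tokenizer plus lowercase filter here is an assumption, as is the index name my-span-test-v2. This is the body for PUT my-span-test-v2:

```json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "cs_special_characters": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "body": {
        "type": "text",
        "fields": {
          "cs_special_characters": {
            "type": "text",
            "analyzer": "cs_special_characters"
          }
        }
      }
    }
  }
}
```

After reindexing, every clause inside span_near has to point at the same field, so each span_term would use body.cs_special_characters rather than a mix of body and the subfield. Since span_term is not analyzed, the terms in the query also need to be written the way the analyzer emits them (lowercased in this sketch).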

Alternatively, you could see if a match_phrase query with a very high slop would work for your needs.
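
A sketch of the match_phrase option, assuming the body.cs_special_characters subfield mentioned earlier in the thread keeps 9-8-8 as a single token. This is the body for a _search request:

```json
{
  "query": {
    "match_phrase": {
      "body.cs_special_characters": {
        "query": "covid 9-8-8",
        "slop": 10
      }
    }
  }
}
```

Unlike span_term, match_phrase runs the query text through the field's search analyzer, so no analyzer needs to be named in the query. It does not expand wildcards or support OR groups, though, and slop counts positional moves rather than a plain word distance, so results may differ from span_near with in_order set to false.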

system · August 13, 2024, 12:17pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
