Analyzer to support slash "/" in Query String search

Hi,
I have an index that stores a "url" field containing "/" (forward slashes), and I only run query_string searches against that field.

Unfortunately, the slash "/" seems to have side effects on the query_string search because of the analyzer on the field. The same happens with the wildcard character "*".

I don't get any errors from the query_string search, just wrong or empty results.

So my question is: what type of mapping should I use so that a field used only for Query String searches treats the slash "/" as a normal character?
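For reference, the kind of search I'm running looks roughly like this (the index name and the path are just placeholders, not my real data):

# Hypothetical example of the query that returns wrong/empty results
GET /my_index/_search
{
  "query": {
    "query_string": {
      "default_field": "url",
      "query": "/api/v1/users/*"
    }
  }
}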

Something similar was discussed in another post, but no solution was found there.

Currently I have the configuration below for the index:

Field Mapping

{
    "name": {
        "type": "text"
    },
    "hash": {
        "type": "keyword"
    },
    "duration": {
        "type": "float"
    }
}

Settings

{
    "analysis": {
        "char_filter": {
            "replace": {
                "type": "mapping",
                "mappings": [
                    "&=> and "
                ]
            }
        },
        "filter": {
            "word_delimiter": {
                "type": "word_delimiter",
                "split_on_numerics": false,
                "split_on_case_change": true,
                "generate_word_parts": true,
                "generate_number_parts": true,
                "catenate_all": true,
                "preserve_original": true,
                "catenate_numbers": true
            }
        },
        "analyzer": {
            "default": {
                "type": "custom",
                "char_filter": [
                    "html_strip",
                    "replace"
                ],
                "tokenizer": "whitespace",
                "filter": [
                    "lowercase",
                    "word_delimiter"
                ]
            }
        }
    }
}
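If it helps, this is how I check what the current default analyzer does to a sample path (the index name and path below are just examples):

# Inspect how the current default analyzer tokenizes a path
POST /my_index/_analyze
{
  "analyzer": "default",
  "text": "/api/v1/users"
}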

What analyzer configuration should be used to make Query String work with "/" and "*" as well?

Hi,
I have a question: are you trying to strip the HTML from the URLs but keep the '/' characters, or keep the original mapping (URLs HTML-stripped) but allow the user to search for '/' at query time?

The URLs do not contain HTML, just the path of the endpoint with "/".

I'm not entirely sure what the analyzer is doing, since I used a configuration suggested by other developers.

I tried to learn more about analyzers in the Elastic documentation, but it seems quite complicated. Maybe I don't need all of these index settings, but I don't know the possible side effects, so I thought I'd ask here.

Maybe some videos, articles, or other resources could be helpful.

Oh okay, I see.
You can try using the keyword tokenizer. This will treat the URL as a single token and keep the '/', ' ', and '*' characters. You can find an example below or in the docs:

# Test Keyword Tokenizer
POST _analyze
{
  "tokenizer": "keyword",
  "text": "New York*/"
}
# Example setup
PUT /url_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "url_analyzer": {
          "type": "custom",
          "tokenizer": "keyword", 
          "filter": [
            "lowercase" 
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "url": {
        "type": "text",
        "analyzer": "url_analyzer"
      }
    }
  }
}
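Once the index is created, you can verify that the url field keeps the '/' and then run a quick search (the path below is made up). Keep in mind that '/' is a reserved character in query_string syntax, so wrap the value in quotes or escape it:

# Check the analyzer applied to the url field
GET /url_index/_analyze
{
  "field": "url",
  "text": "/api/v1/users"
}

# Example search, quoting the value so "/" is not parsed as a regex delimiter
GET /url_index/_search
{
  "query": {
    "query_string": {
      "default_field": "url",
      "query": "\"/api/v1/users\""
    }
  }
}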

Thank you for the feedback. I'm going to try it.
