Elasticsearch/Kibana query_string with special characters

Hello,

I set up an Elasticsearch cloud VM, and created an index with simple types and a mapping.

Using the Kibana Dev Tools console:

  • I am able to index data using POST and PUT requests.
  • Now I need to search text. I use query_string to handle different wildcards. But when I search for a word that contains special characters (e.g. @xxxx, !xxxx, xxxxé, *xxxx*), I get empty hits even when the word exists in the database.

I have tried to configure the index settings with an analyzer, but I did not find a solution and I am left with more gray areas.

Can somebody help me with that?

Here are some example requests:

1- all documents

GET test-search/serch_text/_search
{
}

"hits": {
    "total": 11,
    "max_score": 1,
    "hits": [
      {
        "_index": "test-search",
        "_type": "serch_text",
        "_id": "ZXd1DGIBBx5VGSRKr7QL",
        "_score": 1,
        "_source": {
          "text": "@funy"
        }
      },
      {
        "_index": "test-search",
        "_type": "serch_text",
        "_id": "aXdUE2IBBx5VGSRKGbTZ",
        "_score": 1,
        "_source": {
          "text": "@fun "
        }
      },
      {
        "_index": "test-search",
        "_type": "serch_text",
        "_id": "Ynd1DGIBBx5VGSRKlrQs",
        "_score": 1,
        "_source": {
          "text": "you have a & fun"
        }
      },
      {
        "_index": "test-search",
        "_type": "serch_text",
        "_id": "Znd1DGIBBx5VGSRKxLQW",
        "_score": 1,
        "_source": {
          "text": "%fun "
        }
      },
      {
        "_index": "test-search",
        "_type": "serch_text",
        "_id": "bHdVE2IBBx5VGSRKlbSD",
        "_score": 1,
        "_source": {
          "text": "I have a fun!"
        }
      },
      ...
    ]
  }
}

2- does not work

GET test-search/serch_text/_search
{
  "query": {
    "query_string": {
      "query": "@fun"
    }
  }
}

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}

3- does not work

GET test-search/serch_text/_search
{
  "query": {
         "query_string": {
          "query": "*I have a fun*"
        }
  }
}

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}

Thanks!

Please don't post images of text as they are hardly readable and not searchable.

Instead paste the text and format it with </> icon. Check the preview window.

OK dadoonet, I have done it with </>.

I can't reproduce.

DELETE test 
PUT test/doc/1
{
  "text": "@fun"
}
GET test/_search
{
  "query": {
    "query_string": {
      "query": "@fun"
    }
  }
}

It gives:

{
  "took": 65,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "test",
        "_type": "doc",
        "_id": "1",
        "_score": 0.2876821,
        "_source": {
          "text": "@fun"
        }
      }
    ]
  }
}

OK, weird.

I created a new index and 2 entries like you did, and it gives the same results as yours.

Now, can you please add this one:

PUT test/doc/3
{
  "text": "rougui@fun.com"
}

and test:

1-

GET test/_search
{
  "query": {
    "query_string": {
      "query": "@fun"
    }
  }
}

2- I also have to support a wrong special character entered in the query:

GET test/_search
{
  "query": {
    "query_string": {
      "query": "*@fun!*"
    }
  }
}

Normally these 2 GET requests should return results.

Thanks a lot!

Another weird example:

I ran these requests in succession:

DELETE test

PUT test
{
}

POST test/_mapping/search
{
  "properties" : {
      "text" : {
        "type" : "text"
      }
    }
}

POST test/search/
{
  "text": "@fun"
}

POST test/search
{
  "text": "i have a & fun"
}

GET test/search/_search

GET test/search/_search
{
  "query": {
    "query_string": {
      "query": "@fun"
    }
  }
}

It gives:

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "test",
        "_type": "search",
        "_id": "c3f6IGIBBx5VGSRKoLR6",
        "_score": 1,
        "_source": {
          "text": "@fun"
        }
      },
      {
        "_index": "test",
        "_type": "search",
        "_id": "dXf8IGIBBx5VGSRK-LSh",
        "_score": 1,
        "_source": {
          "text": "i have a & fun"
        }
      }
    ]
  }
}

Normally it should give:

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "test",
        "_type": "search",
        "_id": "c3f6IGIBBx5VGSRKoLR6",
        "_score": 1,
        "_source": {
          "text": "@fun"
        }
      }
    ]
  }
}

Have a look at analyze API to understand what is happening.

GET test/_analyze
{
  "text" : "@fun"
}

it gives:

{
  "tokens": [
    {
      "token": "fun",
      "start_offset": 1,
      "end_offset": 4,
      "type": "<ALPHANUM>",
      "position": 0
    }
  ]
}

It looks like it ignores the @ character when tokenizing.

Yes. That's expected.
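
The standard analyzer follows Unicode text segmentation rules and drops most punctuation, so @ never makes it into the index. As a quick experiment (just a sketch against the same test index), you can ask _analyze to use the whitespace tokenizer, which splits on spaces only and keeps @:

GET test/_analyze
{
  "tokenizer": "whitespace",
  "text": "@fun"
}

This should return a single token @fun instead of fun.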

Hello,

But that is not the behavior I want. Below are 2 examples that I have to handle.

e.g.:
1- when a user enters @gmail in the search view, it should return all emails that contain @gmail
2- when a user enters fun& in the search view, it should return all entries that contain fun

But at the moment it does not find words containing special characters.

You need to find the right analyzer which matches your use case. Have a look at https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-analyzers.html

Maybe this tokenizer could help you: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-uaxurlemail-tokenizer.html
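
For example, analyzing an address with the uax_url_email tokenizer (a sketch to try against the same test index) should keep the whole e-mail as one token:

GET test/_analyze
{
  "tokenizer": "uax_url_email",
  "text": "rougui@fun.com"
}

This should return one token rougui@fun.com instead of splitting it apart.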

OK, thanks.

I defined a custom analyzer.

"analysis": {
      "analyzer": {
        "my_analyzer" : {
          "type" : "custom",
          "tokenizer": "standard",
          "filter" : [
            "email",
            "lowercase",
            "asciifolding"
            ],
          "char_filter" : [
            "specialCharactersFilter"
            ]
        }
      },
      
      "tokenizer": {
        "my_tokenizer": {
          "type": "uax_url_email"
        }
      },
      
      "filter" : {
        "email" : {
          "type" : "pattern_capture",
          "preserve_original" : true,
          "patterns" : ["(\\p{L}+)", "(\\d+)"]
        }
      },
      
      "char_filter": {
        "specialCharactersFilter": {
        "pattern": "[^a-zA-Z0-9À-ÖØ-öø-ÿ]+",
          "type": "pattern_replace",
          "replacement": ""
        }
      }
      
   }

With this one, search works much better.

I still have an issue with @. I am continuing to improve it.

If you have any ideas, please share them.

Thanks
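
A side note on the settings above (just a guess): the custom analyzer still references the standard tokenizer, so the my_tokenizer (uax_url_email) definition is never actually used, and the specialCharactersFilter char filter strips @ out of the text before tokenization even runs (char filters are applied first). A sketch of a variant that wires the e-mail-aware tokenizer in and keeps @ and . in the char filter pattern:

"analysis": {
  "analyzer": {
    "my_analyzer": {
      "type": "custom",
      "tokenizer": "my_tokenizer",
      "filter": ["email", "lowercase", "asciifolding"],
      "char_filter": ["specialCharactersFilter"]
    }
  },
  "tokenizer": {
    "my_tokenizer": {
      "type": "uax_url_email"
    }
  },
  "filter": {
    "email": {
      "type": "pattern_capture",
      "preserve_original": true,
      "patterns": ["(\\p{L}+)", "(\\d+)"]
    }
  },
  "char_filter": {
    "specialCharactersFilter": {
      "type": "pattern_replace",
      "pattern": "[^a-zA-Z0-9À-ÖØ-öø-ÿ@.]+",
      "replacement": ""
    }
  }
}

With @ preserved through the char filter, the uax_url_email tokenizer can then emit whole e-mail addresses as single tokens.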

You can also define multiple analyzers and use multifields to index the same content with different analyzers.
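
As a sketch (the sub-field name is just an example, and my_analyzer is assumed to be defined in the index settings), a multi-field mapping indexes the same string twice, and query_string can then search both variants:

POST test/_mapping/search
{
  "properties": {
    "text": {
      "type": "text",
      "fields": {
        "custom": {
          "type": "text",
          "analyzer": "my_analyzer"
        }
      }
    }
  }
}

GET test/search/_search
{
  "query": {
    "query_string": {
      "query": "@fun",
      "fields": ["text", "text.custom"]
    }
  }
}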

OK, thanks!

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.