Term contains a dot (.), nothing is returned

Hello. I'm doing a query on the "username" field. The results are correct in most cases, but when the given term contains a dot (.), nothing is returned.

Ex: "firstname.lastname" = nothing is returned
"firstname lastname" = OK

query": {
    "bool": {
      "must": [
        {
          "match": {
            "username.autocomplete": {
              "query": "firstname.lastname",
              "operator": "and"
            }
          }
        }
      ],
      "filter": [
        {
          "term": {
            "delete_at": 0
          }
        }
      ]
    }
  }

How can I solve this problem? Thanks!

Hi @thales788

I don't know which analyzer you are using, but in your example, if you index the term "firstname lastname" with the "standard" analyzer, you will get two tokens.

When you search for the term "firstname.lastname", the same analyzer generates only one token, which I believe is why your search is not successful.

Run the test below with the _analyze API to check this assumption.

GET _analyze
{
  "text": ["firstname.lastname"],
  "analyzer": "standard"
}

@RabBit_BR, thank you for your help.

Input:

GET users/_analyze
{
  "text": ["firstname.lastname"],
  "analyzer": "standard"
}

Output:

{
  "tokens" : [
    {
      "token" : "firstname.lastname",
      "start_offset" : 0,
      "end_offset" : 18,
      "type" : "<ALPHANUM>",
      "position" : 0
    }
  ]
}

In doc:

"_source" : {
          "username" : "firstname.lastname",
....

Your query searches the field username.autocomplete, but your document has the value firstname.lastname in the field username.

This is my mapping:

username" : {
          "type" : "text",
          "fields" : {
            "autocomplete" : {
              "type" : "text",
              "analyzer" : "autocomplete",
              "search_analyzer" : "autocomplete_search"
            }
          },
          "analyzer" : "text_content"
        }

The query using username.autocomplete works when there is no dot.

John.Wick = nothing is returned
John Wick = success

I'm trying to identify a fix for this issue.

Thanks!

Nice. Show the analyzers: autocomplete, autocomplete_search, text_content.

 "analyzer" : {
            "text_content" : {
              "filter" : [
                "lowercase",
                "asciifolding"
              ],
              "type" : "custom",
              "tokenizer" : "standard"
            },
            "html_content" : {
              "filter" : [
                "lowercase",
                "asciifolding"
              ],
              "char_filter" : [
                "html_strip"
              ],
              "type" : "custom",
              "tokenizer" : "standard"
            },
            "autocomplete" : {
              "filter" : [
                "lowercase",
                "asciifolding"
              ],
              "type" : "custom",
              "tokenizer" : "autocomplete"
            },
            "autocomplete_search" : {
              "filter" : [
                "autocomplete_filter",
                "asciifolding",
                "lowercase"
              ],
              "type" : "custom",
              "tokenizer" : "standard"
            }
          },

What is the output of:

GET users/_analyze
{
  "text": ["John.Wick", "John Wick"],
  "analyzer": "autocomplete"
}

and

GET users/_analyze
{
  "text": ["John.Wick", "John Wick"],
  "analyzer": "autocomplete_search"
}

That should give you a clue.

Hello, @dadoonet. These are the results:

autocomplete:

{
  "tokens" : [
    {
      "token" : "joh",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "john",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "word",
      "position" : 1
    },
    {
      "token" : "wic",
      "start_offset" : 5,
      "end_offset" : 8,
      "type" : "word",
      "position" : 2
    },
    {
      "token" : "wick",
      "start_offset" : 5,
      "end_offset" : 9,
      "type" : "word",
      "position" : 3
    },
    {
      "token" : "joh",
      "start_offset" : 10,
      "end_offset" : 13,
      "type" : "word",
      "position" : 104
    },
    {
      "token" : "john",
      "start_offset" : 10,
      "end_offset" : 14,
      "type" : "word",
      "position" : 105
    },
    {
      "token" : "wic",
      "start_offset" : 15,
      "end_offset" : 18,
      "type" : "word",
      "position" : 106
    },
    {
      "token" : "wick",
      "start_offset" : 15,
      "end_offset" : 19,
      "type" : "word",
      "position" : 107
    }
  ]
}

autocomplete_search:

{
  "tokens" : [
    {
      "token" : "john.wick",
      "start_offset" : 0,
      "end_offset" : 9,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "john",
      "start_offset" : 10,
      "end_offset" : 14,
      "type" : "<ALPHANUM>",
      "position" : 101
    },
    {
      "token" : "wick",
      "start_offset" : 15,
      "end_offset" : 19,
      "type" : "<ALPHANUM>",
      "position" : 102
    }
  ]
}

More information:

Using username instead of username.autocomplete, it is possible to query with "john.wick"; however, the search for "john wick" fails to return the result.

"match": {
            "username": {
              "query": "john.wick",
              "operator": "and"
            }
   }

Have a look at the tokens.

I think you have indexed John Wick.

At index time (autocomplete), you indexed: joh, john, wic, wick.
At search time (autocomplete_search), you are searching for:

  • For John.Wick: john.wick
  • For John Wick: john, wick

That's the reason it does not match for John.Wick.

You need to change your analyzer so that it produces the right tokens when searching for a name that contains a dot.

I have no idea what your "tokenizer" : "autocomplete" is, but I think you should look at it and maybe use something similar instead of "tokenizer" : "standard".

Note that autocomplete suggests that you want to search for sub terms...
Here you have the full terms...
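
If it helps, here is a minimal sketch of that idea, tested inline with the _analyze API: a pattern_replace char filter turns the dot into a space before tokenization (the inline definition below is only an illustration, not your actual settings):

GET _analyze
{
  "char_filter": [
    {
      "type": "pattern_replace",
      "pattern": "\\.",
      "replacement": " "
    }
  ],
  "tokenizer": "standard",
  "filter": [ "lowercase", "asciifolding" ],
  "text": ["John.Wick", "John Wick"]
}

If that returns john and wick for both inputs, you could add the same char filter to both your autocomplete and autocomplete_search analyzers and reindex, so that index time and search time produce consistent tokens.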

What about combining multiple searches using a bool / should array of match queries?
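
For instance, a rough sketch of that, reusing the two fields and the delete_at filter from earlier in this thread (an illustration, not a drop-in replacement):

GET users/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "username": {
              "query": "john.wick",
              "operator": "and"
            }
          }
        },
        {
          "match": {
            "username.autocomplete": {
              "query": "john.wick",
              "operator": "and"
            }
          }
        }
      ],
      "minimum_should_match": 1,
      "filter": [
        {
          "term": {
            "delete_at": 0
          }
        }
      ]
    }
  }
}

The explicit "minimum_should_match": 1 keeps at least one of the match clauses mandatory, since the bool query also contains a filter clause.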

If you can't make it work, please provide a full reproduction script as described in About the Elasticsearch category. It will help to better understand what you are doing. Please try to keep the example as simple as possible.

A full reproduction script is something anyone can copy and paste into the Kibana Dev Console and run to reproduce your use case. It will help readers understand, reproduce, and if needed fix your problem. It will also most likely get you a faster answer.
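
For example (a made-up index name and a deliberately simplified mapping, just to show the shape of such a script):

DELETE /test

PUT /test
{
  "mappings": {
    "properties": {
      "username": { "type": "text" },
      "delete_at": { "type": "long" }
    }
  }
}

PUT /test/_doc/1?refresh=true
{
  "username": "firstname.lastname",
  "delete_at": 0
}

GET /test/_search
{
  "query": {
    "match": {
      "username": "firstname.lastname"
    }
  }
}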
