Search for documents containing words with and without dots (e.g. “ceo”, “c.e.o”, “c.e.o.”)

{
   "query":{
      "match":{
         "title": "ceo"
      }
   }
}

this query returns only docs that contain "ceo", "Ceo", or "CEO". But the index also has values like "c.e.o.", "C.E.O.", "c.e.o".
How can I set up a query so that searching for "ceo" also returns docs whose values contain dots?

Hi @revencu

You have some options to resolve that.

I believe you can use synonyms. This article is a good introduction.
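A minimal sketch of the synonym approach (the index, analyzer, and filter names here are placeholders, and the synonym list would need to cover the variants you actually have; note the standard tokenizer already strips the trailing dot, so "c.e.o." and "c.e.o" produce the same token):

PUT my-index-000002
{
  "settings": {
    "analysis": {
      "filter": {
        "ceo_synonyms": {
          "type": "synonym",
          "synonyms": ["ceo, c.e.o"]
        }
      },
      "analyzer": {
        "with_synonyms": {
          "tokenizer": "standard",
          "filter": ["lowercase", "ceo_synonyms"]
        }
      }
    }
  }
}

With this analyzer applied to the field (or used as a search_analyzer), a search for "ceo" should also match documents indexed with "c.e.o".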

Another option is to create a custom analyzer that removes the ".".

GET _analyze
{
  "tokenizer": "standard",
  "filter": [
    {
      "type": "pattern_replace",
      "pattern": "\\.",
      "replacement": ""
    }
  ],
  "text": "c.e.o"
}

Tokens

{
  "tokens": [
    {
      "token": "ceo",
      "start_offset": 0,
      "end_offset": 5,
      "type": "<ALPHANUM>",
      "position": 0
    }
  ]
}

I created the analyzer but it did not help

{
    "analysis": {
      "analyzer": {
        "remove_dots": {
          "tokenizer": "standard",
          "type": "custom",
          "char_filter": ["my_char_filter"]
        }
      },
      "char_filter": {
        "my_char_filter": {"type": "pattern_replace", "pattern":"[.]+", "replacement":""}
      },
     "filter": ["lowercase", "asciifolding"]
    }
  }

Could you provide a full recreation script as described in About the Elasticsearch category? It will help us better understand what you are doing. Please try to keep the example as simple as possible.

A full reproduction script is something anyone can copy and paste into the Kibana dev console and run to reproduce your use case. It will help readers understand, reproduce, and if needed fix your problem. It will also most likely get you a faster answer.

I think this filter is misplaced.

Look at my example, which returns all documents when searching for "ceo".

PUT my-index-000001
{
  "settings": {
    "analysis": {
      "analyzer": {
        "remove_dots": {
          "tokenizer": "standard",
          "type": "custom",
          "char_filter": [
            "my_char_filter"
          ],
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      },
      "char_filter": {
        "my_char_filter": {
          "type": "pattern_replace",
          "pattern": "[.]+",
          "replacement": ""
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "description": {
        "type": "text",
        "analyzer": "remove_dots"
      }
    }
  }
}


POST my-index-000001/_bulk
{"index":{}}
{"description":"someone is C.E.O"}
{"index":{}}
{"description":"someone is c.e.o"}
{"index":{}}
{"description":"someone is c.e.o."}
{"index":{}}
{"description":"someone is ceo"}

POST my-index-000001/_search 
{
  "query": {
    "match": {
      "description": "ceo"
    }
  }
}

I can't change the mapping because I have a huge index

 "mappings": {
    "properties": {
      "description": {
        "type": "text",
        "analyzer": "remove_dots"
      }
    }
  }

So I use this request, but it does not return values with dots

{
  "match": {
    "title": {
      "query": "ceo",
      "analyzer": "remove_dots"
    }
  }
}

OK. In that case I think you have to approach it the other way around.
Your document tokens look something like this: "ceo", "Ceo", "CEO", "c.e.o.", "C.E.O.", "c.e.o".
Since you cannot apply an analyzer at indexing time, you have to apply it at search time.
It would be better to use a search-time analyzer that adds the "." to the search terms, so you can match "c.e.o.", "C.E.O.", and "c.e.o".

Your query would have to use a should clause: one match for the term "ceo", and another match for the term "ceo" with the analyzer that adds the ".".
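A sketch of what that should clause could look like (this assumes an "add_dots" search-time analyzer has been defined in the index settings, e.g. with a pattern_replace char_filter that inserts a "." between letters; the analyzer name is just a placeholder):

POST my-index-000001/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "title": "ceo" } },
        {
          "match": {
            "title": {
              "query": "ceo",
              "analyzer": "add_dots"
            }
          }
        }
      ]
    }
  }
}

The first clause matches the dot-free tokens, the second one the dotted tokens, and a document only needs to satisfy one of them.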

Is there any way to search the field as if all dots were removed, without using a script query?

Given the scenario, I don't know any other solution besides the ones presented, but maybe someone else knows something new.

Thank you, but this solution does not work on my side

The best option is to reindex your dataset. You can look at the reindex API for this.
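A minimal sketch, assuming you first create a new index (here called my-index-000002) with the remove_dots analyzer in its mapping:

POST _reindex
{
  "source": { "index": "my-index-000001" },
  "dest": { "index": "my-index-000002" }
}

For a large index you can run this in the background with ?wait_for_completion=false and parallelize it with slicing, then point your application (or an alias) at the new index once it finishes.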


Yes, that makes sense, but I cannot do this for a huge index (700M records).

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.