Search for documents containing words with and without dots (e.g. “ceo”, “c.e.o”, “c.e.o.”)

revencu · November 16, 2022, 8:54am

{
   "query":{
      "match":{
         "title": "ceo"
      }
   }
}

This query returns only docs that contain "ceo", "Ceo", or "CEO", but the index also has values like "c.e.o.", "C.E.O.", and "c.e.o".
How can I set up the query so that a search for "ceo" also returns docs whose values contain dots?

RabBit_BR · November 16, 2022, 9:03am

Hi @revencu

You have some options to resolve that.

I believe you can use synonyms. This article is a good introduction.

Another option is to create a custom analyzer that removes the ".".

GET _analyze
{
  "tokenizer": "standard",
  "filter": [
    {
      "type": "pattern_replace",
      "pattern": "\\.",
      "replacement": ""
    }
  ],
  "text": "c.e.o"
}

Tokens

{
  "tokens": [
    {
      "token": "ceo",
      "start_offset": 0,
      "end_offset": 5,
      "type": "<ALPHANUM>",
      "position": 0
    }
  ]
}
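
To illustrate why removing the dots makes all the variants collide on a single token, here is a minimal Python sketch that imitates the effect of the `pattern_replace` filter above, followed by the lowercasing that a `lowercase` token filter would do. It is a plain-Python stand-in, not Elasticsearch's analysis chain, and the helper name `normalize_token` is hypothetical.

```python
import re

def normalize_token(token: str) -> str:
    """Remove every literal dot, then lowercase (hypothetical helper)."""
    return re.sub(r"\.", "", token).lower()

variants = ["ceo", "Ceo", "CEO", "c.e.o", "c.e.o.", "C.E.O."]

# Collect the distinct normalized forms of all variants.
normalized = {normalize_token(v) for v in variants}
print(normalized)  # {'ceo'}
```

Because every variant reduces to the same form, a search for "ceo" can match all of them, provided the same normalization was applied when the documents were indexed.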

revencu · November 16, 2022, 9:06am

I created an analyzer, but it does not help:

{
    "analysis": {
      "analyzer": {
        "remove_dots": {
          "tokenizer": "standard",
          "type": "custom",
          "char_filter": ["my_char_filter"]
        }
      },
      "char_filter": {
        "my_char_filter": {"type": "pattern_replace", "pattern":"[.]+", "replacement":""}
      },
     "filter": ["lowercase", "asciifolding"],
    }
  }

dadoonet · November 16, 2022, 9:19am

Could you provide a full recreation script as described in About the Elasticsearch category. It will help to better understand what you are doing. Please, try to keep the example as simple as possible.

A full reproduction script is something anyone can copy and paste in Kibana dev console, click on the run button to reproduce your use case. It will help readers to understand, reproduce and if needed fix your problem. It will also most likely help to get a faster answer.

RabBit_BR · November 16, 2022, 9:21am

I think the filter is misplaced: it should go inside the analyzer definition.

Here is my example, which returns all the documents when searching for "ceo".

PUT my-index-000001
{
  "settings": {
    "analysis": {
      "analyzer": {
        "remove_dots": {
          "tokenizer": "standard",
          "type": "custom",
          "char_filter": [
            "my_char_filter"
          ],
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      },
      "char_filter": {
        "my_char_filter": {
          "type": "pattern_replace",
          "pattern": "[.]+",
          "replacement": ""
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "description": {
        "type": "text",
        "analyzer": "remove_dots"
      }
    }
  }
}


POST my-index-000001/_bulk
{"index":{}}
{"description":"someone is C.E.O"}
{"index":{}}
{"description":"someone is c.e.o"}
{"index":{}}
{"description":"someone is c.e.o."}
{"index":{}}
{"description":"someone is ceo"}

POST my-index-000001/_search 
{
  "query": {
    "match": {
      "description": "ceo"
    }
  }
}

revencu · November 16, 2022, 9:31am

I can't change the mapping because I have a huge index:

 "mappings": {
    "properties": {
      "description": {
        "type": "text",
        "analyzer": "remove_dots"
      }
    }
  }

So I use this request, but it does not return values with dots:

{
  "match": {
    "title": {
      "query": "ceo",
      "analyzer": "remove_dots"
    }
  }
}

RabBit_BR · November 16, 2022, 9:47am

OK. In that case I think you need a different approach.
The values in your documents look something like this: "ceo", "Ceo", "CEO", "c.e.o.", "C.E.O.", "c.e.o".
Since you cannot apply an analyzer at index time, you have to handle this at search time.
Rather than removing the dots, the search-time analyzer would have to add the "." to the search term so that it can match "c.e.o.", "C.E.O.", and "c.e.o".

Your query would have to use should clauses: one match for the term "ceo" as it is, and another match for the term "ceo" with the analyzer that adds the ".".
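
A minimal sketch of such a should query, written in Python so the request body can be generated for any single-word term. It builds the dotted form on the client side instead of through an analyzer. It assumes the `title` field was indexed with the default `standard` analyzer, which I would expect to keep "c.e.o" as one lowercase token and drop a trailing dot; that is worth checking with `_analyze` against the real index. The helper names `dotted_variant` and `build_query` are hypothetical.

```python
import json

def dotted_variant(term: str) -> str:
    """Insert a dot between every character: "ceo" -> "c.e.o"."""
    return ".".join(term)

def build_query(field: str, term: str) -> dict:
    """Build a bool query that matches either the plain or the dotted term."""
    return {
        "query": {
            "bool": {
                "should": [
                    {"match": {field: term}},
                    {"match": {field: dotted_variant(term)}},
                ],
                "minimum_should_match": 1,
            }
        }
    }

# Print the request body that could be sent to the _search endpoint.
print(json.dumps(build_query("title", "ceo"), indent=2))
```

This only covers terms whose dotted form puts a dot between every letter; other spellings would need their own should clause.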

revencu · November 16, 2022, 9:56am

Is there any way to search the field as if all dots had been removed first, without using a script query?

RabBit_BR · November 16, 2022, 10:26am

For this scenario I don't know of any solution other than the ones presented, but maybe someone else knows another approach.

revencu · November 16, 2022, 10:28am

Thank you, but this solution does not work on my side.

dadoonet · November 16, 2022, 11:31am

The best option is to reindex your dataset. You can look at the reindex API for this.
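
As a rough sketch of what that could look like, the Python snippet below only builds the path and body of a `_reindex` request; it does not send anything. The index names `my-index-old` and `my-index-new` are placeholders, and the destination index is assumed to have been created beforehand with the `remove_dots` analyzer in its mapping. `wait_for_completion=false` and `slices=auto` are optional query parameters that may help with a large index.

```python
import json
from urllib.parse import urlencode

def build_reindex_request(source_index: str, dest_index: str):
    """Return (path, body) for a _reindex call (hypothetical helper)."""
    # Run as a background task and let Elasticsearch pick the slice count.
    params = urlencode({"wait_for_completion": "false", "slices": "auto"})
    path = f"/_reindex?{params}"
    body = {"source": {"index": source_index}, "dest": {"index": dest_index}}
    return path, body

path, body = build_reindex_request("my-index-old", "my-index-new")
print("POST", path)
print(json.dumps(body))
```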

revencu · November 16, 2022, 11:43am

Yes, that makes sense, but I cannot do this for a huge index (700M records).

system · December 14, 2022, 11:43am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
