How to implement a search by multiple fields and support whitespace, symbols, case insensitive

I have an index with many fields and I want to be able to search by some of them at the same time, and this search should support partial match (in any position), case insensitive, support some symbols.

Example of my index:

PUT my_index
{
    settings: {
      analysis: {
        analyzer: {
          analyzerCaseInsensitive: {
            filter: ['lowercase'],
            tokenizer: 'keyword',
          },
          default: {
            type: 'keyword',
          },
        },
      },
    },
    mappings: {
      properties: {
        id: {
            type: 'text',
            analyzer: 'analyzerCaseInsensitive'
        },
        email: {
            type: 'text',
            analyzer: 'analyzerCaseInsensitive'
        },
        fullName: {
            type: 'text',
            analyzer: 'analyzerCaseInsensitive'
        }
      }
    }
  }

And then to search I'm using this query:

GET my_index/_search
{
  from: 0,
  size: 10,
  query: {
    bool: {
      must: [
        {
          {
            query_string: {
              query: `*{search}*`,
              fields: ['id', 'email', 'fullName'],
            }
          }
        }
      ]
    }
  }
}

With this query I'm able to search partial match, case insensitive, and some symbols, but I'm not able to search text with whitespace in the middle to get those that match only that literal text.

I will describe some examples of what I want to achieve.

--- FIELDS ---

Id field is an uuidv4, so the format is something like: "6e0baaae-ff10-4812-b27d-ca307560b9e3".

fullName field could include whitespace like, "Jhon Doe".

--- EXPECTED BEHAVIOR---

PUT my_index/_doc/1
{
    "id": "6e0baaae-ff10-4812-b27d-ca307560b9e3",
    "email": "jhon.doe@gmail.com",
    "fullName": "Jhon Doe"
}

PUT my_index/_doc/2
{
    "id": "3b8605f9-1fad-4b25-a061-ab0943f7d55e",
    "email": "jhonny_saur@hotmail.com",
    "fullName": "Scott Doe"
}

So if I search "jho", I expect both, because it's included in the fullName of 1 and in the email of 2.
Search "jhon Doe", I expect only the 1, because it's the only that match the whole text.
Search "nny", I expect the 2, because it's included in the email.

How can I improve my index or query to support this? Thanks

Hi @Juan_Manuel

My suggestion:

if you dont want change your mapping use should clause with multi-match query and query_string

New query:

{
  "from": 0,
  "size": 10,
  "query": {
    "bool": {
      "minimum_should_match": 1, 
      "should": [
        {
          "query_string": {
            "query": "*jho*",
            "fields": [
              "id",
              "email",
              "fullName"
            ]
          }
        },
        {
          "multi_match": {
            "query": "jho",
            "fields": [
              "id",
              "email",
              "fullName"
            ]
          }
        }
      ]
    }
  }
}

I would create a new field for received to receive all values other fields using copy to.

Ex:

"properties": {
      "multi_fields": {
        "type": "text",
        "analyzer": "analyzerCaseInsensitive"
      },
      "id": {
        "type": "text",
        "copy_to": "multi_fields"
      },
      "email": {
        "type": "text",
        "copy_to": "multi_fields"
      },
      "fullName": {
        "type": "text",
        "copy_to": "multi_fields"
      }
    }

Query:

{
  "from": 0,
  "size": 10,
  "query": {
    "bool": {
      "should": [
        {
          "query_string": {
            "query": "*nny*",
            "fields": [
              "multi_fields"
            ]
          }
        },
        {
          "match": {
            "multi_fields": "nny"
          }
        }
      ]
    }
  }
}

Ohh I think it's working as I expected. I just did a couple of quick tests and it worked!!
Let me do some more tests before closing the question, but I think it's working!

Thanks Rabbit!

1 Like

@RabBit_BR, it's working almost perfect in all the cases I need, but there is only one that it doesn't.

The case is when I have something like this:

{
    "id": "6e0baaae-ff10-4812-b27d-ca307560b9e3",
    "email": "random@gmail.com",
    "fullName": "Jhon Doe"
}
{
    "id": "3b8605f9-1fad-4b25-a061-ab0943f7d55e",
    "email": "jhon@gmail.com",
    "fullName": "Bob White"
}

And the query I'm using is:

{
  "from": 0,
  "size": 10,
  "query": {
    "bool": {
      "should": [
        {
          "query_string": {
            "query": "*any jhon*",
            "fields": [
              "multi_fields"
            ]
          }
        },
        {
          "match": {
            "multi_fields": "any jhon"
          }
        }
      ]
    }
  }
}

The response of this query contains both documents, because the first one has jhon in the fullName, and the second one has it in the email, but none has the "any" word.

In this case I'm expecting no results, I want to match the whole word.

I don't know if this is possible? I don't mind changing my mappings or settings.

you can try with operator AND in query_string?

 {
          "query_string": {
            "query": "*any jhon*",
            "default_operator": "AND", 
            "fields": [
              "id",
              "email",
              "fullName"
            ]
          }
        }
1 Like

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.