Elastic Case Insensitive Search

Hi All,

I have a schema which uses Keywords to store values an example of a document would be something like this:

We aggregate on a number of properties too such as make/model/colour/condition etc

{
  "properties": {
    "bodyStyle": {
      "type": "keyword"
    },
    "colourBase": {
      "type": "keyword"
    },
    "condition": {
      "type": "keyword"
    },
    "fuel": {
      "type": "keyword"
    },
    "make": {
      "type": "keyword"
    },
    "mileage": {
      "type": "integer"
    },
    "model": {
      "type": "keyword"
    },
    "registration": {
      "type": "keyword"
    },
    "transmission": {
      "type": "keyword"
    }
  }
}

I'm looking to implement a free text query, which then will do partial search across a number of fields

So they query could be ?query=ford and it would search across make/model/registration/colour for example and return ford, however this currently would only work if the query was ?query=Ford as we store the value in the make field as Ford, not ford

  • Is it possible for it to convert to lowercase for the purpose of query?
  • Is it possible to do partial matches too, so search ?query=for would do (LIKE '%for%' - how i would do it in SQL)

Effectively tying to create a query something like this:

WHERE LOWER(make) LIKE '%query%'
OR LOWER(model) LIKE '%query%'
OR LOWER(registration) LIKE '%query%'
OR LOWER(colourBase) LIKE '%query%'

Hi Tam,

Keywords will be used for exact search only , try to index your fields as text where you can use analysers to make them lowercase while searching and use ngram to do partial searches or you can use regexp query.

Hope it helped.

Thanks for the response, is there any other practical differences between Keyword and Text?

I.e. can we still aggregate on Text fields

Assume i would need to empty the index, update the schema from Keyword -> Text and then populate all documents again to index

keyword -> support aggregations , exact search
text-> support full text search and can be analysed.

I need the fields to support aggregations and full text search

What alternative approach is there? Create duplicates one Keyword and one text?

If you needs both aggregations & full text search , use same field as both text and keyword.

"samplefield":{
"type":"text",
"fields":{
"keyword":{
"type":"keyword"
}
}
}

Now samplefield is text and samplefield.keyword is keyword for aggregations

Ah interesting, will try this approach - thanks

Will i need to remove all my documents, update the property and then re-import all my documents?

If you want to do case insenitive serach on a keyword field you can add a lowercase normalizer.

1 Like

Also term query has a case_insensitive parameter.. very easy if that is all you need.

See here

6 Likes

Amazing!

Just tested that on a field that worked perfectly, allows me to do case in-sensitive search

Quick follow-up on the regex part of the query

I have a sample query like this, where the passed in query NX19FUP is a registration, it does find and return this vehicle at the top of the list, however it then also returns a number of other random records which wouldn't have NX19FUP contained anywhere in them, in this case i would expect it to return just one record

{
  "size": 25,
  "from": 0,
  "_source": {
    "exclude": [
      "finances",
      "lists",
      "feeds"
    ]
  },
  "query": {
    "bool": {
      "filter": {
        "bool": {
          "must": [
            {
              "term": {
                "clientId": 9
              }
            },
            {
              "terms": {
                "status": [
                  "ACTIVE"
                ]
              }
            }
          ]
        }
      },
      "should": [
        {
          "regexp": {
            "make": {
              "value": ".*NX19FUP.*",
              "case_insensitive": true
            }
          }
        },
        {
          "regexp": {
            "model": {
              "value": ".*NX19FUP.*",
              "case_insensitive": true
            }
          }
        },
        {
          "regexp": {
            "registration": {
              "value": ".*NX19FUP.*",
              "case_insensitive": true
            }
          }
        },
        {
          "regexp": {
            "colourBase": {
              "value": ".*NX19FUP.*",
              "case_insensitive": true
            }
          }
        },
        {
          "regexp": {
            "colourManufacturer": {
              "value": ".*NX19FUP.*",
              "case_insensitive": true
            }
          }
        },
        {
          "regexp": {
            "variant": {
              "value": ".*NX19FUP.*",
              "case_insensitive": true
            }
          }
        },
        {
          "regexp": {
            "transmission": {
              "value": ".*NX19FUP.*",
              "case_insensitive": true
            }
          }
        },
        {
          "regexp": {
            "series": {
              "value": ".*NX19FUP.*",
              "case_insensitive": true
            }
          }
        }
      ]
    }
  }
}

I am Confused that is a regex with a leading and trailing wildcard you will certainly get other results with that string in the middle a term search would look like this and be an exact case insensitive search

"term": {
                "make": "NX19FUP",
                "case-insensitive" : "true"
              }

What I'm trying to do is build a generic search box on my website where the user can type in ford focus for example or the reg of a vehicle in this case NX19FUP

What i then do is i split the words in the sentence so ford focus becomes and array of ['ford', 'focus'] then search for those terms across a number of fields

I use the wildcard at the start and the ned to allow partial matches

So if a user searches for ford then i would want it to match all of these:

  • ford
  • abcford
  • abcfordadas

But also if a user searches for something like a reg which is unique in our system then when searching NX19FUP for example, it should only ever find it in the registration field, none of the other fields it searches would contain the text NX19FUP with anything before or after, even if it does a wildcard before and after

Not sure if there is a way to see why elastic thinks it's found a match for NX19FUP on the other fields?

EDIT: Looking at the query in more detail it's because of the should in the query, we have a couple of must filters to filter for a particular customer and the status, then these regexp are in a group of should

What would be the correct way to write this query so it creates and AND/OR group?

{
  "size": 25,
  "from": 0,
  "_source": {
    "exclude": [
      "finances",
      "lists",
      "feeds"
    ]
  },
  "query": {
    "bool": {
      "filter": {
        "bool": {
          "must": [
            {
              "term": {
                "clientId": 9
              }
            },
            {
              "terms": {
                "status": [
                  "ACTIVE"
                ]
              }
            }
          ]
        }
      },
      "should": [
        {
          "regexp": {
            "make": {
              "value": ".*NX19FUP.*",
              "case_insensitive": true
            }
          }
        },
        {
          "regexp": {
            "model": {
              "value": ".*NX19FUP.*",
              "case_insensitive": true
            }
          }
        },
        {
          "regexp": {
            "registration": {
              "value": ".*NX19FUP.*",
              "case_insensitive": true
            }
          }
        },
        {
          "regexp": {
            "colourBase": {
              "value": ".*NX19FUP.*",
              "case_insensitive": true
            }
          }
        },
        {
          "regexp": {
            "colourManufacturer": {
              "value": ".*NX19FUP.*",
              "case_insensitive": true
            }
          }
        },
        {
          "regexp": {
            "variant": {
              "value": ".*NX19FUP.*",
              "case_insensitive": true
            }
          }
        },
        {
          "regexp": {
            "transmission": {
              "value": ".*NX19FUP.*",
              "case_insensitive": true
            }
          }
        },
        {
          "regexp": {
            "series": {
              "value": ".*NX19FUP.*",
              "case_insensitive": true
            }
          }
        }
      ]
    }
  }
}

(in sql it would be like this)

AND (make LIKE =%{query}% OR model LIKE '%{query}% OR registration LIKE '%{query}%')

So any one of the regex queries should return a result

EDIT 2: I've added a "minimum_should_match": 1 which seems to work, is this the best way of doing it?

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.