Query String with Wildcard not giving exact matches in the result set first

If a Query is fired to Elastic Search as shown below

GET purchase_order/_search
{
  "query": {
    "query_string": {
      "query": "*HEAVENLUXE-SG23FEB2022*", 
      "fields": [
        "purchaseOrderNumber^4"
      ]
    }
  }
}

The Response does not have documents with exact matches higher up on the result set.

{
        "_index": "una_purchase_order",
        "_id": "MRrfbocB80yW8v-BTd1F",
        "_score": 4,
        "_source": {
          "@version": "1",
          "createdAt": "2023-03-16T11:14:47.418Z",
          "@timestamp": "2023-04-11T05:52:20.655257230Z",
          "purchaseOrderNumber": "HEAVENLUXE-SG23FEB2022BATCH2",
          "partNumber": "GREYFSS-Q"
        }
      }

Welcome!

Yes.

What is exactly your question?

I need my query to return exact matches higher up in the search results.
Only If we are not able to find exact matches, partially matches results should appear in the result set.

Ok. First of all, you should not use wildcards unless you don't really care about performance.

To answer your question, you could use a bool query like this:

POST _search
{
  "query": {
    "bool" : {
      "should" : [
        { "wildcard" : { "purchaseOrderNumber" : { "value": "*HEAVENLUXE-SG23FEB2022*" } } },
        { "match" : { "purchaseOrderNumber" : "HEAVENLUXE-SG23FEB2022" } }
      ]
    }
  }
}

But again read: Wildcard query | Elasticsearch Guide [8.11] | Elastic

Avoid beginning patterns with * or ? . This can increase the iterations needed to find matching terms and slow search performance.

Thanks for this.
One thing that I also forgot to mention is that this search needs to happen across all the fields in the index.
Not only that, the search will take place across multiple indices.
Every Index, contains only a few fields (Maximum 5) with Keyword like Data.

I also need the search query to support case insensitive search.

I am okay with the performance implications of wildcard & _all

Then use a query string query without specifying the fields you want to search on. But again, this will run slowly. It'd be better to explore other strategies like using edge ngram or ngrams or the wildcard data type. I can give more advices if you explain a bit your exact use case (the business use case).

I have the feeling here that you are trying to switch your search from a SQL database, using the same patterns that you learned in SQL.
A search engine works a bit differently so trying to mimic the SQL behavior might not be the best option, although it could work.

My 2 cents.

1 Like

So I have data related to Products, Orders & Purchase Orders in their own Relational Databases.

Each type of Data i.e. Products, Orders etc have about 4-5 Fields, based on which an end user would perform Search (Search has to be across Products, Orders and Purchase Order).

We have a global search feature( search box ) which must return relevant results to the search term provided by the User.

The User could potentially input search term with uppercase or lowercase, but the search engine must account for it.

Not only this, the search engine must also give results based on partial search term entered by the user. This is to say that user should be able to enter partial search term(orderId, productName, PO Number) and still get hits.

To Address this use case, I have created 3 indices for Product, Orders and Purchase Orders.

The Mapping For Each Index Looks similar to this

{
  "settings": {
    "analysis": {
      "analyzer": {
        "lowercasespaceanalyzer": {
        "type": "custom",
        "tokenizer": "whitespace",
        "filter": [
          "lowercase"
        ]
      }
      }
    }
  },
  "mappings": {
    "properties": {
      "purchaseOrderNumber": {
        "type":"text",
        "analyzer": "lowercasespaceanalyzer"
      },
      "partNumber": {
        "type":"text",
        "analyzer": "lowercasespaceanalyzer"
      },
      "createdAt": {
        "type": "date",
        "format": "strict_date_optional_time"
      }
    }
  }
}

And the query used is

{
  "query": {
    "query_string": {
      "query": "*HEAVENLUXE-SG23FEB2022*", 
      "fields": [
        "modelName^2",
        "shortName^4",
        "channelSkuId^3",
        "unaSkuId^1",
        "optionalSellerOrderId^3",
        "internalOmsOrderId^1",
        "sellerOrderId^4",
        "partNumber^4",
        "purchaseOrderNumber^3"
      ]
    }
  }
}

This was working mostly, except for the fact that wildcard used in the query_string was yielding results as per substring matches in the top result set.

I need the exact matches to come higher up on the result set. If and only if there is no exact match should substring results be included in the result set.

@dadoonet Do you have any advice ? I could provide more information if required.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.