What's the best way to query exact match and full text at the same time?

Hi all,

I had an ES search running on ES 2 that was working as I hoped, but since upgrading to ES 5 and subsequently to ES 6 the relevance has degraded somewhat (probably due to poor DSL, I guess). It's just a shame that it was working nicely and now isn't.

So we have a number of documents that contain some codes such as '1/2' or '000045/000009', as well as lots of bespoke screen text (from a third-party application).

We have a single 'search everything' input box and go button, a la Google, which has stopped providing the relevance we got used to with ES2.

As part of the upgrade we changed the Code field to be stored as a keyword and the screen text is still stored as text (there's a lot of this; lots of names and addresses mostly, amongst other types of data).

I was hoping someone might be able to help shed light on why the query below isn't returning a document with a code exactly matching '1/2' as the first hit. I get about 140 hits before it where there is a number 1 or 2 somewhere on a piece of screen data.

{
  "query": {
    "bool": {
      "filter": [
        {
          "terms": {
            "systemId": [
              1,
              2
            ]
          }
        }
      ],
      "minimum_should_match": 1,
      "should": [
        {
          "multi_match": {
            "fields": [
              "code",
              "code.standard",
              "name",
              "notes",
              "notes.english",
              "shortName"
            ],
            "query": "1/2",
            "type": "best_fields"
          }
        },
        {
          "nested": {
            "path": "matterKeyDates",
            "query": {
              "multi_match": {
                "fields": [
                  "matterKeyDates.notes",
                  "matterKeyDates.notes.english"
                ],
                "query": "1/2",
                "type": "best_fields"
              }
            }
          }
        },
        {
          "nested": {
            "path": "screenData",
            "query": {
              "multi_match": {
                "fields": [
                  "screenData.valueString",
                  "screenData.valueString.english"
                ],
                "query": "1/2",
                "type": "best_fields"
              }
            }
          }
        }
      ]
    }
  },
  "size": 10,
  "_source": {
    "excludes": [
      "screenData.*"
    ]
  }
}

Any help appreciated ... :smile:

So I thought I'd expand the problem with a replication script...

PUT tests

PUT tests/test/_mapping
{
  "test" : {
    "dynamic" : "false",
    "date_detection" : false,
    "numeric_detection" : false,
    "properties" : {
      "code" : {
        "type" : "keyword"
      },
      "screenData" : {
        "type" : "nested",
        "dynamic" : "false",
        "properties" : {
          "valueString" : {
            "type" : "text"
          }
        }
      }
    }
  }
}

PUT tests/test/1
{
  "code": "1/1",
  "screenData": [
    {
      "valueString": "1 smith road"
    },
    {
      "valueString": "2 smith road"
    }
  ]
}
PUT tests/test/2
{
  "code": "1/2",
  "screenData": [
    {
      "valueString": "3 smith road"
    },
    {
      "valueString": "4 smith road"
    }
  ]
}

GET tests/_search
{
      "query": {
        "bool": {
          "minimum_should_match": 1,
          "should": [
            {
              "multi_match": {
                "fields": [
                  "code"
                ],
                "query": "1/2"
              }
            },
            {
              "nested": {
                "path": "screenData",
                "query": {
                  "multi_match": {
                    "fields": [
                      "screenData.valueString",
                      "screenData.valueString.english"
                    ],
                    "query": "1/2"
                  }
                }
              }
            }
          ]
        }
      },
      "size": 10,
      "_source": {
        "excludes": [
          "screenData.*"
        ]
      }
    }

In this example document 1 scores higher than document 2. I would expect the exact keyword match on the code of document 2 to score higher than the hits on screenData.

I'd prefer not to start boosting here and there, and I'm sure there should be a better way to write the query so that exact matches on keywords score higher than full text matches.

Any ideas?

You can use this in your query to look at the details of scores:

   "explain": true, 

Using that I can see that doc 1 scores highly because it matches two tokens, 1 and 2, in the screen data, whereas document 2 only matches one token, 1/2, in the code field. All tokens are of equal rarity and therefore of equal value.
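
To see where those tokens come from, the analyze API can be pointed at the text field. This is a minimal sketch against the `tests` index from the replication script above, and it assumes `screenData.valueString` uses the default standard analyzer, since the mapping sets no other:

```
GET tests/_analyze
{
  "field": "screenData.valueString",
  "text": "1/2"
}
```

If that assumption holds, the response should list the separate tokens 1 and 2, which is why "1 smith road" and "2 smith road" both match. The `code` field is a keyword, so there the input stays as the single term 1/2.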

It is not for me to recommend a fix for this single scenario. You need to come up with a search template for handling user input that performs well across a whole range of queries. What fixes one query may be detrimental to many others.
Querying across multiple fields is one of the more complex things to get right in a search application, and whole books have been written on the subject. The explain feature is your friend on this journey.
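
As a starting point for that kind of experiment, and not as a fix recommended anywhere in this thread, one candidate shape to test is to give the exact code match its own clause with a fixed score. The sketch below runs against the `tests` index from the replication script; the boost value of 10 is an arbitrary assumption, and it does use a boost, which the question hoped to avoid:

```
GET tests/_search
{
  "explain": true,
  "query": {
    "bool": {
      "minimum_should_match": 1,
      "should": [
        {
          "constant_score": {
            "filter": { "term": { "code": "1/2" } },
            "boost": 10
          }
        },
        {
          "nested": {
            "path": "screenData",
            "query": {
              "match": { "screenData.valueString": "1/2" }
            }
          }
        }
      ]
    }
  }
}
```

With `"explain": true` in place, each hit reports how much the code clause and the nested clause contributed, so the same candidate can be checked against a range of real user queries before it is adopted.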

Thanks Mark
