Multi-match query with filter condition: differences between ES 2.1.1 and ES 2.3.3

Hi,
I am running into a problem with multi_match queries with filter conditions on ES 2.3.3: the query is not returning the results I expect.
On ES 2.1.1 I get the results in the order I want, but on ES 2.3.3 I do not.

Java - 1.8.0_45
Oracle Linux 6.5

The mapping... :

{
  "entity": {
    "properties": {
      "name": {
        "type": "string",
        "fields": {
          "na": {
            "index": "not_analyzed",
            "type": "string"
          },
          "autocomplete": {
            "analyzer": "autocomplete",
            "type": "string"
          }
        }
      },
      "type": {
        "index": "not_analyzed",
        "type": "string"
      }
    }
  }
}

My query runs a multi_match against the name field and its sub-fields, once per type filter:

{
  "query": {
    "bool": {
      "should": [
        {
          "bool": {
            "must": {
              "multi_match": {
                "query": "texas court of appeals, 1s",
                "fields": [
                  "name",
                  "name.na",
                  "name.autocomplete"
                ],
                "type": "best_fields"
              }
            },
            "filter": {
              "term": {
                "type": "date"
              }
            }
          }
        },
        {
          "bool": {
            "must": {
              "bool": {
                "must": {
                  "multi_match": {
                    "query": "texas court of appeals, 1s",
                    "fields": [
                      "name",
                      "name.na",
                      "name.autocomplete"
                    ],
                    "type": "best_fields"
                  }
                },
                "filter": {
                  "term": {
                    "type": "court"
                  }
                }
              }
            }
          }
        }
      ]
    }
  }
}

While I get the right document back on ES 2.1.1, on ES 2.3.3 I do not, and the matching document is not in the top 10 or 20 documents.

If I remove the name.na field from the multi_match query, the query works as expected. If there is just one specific type (court) in the outermost bool query, the query also returns the correct document.

So my question is:
Did the multi_match implementation change between versions 2.1.1 and 2.3.3 in a way that would cause this behavior?

Any assistance is much appreciated.

Thanks

Ramdev

I'm not aware of any changes between those two versions, although that doesn't rule it out.

How many documents are in your index? How many shards? This may be a difference due to how the two versions physically placed the documents. Because scores are generated on a per-shard basis, the document frequency (i.e. how many documents contain the term) is shard-local. When you only have a small number of documents, their placement on physical shards can often affect scoring simply because the shard-local DFs change.
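To make the shard-local DF point concrete, here is a minimal sketch, assuming the default TF/IDF similarity that ES 2.x uses, where idf = 1 + ln(numDocs / (docFreq + 1)). The shard sizes and document frequencies are made-up numbers for illustration only.

```python
import math


def classic_idf(doc_freq, num_docs):
    """IDF as computed by Lucene's classic TF/IDF similarity."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical: the same term on two shards of 1,000 documents each.
# The term is rare on shard A and common on shard B.
shard_a = classic_idf(doc_freq=5, num_docs=1000)
shard_b = classic_idf(doc_freq=200, num_docs=1000)

# What the IDF would be if the statistics were collected across both shards.
global_idf = classic_idf(doc_freq=205, num_docs=2000)

print("shard A idf:", round(shard_a, 3))
print("shard B idf:", round(shard_b, 3))
print("global idf: ", round(global_idf, 3))
```

The same term contributes a noticeably different weight depending on which shard the document landed on, which is why document placement can reorder results.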

You could try re-running your searches with dfs_query_then_fetch which does a pre-search to find all DFs so that scoring is "global".
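As a rough sketch of how that request could be built: search_type is passed as a URL parameter on the _search endpoint. The host, the index name "myindex", and the helper function below are assumptions for illustration; the snippet only constructs the URL and body and does not send anything.

```python
import json
from urllib.parse import urlencode


def build_search_request(host, index, body, search_type=None):
    """Return (url, json_payload) for a _search call (hypothetical helper)."""
    url = "{}/{}/_search".format(host, index)
    if search_type:
        url += "?" + urlencode({"search_type": search_type})
    return url, json.dumps(body)


# Placeholder query; substitute the real bool/multi_match query here.
body = {"query": {"match": {"name": "texas court of appeals"}}}

url, payload = build_search_request(
    "http://localhost:9200",  # assumed host
    "myindex",                # assumed index name
    body,
    search_type="dfs_query_then_fetch",
)
print(url)
print(payload)
```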

Alternatively, perhaps run the explain api to see how/why each document matched, which may give you some clues.
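One way to get explanations for every hit is to set "explain": true in the search body; each hit then carries an _explanation tree of value/description/details nodes. The sketch below adds the flag and flattens such a tree into readable lines. The helper names and the sample tree are made up for illustration and are not real output from this index.

```python
import json


def with_explain(body):
    """Return a copy of a search body with per-hit explanations enabled."""
    out = dict(body)
    out["explain"] = True
    return out


def format_explanation(node, depth=0):
    """Flatten an _explanation tree into indented 'value  description' lines."""
    lines = ["  " * depth + "{}  {}".format(node["value"], node["description"])]
    for child in node.get("details", []):
        lines.extend(format_explanation(child, depth + 1))
    return lines


# Made-up _explanation fragment, for illustration only.
sample = {
    "value": 2.0,
    "description": "sum of:",
    "details": [
        {"value": 1.5, "description": "weight(name:texas)", "details": []},
        {"value": 0.5, "description": "weight(name:court)", "details": []},
    ],
}

print(json.dumps(with_explain({"query": {"match_all": {}}})))
print("\n".join(format_explanation(sample)))
```

Comparing the flattened trees for the expected document on both clusters should show which clause or field contributes a different score.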

Hi Zachary,
Thanks for the response. Here are the answers to the questions you asked:

  1. The index has 1 shard in both instances.
  2. There are about 1.7M docs in the index. It is a static index, meaning once docs are indexed, the index is no longer touched. (However, indexing into other indices is going on in the cluster.)

Other than perhaps hardware differences, I am not seeing anything obviously different between the two setups. However, I will try out the explain API to see why docs are ranked differently. I will also try dfs_query_then_fetch, although I am not sure it has any effect when there is only 1 shard.

Ah, yeah, with only one shard you can ignore what I said about different DFs and dfs_query_then_fetch. Since there is only one shard, everything is "global" for scoring purposes.

I think explain is the way to go from here, that should at least provide some details about how things are being scored.

This is dumbfounding.
Just for kicks, I tried submitting the query multiple times in quick succession using the Sense UI, and I found an odd behavior: the results flip-flop. Every 3rd response is the right one, with a higher score.

I added a sort condition to the query to sort by _score (descending by default) for deterministic results.

I can clearly see that the score for the wrong result is lower than the score for the right one.

Does this mean something?

@polyfractal, a follow-up question:

In an index with a single type, should a search against

index/_search vs. index/type/_search yield different results?

If it does return different results, how can I explain the behavior?

Thanks

Ramdev
