Getting results with score 0

I have a very particular use case: I need the subset of documents matching a filter, and I also need to score those results in such a way that documents with a score of zero aren't removed from the result.

Maybe this is not very elegant, but the idea is that I need 10 documents that will require certain processing and one of them (the one with the highest score) will also require some additional processing.

What I've seen is that, once I add a query, any document with a score of zero (which is the case for 9 out of 10 documents) is never retrieved.

I've tried the min_score setting, but it has no effect when set to 0.0 or below.

This looks like a bug to me. Elasticsearch should not filter out documents that have a score of zero. Can you provide us with a simple recreation of the problem?

A very simplistic example. Let's say I have an index per day with purchases from different customers:

PUT /purchases-20160513/purchase/1
{
  "CUSTOMER_ID" : 1,
  "AMOUNT"      : 10.0
}

PUT /purchases-20160513/purchase/2
{
  "CUSTOMER_ID" : 1,
  "AMOUNT"      : 0.0
}

I'd like to retrieve all purchases from customer 1, and do something specifically with the purchase closest to $10 (within some threshold).

The thing is, I've assumed that this was a feature; filter is used to trim the results and query is used to score them:

GET /purchases-20160513/purchase/_search
{
    "min_score" : 0.0,
    "query" : {
      "range": {"AMOUNT": {"lt": 11.0, "gte": 9.0}}
    },
    "filter" : {
      "term" : { "CUSTOMER_ID" : 1 }
    }
}

However, this search returns just one result:

{
  "took": 6,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "purchases-20160513",
        "_type": "purchase",
        "_id": "1",
        "_score": 1,
        "_source": {
          "CUSTOMER_ID": 1,
          "AMOUNT": 10
        }
      }
    ]
  }
}

I've tried setting min_score to -1.0, with the same result.

If I repeat the search without the query I get both documents:

{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "purchases-20160513",
        "_type": "purchase",
        "_id": "2",
        "_score": 1,
        "_source": {
          "CUSTOMER_ID": 1,
          "AMOUNT": 0
        }
      },
      {
        "_index": "purchases-20160513",
        "_type": "purchase",
        "_id": "1",
        "_score": 1,
        "_source": {
          "CUSTOMER_ID": 1,
          "AMOUNT": 10
        }
      }
    ]
  }
}

Conclusion: I'm getting only documents fully matching both the query and the filter. Is this the expected behavior?
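To check that I'm reading this correctly, here is a minimal Python sketch of the behaviour as I understand it. It is a toy model and not Elasticsearch code: the `search`, `range_query` and `customer_filter` names are made up for illustration, and it assumes the query first selects and scores the hits and the top-level filter then trims them.

```python
# Toy model of the observed behaviour; this does NOT call Elasticsearch.
# Assumption: the query selects and scores hits, then the filter trims them.

purchases = [
    {"_id": "1", "CUSTOMER_ID": 1, "AMOUNT": 10.0},
    {"_id": "2", "CUSTOMER_ID": 1, "AMOUNT": 0.0},
]


def range_query(doc):
    """Return a score of 1.0 if 9.0 <= AMOUNT < 11.0, else None (no match)."""
    return 1.0 if 9.0 <= doc["AMOUNT"] < 11.0 else None


def customer_filter(doc):
    """Return True if the purchase belongs to customer 1."""
    return doc["CUSTOMER_ID"] == 1


def search(docs, query=None, post_filter=None):
    """Collect (id, score) pairs for documents passing the query and the filter."""
    hits = []
    for doc in docs:
        # Without a query every document matches with a constant score.
        score = 1.0 if query is None else query(doc)
        if score is None:
            continue  # a non-matching document is dropped, not scored as zero
        if post_filter is not None and not post_filter(doc):
            continue
        hits.append((doc["_id"], score))
    return hits


with_query = search(purchases, range_query, customer_filter)
without_query = search(purchases, None, customer_filter)
print(with_query)
print(without_query)
```

In this model purchase 2 never gets a score of zero. It simply doesn't match the query, so it is gone before the filter is applied, which would explain why min_score makes no difference.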

This is possibly not the cleanest way to achieve it. I could rewrite the whole thing as two searches (one to pick up the highest-scored document and one without the query to get the rest of the documents), but I'd like to make this work with a single request if possible. Another option is an msearch, but is that really necessary?
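For reference, this is roughly what the msearch variant would look like, sketched in Python using only the standard library. It only builds the newline-delimited body that would be sent to the `_msearch` endpoint and doesn't send anything. The `build_msearch_body` helper and the `"size": 1` on the first search are my own assumptions, and the searches reuse the top-level `filter` from my request above.

```python
import json

INDEX = "purchases-20160513"

# Same term filter as in the single search above.
CUSTOMER_TERM = {"term": {"CUSTOMER_ID": 1}}


def build_msearch_body(searches):
    """Serialize (header, body) pairs as newline-delimited JSON.

    Each search contributes one header line and one body line,
    and the payload ends with a trailing newline.
    """
    lines = []
    for header, body in searches:
        lines.append(json.dumps(header))
        lines.append(json.dumps(body))
    return "\n".join(lines) + "\n"


# Search 1: scored by the range query; "size": 1 is an assumption
# to keep only the top hit.
scored_search = {
    "size": 1,
    "query": {"range": {"AMOUNT": {"lt": 11.0, "gte": 9.0}}},
    "filter": CUSTOMER_TERM,
}

# Search 2: no query, just the filter, to fetch all of the customer's purchases.
all_search = {"filter": CUSTOMER_TERM}

payload = build_msearch_body([
    ({"index": INDEX}, scored_search),
    ({"index": INDEX}, all_search),
])
print(payload)
```

That is still two searches, just bundled into one HTTP request, so I'd rather avoid it if a single search can do the job.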

Thank you,