Sort results based on the number of matched nested documents


(Dzmitry Kavalionak) #1

I have the following index

PUT /index/blogpost/2
{
  "title": "Investment secrets",
  "body":  "What they don't tell you ...",
  "tags":  [ "shares", "equities" ],
  "comments": [
    {
      "name":    "Mary Brown",
      "comment": "Lies, lies, lies",
      "age":     42,
      "stars":   1,
      "date":    "2014-10-18"
    },
    {
      "name":    "John Smith",
      "comment": "You're making it up!",
      "age":     28,
      "stars":   2,
      "date":    "2014-10-16"
    }
  ]
}

Is it possible to sort posts based on the count of comments in October like this

GET /_search
{
  "query": {
    "nested": { 
      "path": "comments",
      "filter": {
        "range": {
          "comments.date": {
            "gte": "2014-10-01",
            "lt":  "2014-11-01"
          }
        }
      }
    }
  },
  "sort": {
    "comments": { 
      "order": "desc",   
      "mode":  "doc_count",   
      "nested_filter": { 
        "range": {
          "comments.date": {
            "gte": "2014-10-01",
            "lt":  "2014-11-01"
          }
        }
      }
    }
  }
}

I know that there is no "doc_count" sort mode in elasticsearch and the query above is invalid.

Is there a way to sort posts by the number of comments in October?


(Christoph) #2

Hi,

have you tried script sort? Something along these lines might work:

"sort": {
    "_script": {
      "script": "doc['comments'].values.length",
      "order": "desc",
      "type": "string"
    }
  }

I haven't tried it myself because I don't have your data but it might be a starting point. On the other hand I'm not sure it it does the range filtering on the nested docs or would just score by total count of comments, regardless or range... Need to do some digging here.


(Dzmitry Kavalionak) #3

@cbuescher thank you very much for the help! When I try this query

curl -XGET localhost:9200/posts/_search -d '
{
  "query": {
    "nested": { 
      "path": "comments",
      "filter": {
        "range": {
          "comments.date": {
            "gte": "2014-10-01",
            "lt":  "2014-11-01"
          }
        }
      }
    }
  },
  "sort": {
    "_script": {
      "script": "doc[\"comments\"].values.length",
      "order": "desc",
      "type": "string"
    }
  }
}' 

I get the following error:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 3,
    "failed" : 2,
    "failures" : [ {
      "shard" : 2,
      "index" : "posts",
      "node" : "hOLwx1QZQX-r-_2lyLBACQ",
      "reason" : {
        "type" : "script_exception",
        "reason" : "failed to run inline script [doc[\"comments\"].values.length] using lang [groovy]",
        "caused_by" : {
          "type" : "illegal_argument_exception",
          "reason" : "No field found for [comments] in mapping with types []"
        }
      }
    } ]
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}

I googled this error and could not find the solution. Do you have any ideas?

I have the following mappings:

curl -XPOST localhost:9200/posts -d '
{
  "mappings": {
    "post": {
      "properties": {
        "title": { "type": "string" },
        "body": { "type": "string" },
        "tags": { "type": "string" },
        "comments": {
          "type": "nested",
          "properties": {
            "name": { "type": "string" },
            "comment": { "type": "string" },
            "age": { "type": "integer" },
            "stars": { "type": "integer" },
            "date": { "type": "date" }
          }
        }
      }
    }
  }
}'

(Christoph) #4

Hi,

Sorry for the late reply, it took me some time to wrap my head around this as well. My initial idea with the script sort was misleading because, as you saw, acessing nested fields from the parent document context in scripts is not possible. The simplest solution in your case would be to (ab)use the score_mode of the nested query, which lets you control how matching nested docs affect the score of the parent. Since your range query is already in a filter, all matching nested docs have score 1 and by setting score_mode to sum you can get the count of all matching documents.

Also, if you only want to retrieve only the matching nested documents for your range, you can also use the nested inner_hits part. That way, the "inner_hits" section for each parent document should only contain the comments matching your range:

GET /posts/_search
{
  "query": {
    "nested": { 
      "path": "comments",
      "filter": {
        "range": {
          "comments.date": {
            "gte": "2014-10-01",
            "lt":  "2014-11-01"
          }
        }
      },
      "score_mode":  "sum",
      "inner_hits" : {}
    }
  }
}

(system) #5