TermsAggregation: sort by score to get relevant documents on top - Please help

I have an index that contains documents with the same employee name and email address but different other information, such as meetings attended and amount spent.

{
   "emp_name" : "Raju",
   "emp_email" : "raju@abc.com",
   "meeting" : "World cup 2019",
   "cost" : "2000" 
}

{
   "emp_name" : "Sanju",
   "emp_email" : "sanju@abc.com",
   "meeting" : "International Academy",
   "cost" : "3000" 
}

{
   "emp_name" : "Sanju",
   "emp_email" : "sanju@abc.com",
   "meeting" : "School of Education",
   "cost" : "4000" 
}

{
   "emp_name" : "Sanju",
   "emp_email" : "sanju@abc.com",
   "meeting" : "Water world",
   "cost" : "1200" 
}

{
   "emp_name" : "Sanju",
   "emp_email" : "sanju@abc.com",
   "meeting" : "Event of Tech",
   "cost" : "5200" 
}

{
   "emp_name" : "Bajaj",
   "emp_email" : "bajaju@abc.com",
   "meeting" : "Event of Tech",
   "cost" : "4500" 
}

Now, when I search on the emp_name field with a term like "raj", I should get one document each for Raju, Sanju and Bajaj, since I am using fuzzy search (fuzziness(auto)).

I am implementing the Elasticsearch search using the Java High Level REST Client 6.8 API.

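TermsAggregationBuilder termAggregation = AggregationBuilders.terms("employees")
        .field("emp_email.keyword")
        .size(2000);

// Keep only the best-scoring document per email bucket.
// includeFields and excludeFields are String[] values defined elsewhere.
TopHitsAggregationBuilder termAggregation1 = AggregationBuilders.topHits("distinct")
        .sort(new ScoreSortBuilder().order(SortOrder.DESC))
        .size(1)
        .fetchSource(includeFields, excludeFields);

// Nest the top_hits aggregation under the terms aggregation, as in the JSON below.
termAggregation.subAggregation(termAggregation1);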

With the above code I get distinct documents, but Raju's record is not at the top of the response. Instead, the Sanju document comes first because its bucket has the higher document count.

Below is the JSON generated from the search request.

{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "query": "raj",
            "fields": [
              "emp_name^1.0",
              "emp_email^1.0"
            ],
            "boost": 1.0
          }
        }
      ],
      "filter": [
        {
          "range": {
            "meeting_date": {
              "from": "2019-12-01",
              "to": null,
              "boost": 1.0
            }
          }
        }
      ],
      "adjust_pure_negative": true,
      "boost": 1.0
    }
  },
  "aggregations": {
    "employees": {
      "terms": {
        "field": "emp_email.keyword",
        "size": 2000,
        "min_doc_count": 1,
        "shard_min_doc_count": 0,
        "show_term_doc_count_error": false,
        "order": [
          {
            "_count": "desc"
          },
          {
            "_key": "asc"
          }
        ]
      },
      "aggregations": {
        "distinct": {
          "top_hits": {
            "from": 0,
            "size": 1,
            "version": false,
            "explain": false,
            "_source": {
              "includes": [
                "all_uid",
                "emp_name",
                "emp_email",
                "meeting",
                "country",
                "cost"
              ],
              "excludes": [

              ]
            },
            "sort": [
              {
                "_score": {
                  "order": "desc"
                }
              }
            ]
          }
        }
      }
    }
  }
}

I think that if the buckets were ordered by max_score or _score, Raju's record would be at the top of the response.

Could you please let me know how to order the buckets by the _score or max_score of the documents returned in the response?

Sample response is

{
  "took": 264,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 232,
    "max_score": 0.0,
    "hits": [

    ]
  },
  "aggregations": {
    "sterms#employees": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "Sanju",
          "doc_count": 4,
          "top_hits#distinct": {
            "hits": {
              "total": 4,
              "max_score": 35.71312,
              "hits": [
                {
                  "_index": "indexone",
                  "_type": "employeedocs",
                  "_id": "1920424",
                  "_score": 35.71312,
                  "_source": {
                    "emp_name": "Sanju",
                    ...
                  }
                }
              ]
            }
          }
        },
        {
          "key": "Raju",
          "doc_count": 1,
          "top_hits#distinct": {
            "hits": {
              "total": 1,
              "max_score": 89.12312,
              "hits": [
                {
                  "_index": "indexone",
                  "_type": "employeedocs",
                  "_id": "1920424",
                  "_score": 89.12312,
                  "_source": {
                    "emp_name": "Raju",
                    ...
                  }
                }
              ]
            }
          }
        }
      ]
    }
  }
}

Let me know if you have any questions.

Note: I have seen many similar questions, but none of them helped me. Please advise.

Thanks, Chetan

Hi, I have not seen any response. Please let me know if more details are required or if anything in my question is unclear.

If I understand correctly, you want to sort the terms aggregation's buckets by their top score, so you'll need something like this:

GET /indexone/_search
{
  "query": {
	"match": {
	  "emp_name": "raju"
	}
  },
  "size": 0,
  "aggs": {
	"topCos": {
	  "terms": {
		"field": "emp_email.keyword",
		"order": {
		  "topscore": "desc"
		}
	  },
	  "aggs": {
		"topscore": {
		  "max": {
			"script": {
			  "source": "_score"
			}
		  }
		},
		"hits": {
		  "top_hits": {
			"size": 2
		  }
		}
	  }
	}
  }
}
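
To illustrate why ordering by the max sub-aggregation changes the result, here is a minimal plain-Java sketch of the two bucket orderings. It does not call Elasticsearch. The class and method names (BucketOrderSketch, orderByCount, orderByMaxScore, sampleHits) are made up for this illustration, and only the two top scores (35.71312 and 89.12312) come from the sample response; the other three Sanju scores are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; it does not call Elasticsearch.
class BucketOrderSketch {

    // One matching document: its bucket key (email) and relevance score.
    static final class Hit {
        final String email;
        final double score;

        Hit(String email, double score) {
            this.email = email;
            this.score = score;
        }
    }

    // Per-bucket values, mirroring doc_count and the "topscore" max sub-aggregation.
    static final class Bucket {
        final String key;
        long docCount = 0;
        double maxScore = Double.NEGATIVE_INFINITY;

        Bucket(String key) {
            this.key = key;
        }
    }

    // Group hits by email, tracking the count and the highest score per group.
    static List<Bucket> buckets(List<Hit> hits) {
        Map<String, Bucket> byKey = new LinkedHashMap<>();
        for (Hit hit : hits) {
            Bucket bucket = byKey.computeIfAbsent(hit.email, Bucket::new);
            bucket.docCount++;
            bucket.maxScore = Math.max(bucket.maxScore, hit.score);
        }
        return new ArrayList<>(byKey.values());
    }

    // Default terms ordering from the original request: _count desc, then _key asc.
    static List<String> orderByCount(List<Hit> hits) {
        List<Bucket> result = buckets(hits);
        result.sort(Comparator.comparingLong((Bucket b) -> b.docCount).reversed()
                .thenComparing((Bucket b) -> b.key));
        return keys(result);
    }

    // Ordering requested in the answer: highest per-bucket score first.
    static List<String> orderByMaxScore(List<Hit> hits) {
        List<Bucket> result = buckets(hits);
        result.sort(Comparator.comparingDouble((Bucket b) -> b.maxScore).reversed());
        return keys(result);
    }

    // Extract the bucket keys in their current order.
    static List<String> keys(List<Bucket> buckets) {
        List<String> keys = new ArrayList<>();
        for (Bucket bucket : buckets) {
            keys.add(bucket.key);
        }
        return keys;
    }

    // Top scores are taken from the sample response; the three lower
    // Sanju scores are hypothetical values chosen for illustration.
    static List<Hit> sampleHits() {
        return Arrays.asList(
                new Hit("sanju@abc.com", 35.71312),
                new Hit("sanju@abc.com", 30.0),
                new Hit("sanju@abc.com", 30.0),
                new Hit("sanju@abc.com", 30.0),
                new Hit("raju@abc.com", 89.12312));
    }

    public static void main(String[] args) {
        List<Hit> hits = sampleHits();
        System.out.println(orderByCount(hits));    // [sanju@abc.com, raju@abc.com]
        System.out.println(orderByMaxScore(hits)); // [raju@abc.com, sanju@abc.com]
    }
}
```

With the Java High Level REST Client, the corresponding pieces should be, as far as I know, BucketOrder.aggregation("topscore", false) passed to the terms builder's order(...) method, plus AggregationBuilders.max("topscore").script(new Script("_score")) added as a sub-aggregation next to the top_hits one. Please check this against the 6.8 client you are using.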

@Mark_Harwood, It worked, thank you very much.
