ElasticSearch double nested sorting

Raman_Goyal · June 12, 2016, 3:42am

I have documents that look like this (here is an example):

{
"user": "xyz",
"state": "FINISHED",
"finishedTime": 1465566467161,
"jobCounters": {
    "counterGroup": [
        {
            "counterGroupName": "org.apache.hadoop.mapreduce.FileSystemCounter",
            "counter": [
                {
                    "name": "FILE_BYTES_READ",
                    "mapCounterValue": 206509212380,
                    "totalCounterValue": 423273933523,
                    "reduceCounterValue": 216764721143
                },
                {
                    "name": "FILE_BYTES_WRITTEN",
                    "mapCounterValue": 442799895522,
                    "totalCounterValue": 659742824735,
                    "reduceCounterValue": 216942929213
                },
                {
                    "name": "HDFS_BYTES_READ",
                    "mapCounterValue": 207913352565,
                    "totalCounterValue": 207913352565,
                    "reduceCounterValue": 0
                },
                {
                    "name": "HDFS_BYTES_WRITTEN",
                    "mapCounterValue": 0,
                    "totalCounterValue": 89846725044,
                    "reduceCounterValue": 89846725044
                }
            ]
        },
        {
            "counterGroupName": "org.apache.hadoop.mapreduce.JobCounter",
            "counter": [
                {
                    "name": "TOTAL_LAUNCHED_MAPS",
                    "mapCounterValue": 0,
                    "totalCounterValue": 13394,
                    "reduceCounterValue": 0
                },
                {
                    "name": "TOTAL_LAUNCHED_REDUCES",
                    "mapCounterValue": 0,
                    "totalCounterValue": 720,
                    "reduceCounterValue": 0
                }
            ]
        }
    ]
}

}

Now I want to sort this data to get the top 15 documents by totalCounterValue where counter.name is FILE_BYTES_READ. I have tried nested sorting, but no matter which name I put in the counter.name filter, the results are always sorted by the value for HDFS_BYTES_READ. Can anyone please help me with my query?

{
"_source": true,
"size": 15,
"query": {
    "bool": {
        "must": [
            {
                "term": {
                    "state": {
                        "value": "FINISHED"
                    }
                }
            },
            {
                "range": {
                    "startedTime": {
                        "gte": "now-4d",
                        "lte": "now"
                    }
                }
            }
        ]
    }
},
"sort": [
    {
        "jobCounters.counterGroup.counter.totalCounterValue": {
            "order": "desc",
            "nested_path": "jobCounters.counterGroup",
            "nested_filter": {
                "nested": {
                    "path": "jobCounters.counterGroup.counter",
                    "filter": {
                        "term": {
                            "jobCounters.counterGroup.counter.name": "file_bytes_read"
                        }
                    }
                }
            }
        }
    }
]

}

I followed the nested sorting documentation of Elasticsearch and came up with this query, but I don't know why it always sorts by the totalCounterValue of HDFS_BYTES_READ, irrespective of the value of jobCounters.counterGroup.counter.name.

Raman_Goyal · June 12, 2016, 9:15am

This is the mapping we are using for jobCounters:

"jobCounters": {
    "type": "nested",
    "include_in_parent": true,
    "properties": {
        "counterGroup": {
            "type": "nested",
            "include_in_parent": true,
            "properties": {
                "counterGroupName": {
                    "type": "string",
                    "fields": {
                        "raw": {
                            "type": "string",
                            "index": "not_analyzed"
                        }
                    }
                },
                "counter": {
                    "type": "nested",
                    "include_in_parent": true,
                    "properties": {
                        "reduceCounterValue": {
                            "type": "long"
                        },
                        "name": {
                            "type": "string",
                            "analyzer": "english",
                            "fields": {
                                "raw": {
                                    "type": "string",
                                    "index": "not_analyzed"
                                }
                            }
                        },
                        "totalCounterValue": {
                            "type": "long"
                        },
                        "mapCounterValue": {
                            "type": "long"
                        }
                    }
                }
            }
        }
    }
}
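
One possible adjustment, given here as an untested sketch and not a confirmed fix: the sort field lives at the innermost nested level (jobCounters.counterGroup.counter), so nested_path may need to point there instead of at jobCounters.counterGroup, with nested_filter holding a plain term filter on the not_analyzed name.raw subfield from the mapping above. The sketch keeps the nested_path / nested_filter sort syntax used in the query above, which belongs to the Elasticsearch versions that still have the string mapping type; later versions express the same idea with a nested object inside the sort clause. Only the sort clause is shown; the query part would stay unchanged.

```json
{
    "sort": [
        {
            "jobCounters.counterGroup.counter.totalCounterValue": {
                "order": "desc",
                "nested_path": "jobCounters.counterGroup.counter",
                "nested_filter": {
                    "term": {
                        "jobCounters.counterGroup.counter.name.raw": "FILE_BYTES_READ"
                    }
                }
            }
        }
    ]
}
```

Because name.raw is not analyzed, the term value would have to match the stored counter name exactly, including case.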
