Aggregate count and max per document query

Hi, I have a mapping similar to

PUT my-index-000001
{
  "mappings": {
    "properties": {
      "message": {
        "type": "keyword"
      }
    }
  }
}

now this field holds an array of messages. What I would like to request is doc by doc want count of values in this field(an array) and get the doc that has max of this count. Is this query a possibility, pls educate me :slight_smile: I see individual max or count aggregations but that doesn't fir my case.

TIA!

Hi @dimalini welcome back!

Perhaps you'll find this solution helpful for your use case. I used it on a sample index similar to the one you mentioned above and it seemed to work fine. Although there are considerations to keep in mind that are mentioned in the solution link.

Documents

 "hits": {
    "total": {
      "value": 3,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "system_logs",
        "_id": "1",
        "_score": 1,
        "_source": {
          "name": "windows",
          "messages": [
            "404",
            "500",
            "200"
          ]
        }
      },
      {
        "_index": "system_logs",
        "_id": "2",
        "_score": 1,
        "_source": {
          "name": "mac",
          "messages": [
            "404",
            "500",
            "200",
            "404",
            "500",
            "200"
          ]
        }
      },
      {
        "_index": "system_logs",
        "_id": "3",
        "_score": 1,
        "_source": {
          "name": "macair",
          "messages": [
            "404",
            "500",
            "200",
            "404",
            "500",
            "200",
            "404",
            "500",
            "200"
          ]
        }
      }
    ]
  }
}

Query

GET system_logs/_search
{
  "_source": {
    "excludes": [ "messages" ]
  },
  "script_fields": {
    "number_of_messages": {
      "script": {
        "source": "params['_source'].messages.length"
      }
    }
  }
}

Result

"hits": {
    "total": {
      "value": 3,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "system_logs",
        "_id": "1",
        "_score": 1,
        "_source": {
          "name": "windows"
        },
        "fields": {
          "number_of_messages": [
            3
          ]
        }
      },
      {
        "_index": "system_logs",
        "_id": "2",
        "_score": 1,
        "_source": {
          "name": "mac"
        },
        "fields": {
          "number_of_messages": [
            6
          ]
        }
      },
      {
        "_index": "system_logs",
        "_id": "3",
        "_score": 1,
        "_source": {
          "name": "macair"
        },
        "fields": {
          "number_of_messages": [
            9
          ]
        }
      }
    ]
  }

Thanks Alexis. Always feel great to get timely response from the discuss community.

The script_fields indeed helped me summarise the count across document. That part is solved with this proposal.

I am planning to try soon, but posting here anyway on the second aspect. That is to use this script_fields logic and find the docs (top n docs in desc order) that has the max count in them ?

Regards,
Divya Malini

Hi @dimalini Thank you for your patience. And so glad the first response was helpful. I would say have a look at bucket sort aggregation. You'll be able to get the top n documents based on your provided ordering and you can run this with your query.

Thanks Alexis. Will give this a try.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.