How to get array size for each document

I want the size of an array of objects ('breaches' here) which is present in each document to be returned along with the fields present in the document once the query is executed. How can I do this? Below is my document structure and mapping. I tried with inline script but did not get success. I want name, age, email, contact, total number of breaches to be returned for each document.

{
	"age": 45,
	"email": "",
	"contact": 9899999999,
	"name": "xyx",
	"breaches": [{
		"_id": "5e833cae56cd2e0011ee8e4a",
		"citizen": "5e82e0f5fb4f240012fa9737",
		"breachType": "location_disabled",
		"timestamp": "2020-03-31T12:50:52.828Z",
		"division": null,
		"createdAt": "2020-03-31T12:50:54.371Z"
	}]
}

Mapping

{
	"properties": {
		"age": {
			"type": "long"
		},
		"email": {
			"type": "text",
			"fields": {
				"keyword": {
					"type": "keyword",
					"ignore_above": 256
				}
			}
		},
		"contact": {
			"type": "long"
		},
		"name": {
			"type": "text",
			"fields": {
				"keyword": {
					"type": "keyword",
					"ignore_above": 256
				}
			}
		},
		"breaches": {
			"type": "nested",
			"properties": {
				"_id": {
					"type": "text",
					"fields": {
						"keyword": {
							"type": "keyword",
							"ignore_above": 256
						}
					}
				},
				"breachType": {
					"type": "text",
					"fields": {
						"keyword": {
							"type": "keyword",
							"ignore_above": 256
						}
					}
				},
				"citizen": {
					"type": "text",
					"fields": {
						"keyword": {
							"type": "keyword",
							"ignore_above": 256
						}
					}
				},
				"createdAt": {
					"type": "date"
				},
				"division": {
					"type": "text",
					"fields": {
						"keyword": {
							"type": "keyword",
							"ignore_above": 256
						}
					}
				}
			}
		}
	}
}

Hello @ntsh999,

The "easiest" solution is using scripted fields, but it is gonna to be slow and consume resources, as you're accessing to the _source:

GET your_index_name/_search
{
  "script_fields": {
    "number_of_breaches": {
      "script": {
        "source": "params['_source'].breaches.length"
      }
    }
  },
  "_source": {
    "excludes": [ "breaches" ]
  }
}
# Response
{
  "took" : 672,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "mytest",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "contact" : 9899999999,
          "name" : "xyx",
          "age" : 45,
          "email" : ""
        },
        "fields" : {
          "number_of_breaches" : [
            1
          ]
        }
      }
    ]
  }
}

The best solution is usually to compute such fields at indexing time, for example using the Ingest Pipelines (documentation):

PUT _ingest/pipeline/count_breaches
{
    "description": "Counts the breaches",
    "processors": [
      {
        "script": {
          "source": "ctx.number_of_breaches = ctx.containsKey('breaches') ? ctx.breaches.length : 0"
        }
      }
    ]
}

The pipeline can be used when indexing the document.

3 Likes

HI @Luca_Belluccini,
Thanks the solution provided by you works absolutely fine.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.