Nested Mapping Performance & Retrieving "Complete" Nested Objects

jdh2550 · May 12, 2015, 7:28pm

I have two questions regarding nested mapping:

If I have millions of documents, each containing a nested array of up to fifty entries, and each entry is an object with up to a dozen properties, what impact will this have on performance compared to not using a nested array? If performance is poor, is there a better way to organize the index?

Background: we used to store the information that is now in the nested array as named properties of the source document, but we needed to search on the attributes of those properties without knowing the property name. So we made the property name its own property and stored the entries in an array of type "nested". This allows the searching we want, but I'm concerned that this use of nested may require a lot of extra storage, memory, and processing time.

Old index schema (intrinsics member is an object with named properties):

"some_document" : {
	"system": { /* meta data */ },
	"intrinsics": {
		"prop1": {
			"display": "property 1",
			"value": "uno"
		},
		"prop2": {
			"display": "property 2",
			"value": "duo"
		}
	}
}

New index schema (intrinsics is an array using the mapping type of "nested"):

"some_document" : {
	"system": { /* meta data */ },
	"intrinsics": [
		{
			"field_name": "prop1",
			"display": "property 1",
			"value": "uno"
		},
		{
			"field_name": "prop2",
			"display": "property 2",
			"value": "duo"
		}
	]
}
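
To illustrate the kind of search the nested layout is meant to support, here is a minimal sketch of a request body that matches on the attributes of an entry without naming the property. It assumes "intrinsics" is mapped as type "nested" and that "display" and "value" are ordinary analyzed string fields; the values are taken from the sample document above, and the index it would be sent to is left unspecified.

```json
{
  "query": {
    "nested": {
      "path": "intrinsics",
      "query": {
        "bool": {
          "must": [
            { "match": { "intrinsics.display": "property 1" } },
            { "match": { "intrinsics.value": "uno" } }
          ]
        }
      }
    }
  }
}
```

Because the array is nested, both conditions have to hold within the same array entry. With a plain object array, the "display" of one entry and the "value" of another could satisfy the query together.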

Is it possible to retain the object relationship when pulling back partial data from the source document?

Example:

Suppose I set up an index like this:

POST /movies
{
   "mappings": {
      "movie": {
         "properties": {
            "cast": {
               "type": "nested"
            },
            "locations": {
                "type": "nested"
            }
         }
      }
   }
}

POST /movies/movie
{
   "title": "The Matrix",
   "cast": [
      {
         "firstName": "Keanu",
         "lastName": "Reeves",
         "address": {
            "street": "somewhere",
            "city": "LA"
         }
      },
      {
         "firstName": "Laurence",
         "middleName": "John",
         "lastName": "Fishburne",
         "address": {
            "street": "somewhere else",
            "city": "NYC"
         }
      }
   ],
   "locations": [
       {
           "city": "Nashville",
           "state": "Tennessee",
           "country": "USA"
       },
       {
           "city": "Sydney",
           "state": "New South Wales",
           "country": "Australia"
       }
    ]
}

And then I issue this query:

GET /movies/_search
{
   "query": {
      "filtered": {
         "query": {
            "match_all": {}
         },
         "filter": {
            "nested": {
               "path": "cast",
               "filter": {
                  "bool": {
                     "must": [
                        { "term": { "firstName": "laurence"} },
                        { "term": { "lastName": "fishburne"} }
                     ]
                  }
               }
            }
         }
      }
   },
   "fields": [
      "cast.address.city",
      "cast.firstName",
      "cast.middleName",
      "cast.lastName"
   ]
}

I get this result:

{
   "took": 1,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 1,
      "hits": [
         {
            "_index": "movies",
            "_type": "movie",
            "_id": "AU1JeyBseLgwMCOuOLsZ",
            "_score": 1,
            "fields": {
               "cast.firstName": [
                  "Keanu",
                  "Laurence"
               ],
               "cast.lastName": [
                  "Reeves",
                  "Fishburne"
               ],
               "cast.address.city": [
                  "LA",
                  "NYC"
               ],
               "cast.middleName": [
                  "John"
               ]
            }
         }
      ]
   }
}

Is there a way to retrieve a single array of "cast" objects instead? Alternatively, is there a way to reliably re-assemble the "cast" objects from the separate arrays (e.g. to conclude that one cast member is Keanu Reeves and the other is Laurence John Fishburne)?

jdh2550 · May 14, 2015, 2:00pm

The second question is resolved: I should be using _source filtering rather than fields. Courtesy of Stack Overflow: https://stackoverflow.com/questions/30217778/elasticsearch-retrieving-nested-objects-not-individual-fields
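
A minimal sketch of that approach, assuming the same index and nested filter as the query in the first post: the body below would be sent to GET /movies/_search, with the "fields" list replaced by a "_source" list of the same paths.

```json
{
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": {
        "nested": {
          "path": "cast",
          "filter": {
            "bool": {
              "must": [
                { "term": { "firstName": "laurence" } },
                { "term": { "lastName": "fishburne" } }
              ]
            }
          }
        }
      }
    }
  },
  "_source": [
    "cast.address.city",
    "cast.firstName",
    "cast.middleName",
    "cast.lastName"
  ]
}
```

Each hit should then carry a "_source" with a single "cast" array of objects, so firstName, middleName, lastName and address.city stay grouped per cast member. Note that _source filtering only trims fields: it returns every entry in "cast", not just the one that matched the nested filter.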

I'm still wondering about the performance impact of nested objects.
