Nested Mapping Performance & Retrieving "Complete" Nested Objects


(John Harding) #1

I have two questions regarding nested mapping:

  1. If I have millions of documents, each containing a nested array of up to fifty entries, where each entry is an object with up to a dozen properties, what impact will this have on performance compared to not using a nested array? If performance is poor, is there a better way to organize the index?

Background: we used to store the info that is now in the nested array as named properties of the source document, but we needed to search on the attributes of those properties without knowing the property name. So we made the property name its own property ("field_name") and stored the entries in an array of type "nested". This allows the searching we want, but I'm concerned that this use of nested may require a lot of extra storage, memory, and processing time.

Old index schema (intrinsics member is an object with named properties):

"some_document" : {
	"system": { /* meta data */ },
	"intrinsics": {
		"prop1": {
			"display": "property 1",
			"value": "uno"
		},
		"prop2": {
			"display": "property 2",
			"value": "duo"
		}
	}
}

New index schema (intrinsics is an array using the mapping type of "nested"):

"some_document" : {
	"system": { /* meta data */ },
	"intrinsics": [
		{
			"field_name": "prop 1",
			"display": "property 1",
			"value": "uno"
		},
		{
			"field_name": "prop 2",
			"display": "property 2",
			"value": "duo"
		}
	]
}
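
A rough illustration of the restructuring described above, written in Python. It is only a sketch, not the actual indexing code: the helper name to_nested_entries is hypothetical, and it assumes the old property name is carried over unchanged as "field_name".

```python
def to_nested_entries(intrinsics):
    """Turn {"prop1": {...}, ...} into [{"field_name": "prop1", ...}, ...]."""
    entries = []
    for name, attributes in intrinsics.items():
        # The old property name becomes a searchable value of its own.
        entry = {"field_name": name}
        # Copy the remaining attributes ("display", "value", ...) as they are.
        entry.update(attributes)
        entries.append(entry)
    return entries


old_intrinsics = {
    "prop1": {"display": "property 1", "value": "uno"},
    "prop2": {"display": "property 2", "value": "duo"},
}

# Each named property becomes one element of the "intrinsics" array.
new_intrinsics = to_nested_entries(old_intrinsics)
```

Every element of the resulting array is what gets mapped as one "nested" entry, which is why the array length (up to fifty here) matters for the performance question.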

  2. Is it possible to retain the object relationship when pulling back partial data from the source document?

Example:

Suppose I set up an index like this:

POST /movies
{
   "mappings": {
      "movie": {
         "properties": {
            "cast": {
               "type": "nested"
            },
            "locations": {
                "type": "nested"
            }
         }
      }
   }
}

POST /movies/movie
{
   "title": "The Matrix",
   "cast": [
      {
         "firstName": "Keanu",
         "lastName": "Reeves",
         "address": {
            "street": "somewhere",
            "city": "LA"
         }
      },
      {
         "firstName": "Laurence",
         "middleName": "John",
         "lastName": "Fishburne",
         "address": {
            "street": "somewhere else",
            "city": "NYC"
         }
      }
   ],
   "locations": [
       {
           "city": "Nashville",
           "state": "Tennessee",
           "country": "USA"
       },
       {
           "city": "Sydney",
           "state": "New South Wales",
           "country": "Australia"
       }
    ]
}

And then I issue this query:

GET /movies/_search
{
   "query": {
      "filtered": {
         "query": {
            "match_all": {}
         },
         "filter": {
            "nested": {
               "path": "cast",
               "filter": {
                  "bool": {
                     "must": [
                        { "term": { "firstName": "laurence"} },
                        { "term": { "lastName": "fishburne"} }
                     ]
                  }
               }
            }
         }
      }
   },
   "fields": [
      "cast.address.city",
      "cast.firstName",
      "cast.middleName",
      "cast.lastName"
   ]
}

I get this result:

{
   "took": 1,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 1,
      "hits": [
         {
            "_index": "movies",
            "_type": "movie",
            "_id": "AU1JeyBseLgwMCOuOLsZ",
            "_score": 1,
            "fields": {
               "cast.firstName": [
                  "Keanu",
                  "Laurence"
               ],
               "cast.lastName": [
                  "Reeves",
                  "Fishburne"
               ],
               "cast.address.city": [
                  "LA",
                  "NYC"
               ],
               "cast.middleName": [
                  "John"
               ]
            }
         }
      ]
   }
}

Is there a way to retrieve a single array of "cast" objects instead? Alternatively, is there a way of reliably "re-assembling" the "cast" objects from the separate arrays (e.g. arriving at the conclusion that one cast member is Keanu Reeves and the other is Laurence John Fishburne)? Note that "cast.middleName" comes back with only one value, so the arrays cannot simply be matched up by position.


(John Harding) #2

The second question is resolved: I should be using _source filtering rather than "fields". Courtesy of Stack Overflow: https://stackoverflow.com/questions/30217778/elasticsearch-retrieving-nested-objects-not-individual-fields
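
A minimal sketch of that change in Python, assuming the search body from the first post is held as a dict. The helper name use_source_filtering is made up for illustration; the only Elasticsearch feature relied on is the "_source" parameter of the search request, which accepts a list of field paths.

```python
import copy


def use_source_filtering(search_body):
    """Return a copy of search_body with "fields" swapped for "_source"."""
    body = copy.deepcopy(search_body)
    wanted = body.pop("fields", None)
    if wanted is not None:
        # Same field paths, but applied as a filter on the stored _source.
        body["_source"] = wanted
    return body


original_body = {
    "query": {
        "filtered": {
            "query": {"match_all": {}},
            "filter": {
                "nested": {
                    "path": "cast",
                    "filter": {
                        "bool": {
                            "must": [
                                {"term": {"firstName": "laurence"}},
                                {"term": {"lastName": "fishburne"}},
                            ]
                        }
                    },
                }
            },
        }
    },
    "fields": [
        "cast.address.city",
        "cast.firstName",
        "cast.middleName",
        "cast.lastName",
    ],
}

# Body to send to GET /movies/_search in place of the original one.
source_body = use_source_filtering(original_body)
```

With this body each hit carries a "_source" whose "cast" is still an array of objects, trimmed to the listed fields, so no re-assembly is needed. As far as I can tell it still contains every cast member of a matching movie, not only the one that matched the nested filter.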

I'm still wondering about performance impacts of nested objects.

