Strange results when querying nested objects

trex · September 16, 2016, 9:19am

Elasticsearch version: 2.3.3
Plugins installed: no plugin
JVM version: 1.8.0_91
OS version: Linux version 3.19.0-56-generic (Ubuntu 4.8.2-19ubuntu1)

I get strange results when I query nested objects on multiple paths. I want to search for all female with dementia. And there are matched patients among the results. But I also get other diagnoses I'm not looking for, the diagnoses related to these patients.

For example, I also get the following diagnoses despite the fact that I looked only for dementia.

Mental disorder, not otherwise specified
Essential (primary) hypertension

Why is that?
I want to get only female with dementia and don't want other diagnoses.

Client_Demographic_Details contains one document per patient. Diagnosis contains multiple documents per patient. The ultimate goal is to index my whole data from PostgreSQL DB (72 tables, over 1600 columns in total) into Elasticsearch.

Query:

{'query': {
       'bool': {
           'must': [
               {'nested': {
                   'path': 'Diagnosis',
                   'query': {
                       'bool': {
                           'must': [{'match_phrase': {'Diagnosis.Diagnosis': {'query': "dementia"}}}]
                       }  
                   }
               }},
               {'nested': {
                   'path': 'Client_Demographic_Details',
                   'query': {
                       'bool': {
                           'must': [{'match_phrase': {'Client_Demographic_Details.Gender_Description': {'query': "female"}}}]
                       }  
                   }
               }}
           ]
       }
    }}

Results:

{
  "hits": {
    "hits": [
      {
        "_score": 3.4594634, 
        "_type": "Patient", 
        "_id": "72", 
        "_source": {
          "Client_Demographic_Details": [
            {
              "Gender_Description": "Female", 
              "Patient_ID": 72, 
            }
          ], 
          "Diagnosis": [
            {
              "Diagnosis": "F00.0 -  Dementia in Alzheimer's disease with early onset", 
              "Patient_ID": 72, 
            }, 
            {
              "Patient_ID": 72, 
              "Diagnosis": "F99.X -  Mental disorder, not otherwise specified", 
            }, 
            {
              "Patient_ID": 72, 
              "Diagnosis": "I10.X -  Essential (primary) hypertension", 
            }
          ]
        }, 
        "_index": "denorm1"
      }
    ], 
    "total": 6, 
    "max_score": 3.4594634
  }, 
  "_shards": {
    "successful": 5, 
    "failed": 0, 
    "total": 5
  }, 
  "took": 8, 
  "timed_out": false
}

Mapping:

{
  "denorm1" : {
    "aliases" : { },
    "mappings" : {
      "Patient" : {
        "properties" : {
          "Client_Demographic_Details" : {
            "type" : "nested",
            "properties" : {
              "Patient_ID" : {
                "type" : "long"
              },
              "Gender_Description" : {
                "type" : "string"
              }
            }
          },
          "Diagnosis" : {
            "type" : "nested",
            "properties" : {
              "Patient_ID" : {
                "type" : "long"
              },
              "Diagnosis" : {
                "type" : "string"
              }
            }
          }
        }
      }
    },
    "settings" : {
      "index" : {
        "creation_date" : "1473974457603",
        "number_of_shards" : "5",
        "number_of_replicas" : "1",
        "uuid" : "Jo9cI4kRQjeWcZ7WMB6ZAw",
        "version" : {
          "created" : "2030399"
        }
      }
    },
    "warmers" : { }
  }
}

Mark_Harwood · September 16, 2016, 10:26am

If your document represents a single patient why is Client_Demographic_Details an array? Do you deliberately allow for multiple IDs and genders for a patient?

trex · September 16, 2016, 10:27am

Matched doc on nested will be inside inner hits and rest in source.
This working query from here http://stackoverflow.com/questions/39527012/strange-results-when-querying-nested-objects

{
  "_source": {
    "exclude": [
      "Client_Demographic_Details",
      "Diagnosis"
    ]
  },
  "query": {
    "bool": {
      "must": [
        {
          "nested": {
            "path": "Diagnosis",
            "query": {
              "bool": {
                "must": [
                  {
                    "match_phrase": {
                      "Diagnosis.Diagnosis": {
                        "query": "dementia"
                      }
                    }
                  }
                ]
              }
            },
            "inner_hits": {}
          }
        },
        {
          "nested": {
            "path": "Client_Demographic_Details",
            "query": {
              "bool": {
                "must": [
                  {
                    "match_phrase": {
                      "Client_Demographic_Details.Gender_Description": {
                        "query": "female"
                      }
                    }
                  }
                ]
              }
            },
            "inner_hits": {}
          }
        }
      ]
    }
  }
}

trex · September 16, 2016, 10:28am

No, there is only one patient id per patient.

Mark_Harwood · September 16, 2016, 10:34am

So no need for arrays or "nested" mappings for that field then.

Technically speaking you only need to make the diagnoses "nested" too if your queries will test more than one property of each object each e.g. (psuedo code) description:dementia AND diagnoser:DrSmith. However your example was only querying the diagnosis description so if that's all you do there's no harm in just having a non-nested property for diagnosis.

trex · September 16, 2016, 10:41am

I want to run boolean queries to test multiple properties from multiple nested objects. I plan to have 72 nested objects per patient. Each nested object contains data from the related PostgreSQL table.

So, you are saying that I need to have Client_Demographic_Details not nested and Diagnosis as nested object in the same document, aren't you? What benefits does the solution has against "all nested" objects solution that I have?

Mark_Harwood · September 16, 2016, 11:15am

If there is only ever one Client_Demographic_Details object then that definitely does not need to be "nested".
Even with your diagnoses as non-nested you can happily query for people who have demograpic gender:female AND diagnosis:dementia.
The reason you reach for "nested" is only if you have criteria that needs to test more than one term in individual diagnosis objects. This is the "cross-matching" problem I illustrate in this example [1] using students and their examination results - the query is for people who have an exam result with a particular subject AND grade. Without "nested" the grades and subjects for a person's exams are muddled.
Hope this makes sense.
Because users have a hard time articulating complex nested logic, sometimes a "flat" system is preferable and users may actually want the laxness of cross-matching rather than the stricter controls used for nested.

[1] Proposal for nested document support in Lucene | PPT

trex · September 16, 2016, 11:22am

Ok @Mark_Harwood, I will make Client_Demographic_Details as root object, try it and let you know.

trex · September 16, 2016, 1:29pm

I tried your solution, it works. I agree it looks nicer from the hyrarchy point of view.

Topic		Replies	Views
Elasticsearch OR query with nested objects returns inner_hits not matching the criteria Elasticsearch	2	753	September 18, 2018
Querying Nested Object returns all objects Elasticsearch	3	347	May 10, 2019
Elasticsearch Querying Double Nested Object, Match Multiple Rows in Query Within Parent Elasticsearch	1	417	April 21, 2022
Nested query bug? Elasticsearch	1	379	July 6, 2017
Match nested and search another value inside with more data Elasticsearch	2	374	October 6, 2020

Strange results when querying nested objects

Related topics