Difficulty formulating a query for my use case


(NM) #1

Note: We are not using elasticsearch's parent-child concept.

Below is my JSON document in Elasticsearch.

{
        "_index": "in22",
        "_type": "event",
        "_source": {
            "Title": "Jurassic World",
            "Language": [
                "English"
            ],
            "inner_hits": [
                {
                    "Code": "ET00009709",
                    "IsDefault": "",
                    "Language": "English",
                    "Format": "3D",
                    "Region": "MUMBAI"
                },
                {
                    "Code": "ET00009710",
                    "IsDefault": "Y",
                    "Language": "English",
                    "Format": "2D",
                    "Region": "CHEN"
                },
                {
                    "Code": "ET00009713",
                    "IsDefault": "",
                    "Language": "Hindi",
                    "Format": "2D",
                    "Region": "MUMBAI"
                },
                {
                    "Code": "ET00009714",
                    "IsDefault": "",
                    "Language": "Tamil",
                    "Format": "3D",
                    "Region": "MUMBAI"
                },
                {
                    "Code": "ET00009715",
                    "IsDefault": "",
                    "Language": "Hindi",
                    "Format": "3D",
                    "Region": "MUMBAI"
                },
                {
                    "Code": "ET00009716",
                    "IsDefault": "",
                    "Language": "Bengali",
                    "Format": "2D",
                    "Region": "MUMBAI"
                }
            ]
        }
    }

All I want to know is this: can I build a query that returns only those blocks inside inner_hits whose Region is "MUMBAI"?
For example, inner_hits should contain this:

   {
                "Code": "ET00009709",
                "IsDefault": "",
                "Language": "English",
                "Format": "3D",
                "Region": "MUMBAI"
}

and should not contain

 {
                "Code": "ET00009710",
                "IsDefault": "Y",
                "Language": "English",
                "Format": "2D",
                "Region": "CHEN"
}

Is this possible?


(Britta Weber) #2

If you don't want to use parent/child, you could still use nested documents (see https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-nested-type.html). Below is a small example. Is that what you mean?

PUT testidx
{
  "mappings": {
    "doc": {
      "properties": {
        "nested_docs": {
          "type": "nested"
        }
      }
    }
  }
}

POST testidx/doc
{
  "Title": "Jurassic World",
  "Language": [
    "English"
  ],
  "nested_docs": [
    {
      "Code": "ET00009709",
      "IsDefault": "",
      "Language": "English",
      "Format": "3D",
      "Region": "MUMBAI"
    },
    {
      "Code": "ET00009710",
      "IsDefault": "Y",
      "Language": "English",
      "Format": "2D",
      "Region": "CHEN"
    }
  ]
}

POST testidx/_search
{
  "fields": [], 
  "query": {
    "nested": {
      "path": "nested_docs",
      "query": {
        "match": {
          "Region": "MUMBAI"
        }
      },
      "inner_hits": {}
    }
  }
}

would result in:

{
   "took": 6,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 1.4054651,
      "hits": [
         {
            "_index": "testidx",
            "_type": "doc",
            "_id": "AU-oqm6l8HuUQps-lae-",
            "_score": 1.4054651,
            "inner_hits": {
               "nested_docs": {
                  "hits": {
                     "total": 1,
                     "max_score": 1.4054651,
                     "hits": [
                        {
                           "_index": "testidx",
                           "_type": "doc",
                           "_id": "AU-oqm6l8HuUQps-lae-",
                           "_nested": {
                              "field": "nested_docs",
                              "offset": 0
                           },
                           "_score": 1.4054651,
                           "_source": {
                              "Code": "ET00009709",
                              "IsDefault": "",
                              "Language": "English",
                              "Format": "3D",
                              "Region": "MUMBAI"
                           }
                        }
                     ]
                  }
               }
            }
         }
      ]
   }
}
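
For context on why the "nested" mapping matters here: without it, an array of objects is indexed as flat multi-valued fields, so the link between one sub-document's Region and its other fields is lost. Below is a rough Python sketch of that difference. The flatten helper is purely illustrative and not part of any Elasticsearch client.

```python
def flatten(field, objects):
    """Mimic how a plain object array is indexed: one multi-valued field per key."""
    flat = {}
    for obj in objects:
        for key, value in obj.items():
            flat.setdefault(f"{field}.{key}", []).append(value)
    return flat

docs = [
    {"Code": "ET00009709", "Format": "3D", "Region": "MUMBAI"},
    {"Code": "ET00009710", "Format": "2D", "Region": "CHEN"},
]
flat = flatten("nested_docs", docs)

# Flattened view: "CHEN" and "3D" both occur somewhere, so Region=CHEN AND Format=3D matches
flat_match = "CHEN" in flat["nested_docs.Region"] and "3D" in flat["nested_docs.Format"]
# Nested view: each sub-document is checked on its own, so nothing matches
nested_match = any(d["Region"] == "CHEN" and d["Format"] == "3D" for d in docs)
print(flat_match, nested_match)  # True False
```

Per-sub-document matching is also what lets inner_hits report exactly which sub-documents matched.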

(NM) #3

OK, first of all, thanks. We are now one step closer to our goal, but the query you mentioned returns only 3 nested docs instead of 5. Let's look at it. Here is the original JSON document in Elasticsearch:

{
    "_index": "in22",
    "_source": {
        "Language": [
            "English"
        ],
        "Tags": [
            "MT",
            "MUMBAI",
            "CHEN"
        ],
        "_boost": 3,
        "inner_hits": [
            {
                "Code": "ET00009709",
                "IsDefault": "",
                "Language": "English",
                "Format": "3D",
                "Region": "MUMBAI"
            },
            {
                "Code": "ET00009710",
                "IsDefault": "Y",
                "Language": "English",
                "Format": "2D",
                "Region": "CHEN"
            },
            {
                "Code": "ET00009713",
                "IsDefault": "",
                "Language": "Hindi",
                "Format": "2D",
                "Region": "MUMBAI"
            },
            {
                "Code": "ET00009714",
                "IsDefault": "",
                "Language": "Tamil",
                "Format": "3D",
                "Region": "MUMBAI"
            },
            {
                "Code": "ET00009715",
                "IsDefault": "",
                "Language": "Hindi",
                "Format": "3D",
                "Region": "MUMBAI"
            },
            {
                "Code": "ET00009716",
                "IsDefault": "",
                "Language": "Bengali",
                "Format": "2D",
                "Region": "MUMBAI"
            }
        ]
    }
}

As we can see, there are 6 sub-documents within inner_hits, and 5 of them have Region=MUMBAI, so the query should return all 5. But the result I got with your query is:


(NM) #4

Pasting in a separate post, as the body limit of the previous post was reached.

 {
       ...
       
        "_shards": {
           ...
        },
        "hits": {
            "total": 2,
            "max_score": 28.32935,
            "hits": [
                {
                   ...
                    "fields": {
                        "Title": [
                            "Jurassic World"
                        ],
                        "Status": [
                            "NS"
                        ]
                    },
                    "inner_hits": {
                        "inner_hits": {
                            "hits": {
                                "total": 5,
                                "max_score": 25.036417,
                                "hits": [
                                    {
                                        "_index": "in22",
                                        "_type": "event",
                                        "_id": "ET00009709",
                                        "_nested": {
                                            "field": "inner_hits",
                                            "offset": 5
                                        },
                                        "_score": 25.036417,
                                        "_source": {
                                            "Code": "ET00009716",
                                            "IsDefault": "",
                                            "Language": "Bengali",
                                            "Format": "2D",
                                            "Region": "MUMBAI"
                                        }
                                    },
                                    {
                                        "_index": "in22",
                                        "_type": "event",
                                        "_id": "ET00009709",
                                        "_nested": {
                                            "field": "inner_hits",
                                            "offset": 4
                                        },
                                        "_score": 25.036417,
                                        "_source": {
                                            "Code": "ET00009715",
                                            "IsDefault": "",
                                            "Language": "Hindi",
                                            "Format": "3D",
                                            "Region": "MUMBAI"
                                        }
                                    },
                                    {
                                        "_index": "in22",
                                        "_type": "event",
                                        "_id": "ET00009709",
                                        "_nested": {
                                            "field": "inner_hits",
                                            "offset": 3
                                        },
                                        "_score": 25.036417,
                                        "_source": {
                                            "Code": "ET00009714",
                                            "IsDefault": "",
                                            "Language": "Tamil",
                                            "Format": "3D",
                                            "Region": "MUMBAI"
                                        }
                                    }
                                ]
                            }
                        }
                    }
                }
            ]
        }
    }

So I was wondering how to return all 5 documents that match Region=MUMBAI.


(Britta Weber) #5

You have to configure the size parameter as described here: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-inner-hits.html#_options
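
By default, inner_hits returns only the top three matching hits, which is why 3 of the 5 came back. A minimal Python sketch of the search body from #2 with an explicit size follows. The function name and the size of 10 are just illustrative assumptions.

```python
import json

def nested_region_query(region, size):
    """Build the search body from #2, with an explicit inner_hits size."""
    return {
        "fields": [],
        "query": {
            "nested": {
                "path": "nested_docs",
                "query": {"match": {"Region": region}},
                # Raise the per-document cap on returned inner hits (default is 3)
                "inner_hits": {"size": size},
            }
        },
    }

body = nested_region_query("MUMBAI", 10)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of POST testidx/_search.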


(NM) #6

OK, thanks again. But there could be any number of nested docs, say 5, 7, or n. We want to display all n if hits.total is n.
I hope you understand my use case.
If not, please ask.


(Britta Weber) #7

Hm. I was about to answer: "Well, just set it to INT_MAX", but when I tried it out, that actually resulted in an out-of-memory error. I opened an issue here: https://github.com/elastic/elasticsearch/issues/13394 . Once this is fixed, you should be able to get all nested docs by setting size to some ridiculously high number.
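
Until then, one possible workaround (a sketch, not a tested recipe) is to use a moderate size and compare the inner hits' total with the number actually returned. If some are missing, the search can be repeated with size set to at least that total. The helper below is hypothetical and only reads the response shape shown in #4.

```python
def collect_inner_hits(hit, name):
    """Return (sources, truncated) for one top-level hit's inner_hits entry."""
    inner = hit["inner_hits"][name]["hits"]
    sources = [h["_source"] for h in inner["hits"]]
    # truncated is True when fewer sub-documents came back than matched
    return sources, len(sources) < inner["total"]

# Trimmed-down hit shaped like the response in #4: total 5, only 3 returned
hit = {"inner_hits": {"inner_hits": {"hits": {
    "total": 5,
    "hits": [
        {"_source": {"Code": "ET00009716", "Region": "MUMBAI"}},
        {"_source": {"Code": "ET00009715", "Region": "MUMBAI"}},
        {"_source": {"Code": "ET00009714", "Region": "MUMBAI"}},
    ],
}}}}

sources, truncated = collect_inner_hits(hit, "inner_hits")
print(len(sources), truncated)  # 3 True
```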


(NM) #8

Thanks again. It would be great if you could explain how this works end to end, i.e. how exactly this query works, because I will be plugging it into my initial query, which looks like this:

$query = [
        "filtered" => [
            "query" => [
                "bool" => [
                    "should" => [
                        [
                            'query_string' => [
                                'fields' => [
                                    'Title.title^4',
                                    'Title.ngrams_front^2',
                                    'Title.ngrams_back'
                                ],
                                'defaultOperator' => 'or',
                                'analyzer' => 'titles_default_analyzer',
                                'query' => $paramsObj->q
                            ]
                        ],
                        [
                            'fuzzy' => [
                                'Title.title' => [
                                    'value' => $paramsObj->q,
                                    'boost' => 1,
                                    'min_similarity' => 0.5,
                                    'max_expansions' => 20,
                                    'prefix_length' => 0
                                ]
                            ]
                        ]
                    ]
                ]
            ],
            "filter" => $filters
        ]
    ];
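
One way the nested query from #2 might be plugged into a query like this is to wrap both in a bool "must", so the existing "should" clauses keep their meaning (adding "must" directly next to "should" would make the "should" clauses optional). The sketch below uses Python dicts rather than PHP arrays, and has not been verified against this mapping or Elasticsearch version. The function name, the shortened title query, and the Status filter are placeholders. The path "inner_hits" is taken from the "_nested" field in #4, and "Region" is written as in #2.

```python
def with_nested_region(title_query, filters, region, size):
    """Require both the original title query and a nested Region match."""
    nested = {
        "nested": {
            "path": "inner_hits",
            "query": {"match": {"Region": region}},
            "inner_hits": {"size": size},
        }
    }
    return {
        "filtered": {
            "query": {"bool": {"must": [title_query, nested]}},
            "filter": filters,
        }
    }

# Placeholder stand-ins for the real bool/should title query and $filters
title_query = {"bool": {"should": [
    {"query_string": {"fields": ["Title.title^4"], "query": "jurassic"}},
]}}
filters = {"term": {"Status": "NS"}}

query = with_nested_region(title_query, filters, "MUMBAI", 10)
print(len(query["filtered"]["query"]["bool"]["must"]))  # 2
```

The same structure translates directly to nested PHP arrays.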
