inner_hits, top_hits and span queries on a nested field: performance


(b789) #1

Hi,
I have this query:

{
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "should": [
              {
                "nested": {
                  "path": "nestedfield",
                  "query": {
                    "bool": {
                      "should": [
                        {
                          "match": {
                            "nestedfield.f1": {
                              "query": "term1 term2 term3",
                              "minimum_should_match": "2"
                            }
                          }
                        },
                        {
                          "function_score": {
                            "query": {
                              "match": {
                                "nestedfield.f1": {
                                  "query": "term1 term2 term3",
                                  "fuzziness": "auto",
                                  "minimum_should_match": "2",
                                  "prefix_length": "3"
                                }
                              }
                            },
                            "boost": 1,
                            "score_mode": "first",
                            "boost_mode": "multiply",
                            "functions": [
                              {
                                "weight": 2,
                                "filter": {
                                  "span_near": {
                                    "clauses": [
                                      {
                                        "span_multi": {
                                          "match": {
                                            "fuzzy": {
                                              "nestedfield.f1": {
                                                "value": "term1",
                                                "fuzziness": "auto"
                                              }
                                            }
                                          }
                                        }
                                      },
                                      {
                                        "span_multi": {
                                          "match": {
                                            "fuzzy": {
                                              "nestedfield.f1": {
                                                "value": "term2",
                                                "fuzziness": "auto"
                                              }
                                            }
                                          }
                                        }
                                      },
                                      {
                                        "span_multi": {
                                          "match": {
                                            "fuzzy": {
                                              "nestedfield.f1": {
                                                "value": "term3",
                                                "fuzziness": "auto"
                                              }
                                            }
                                          }
                                        }
                                      }
                                    ],
                                    "slop": 2,
                                    "in_order": true
                                  }
                                }
                              }
                            ]
                          }
                        }
                      ],
                      "minimum_should_match": "1"
                    }
                  },
                  "inner_hits": {
                    "name": "nestedfield_hit",
                    "size": 1,
                    "sort": [
                      {
                        "_score": {
                          "order": "desc"
                        }
                      }
                    ]
                  }
                }
              }
            ]
          }
        }
      ]
    }
  },
  "from": 0,
  "size": 10,
  "aggs": {
    "name": {
      "nested": {
        "path": "nestedfield"
      },
      "aggs": {
        "name": {
          "terms": {
            "field": "nestedfield.f2"
          },
          "aggs": {
            "name": {
              "top_hits": {
                "size": 1
              }
            }
          }
        }
      }
    }
  }
}

The problem is that the performance isn't good enough.
What I don't understand is why removing any one of the following parts improves performance tenfold.
To get this speedup I can remove the inner_hits on the nested query, the top_hits in the aggregations, or the span queries in the function_score.
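
One way to put numbers on the three variants is to compare the `took` value (in milliseconds) that Elasticsearch reports in every search response. The following is only a rough sketch, assuming a Python client; `search` is a hypothetical callable that sends a body to `_search` and returns the parsed JSON response, and `compare_took` and the variant labels are made-up names for illustration.

```python
def compare_took(search, variants, repeats=3):
    """Run each variant `repeats` times and keep its lowest `took` (ms).

    `search` is a hypothetical callable: body (dict) -> parsed response (dict).
    `variants` maps a label to a search body.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    results = {}
    for label, body in variants.items():
        # "took" is the execution time in milliseconds reported by Elasticsearch.
        results[label] = min(search(body)["took"] for _ in range(repeats))
    return results


if __name__ == "__main__":
    # Stand-in for a real client call, so the sketch runs without a cluster.
    def fake_search(body):
        return {"took": 50 if "aggs" in body else 5}

    variants = {
        "full": {"query": {"match_all": {}}, "aggs": {}},
        "no_aggs": {"query": {"match_all": {}}},
    }
    print(compare_took(fake_search, variants))
```

Keeping the lowest of a few runs is just one possible choice to reduce noise from caches warming up; an average or median would work as well.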

I can't see why removing any one of these parts leads to a tenfold improvement in performance.
I get that removing part of the query reduces the work to do, but in this situation it is faster to send a second query just for the aggregations, and I can't understand why.
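
To make the "second query just for the aggregations" workaround concrete, here is a minimal sketch of how one combined search body could be split into two requests. It assumes a Python client; `split_search_body` and `strip_inner_hits` are hypothetical helper names, not part of any Elasticsearch library.

```python
import copy


def strip_inner_hits(node):
    """Return a copy of a query tree without any "inner_hits" keys.

    Note: this drops every key named "inner_hits", which assumes no
    document field carries that name.
    """
    if isinstance(node, dict):
        return {k: strip_inner_hits(v) for k, v in node.items() if k != "inner_hits"}
    if isinstance(node, list):
        return [strip_inner_hits(v) for v in node]
    return node


def split_search_body(body):
    """Return (hits_body, aggs_body) built from one combined search body."""
    # Request 1: the hits (inner_hits kept), without any aggregations.
    hits_body = {
        k: copy.deepcopy(v)
        for k, v in body.items()
        if k not in ("aggs", "aggregations")
    }
    # Request 2: same query, aggregations only. With "size": 0 no hits are
    # returned, so inner_hits serves no purpose there and is removed.
    aggs_body = {
        "query": strip_inner_hits(body.get("query", {"match_all": {}})),
        "size": 0,
        "aggs": copy.deepcopy(body.get("aggs", body.get("aggregations", {}))),
    }
    return hits_body, aggs_body
```

The two bodies can then be sent as separate `_search` requests. The original body is left unmodified, so it can still be used for the single-request comparison.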

I suspect there is some hidden relation between the three parts, but I don't know where.
One strange thing I noticed is that the top_hits results in the aggregation contain an inner_hits field with no real value, at the same level as _source. I don't know if this is expected.

The actual query has other parts, but this is enough to demonstrate the problem.

P.S. The function_score with the spans is there just to simulate a match_phrase with fuzzy behavior, so if there's a better way I'm open to suggestions.
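
For clarity, the span filter in the query above follows a fixed pattern: one `span_multi` wrapping a `fuzzy` query per term, all combined in a `span_near`. A rough sketch of a builder for that pattern, assuming a Python client; `fuzzy_span_near` is a hypothetical helper name, and the defaults simply mirror the values used in the query.

```python
def fuzzy_span_near(field, terms, slop=2, in_order=True, fuzziness="auto"):
    """Build a span_near of fuzzy span_multi clauses, one per term.

    `fuzzy` is a term-level query, so `terms` are assumed to be already
    in the form they have in the index (for example lowercased).
    """
    clauses = [
        {
            "span_multi": {
                "match": {
                    "fuzzy": {field: {"value": term, "fuzziness": fuzziness}}
                }
            }
        }
        for term in terms
    ]
    return {"span_near": {"clauses": clauses, "slop": slop, "in_order": in_order}}


if __name__ == "__main__":
    import json

    # Splitting on whitespace is a simplification for this illustration.
    span_filter = fuzzy_span_near("nestedfield.f1", "term1 term2 term3".split())
    print(json.dumps(span_filter, indent=2))
```

The result is meant to be used as the `filter` of the function in the `function_score`, in the same position as in the query above.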

