Nested inner_hits slows query performance


(Jason Cornez) #1

I have a query that is very fast (sub-second) without inner_hits, but takes 20-30 seconds when inner_hits are returned. Here is the query:

{ "from": 0, "size": 2500, "terminate_after": 2500,
  "query": {
    "constant_score": {
      "filter": { "bool": {"must": [
        {"range": {"timestamp_utc": {
                 "gte": "2018-03-16 00:00:00",
                  "lt": "2018-03-23 00:00:00",
              "format": "yyyy-MM-dd HH:mm:ss"}}},
        {"nested": {
            "path":"analytics",
            "query":
              {"terms": {"analytics.rp_entity_id": ["D8442A","4A6F00","FD9CFE","13FF12","B811D5","3D4567","228D42","C4EEAD","2D160B","DD3BB1","12E454","713810","352A3A","ECD263","7373D4","251988","E09E2B","C12ED9","0157B1","9768FE","D90F43","A4090F","69ADD9","42470E","4F9926","619882","7E3AFB","E5754F","598511","A18D3C","FF8CFC","9FEBFF","E5FA3A","90F0CE","CEC128","D6AAF0","1BC12C","1BC945","A6213D","267718","2F40E5","FE89E0","508CFD","AD9C5F","A5DD79","D71D85","ECDC73","0BB903","340280","A21964"] }}
            /**/,
            "inner_hits": {
              "name": "analytics_hits_1",
              "size": 1,
              "_source": false
            }
            /**/
        }}
  ]}}}},
  "_source": false,
  "sort": [{ "timestamp_utc":  {"order": "desc"}}]
}

As you can see, I already have _source disabled, both for the top-level hits and inside inner_hits, and the inner_hits size is set to 1.

I'm running ES 6.3.2. I've already googled around and checked the forums. I also switched the compression codec from best_compression to default, which gave me about a 20% boost in query performance, but the query is still really slow.

I have looked at the profiler output and it just tells me that the query is fast (and it is, if I remove inner_hits). It doesn't seem to tell me anything about the work required by inner_hits.
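
Since the profiler output doesn't seem to cover it, one way to put a number on the inner_hits cost is to send the same request with and without the inner_hits block and compare the "took" value (milliseconds, as reported by the server) in each response. Below is a minimal Python sketch of that comparison, not a definitive benchmark. The helper names build_query and took_ms are made up for illustration, the terms list is trimmed to two IDs, and the base URL and index name passed to took_ms are placeholders to be replaced with real values.

```python
import copy
import json
import urllib.request

# Trimmed copy of the query from this post; the full terms list is omitted.
BASE_QUERY = {
    "from": 0, "size": 2500, "terminate_after": 2500,
    "query": {"constant_score": {"filter": {"bool": {"must": [
        {"range": {"timestamp_utc": {
            "gte": "2018-03-16 00:00:00",
            "lt": "2018-03-23 00:00:00",
            "format": "yyyy-MM-dd HH:mm:ss"}}},
        {"nested": {
            "path": "analytics",
            "query": {"terms": {"analytics.rp_entity_id": ["D8442A", "4A6F00"]}},
        }},
    ]}}}},
    "_source": False,
    "sort": [{"timestamp_utc": {"order": "desc"}}],
}

# The block that is toggled on and off between the two runs.
INNER_HITS = {"name": "analytics_hits_1", "size": 1, "_source": False}


def build_query(with_inner_hits):
    """Return a fresh copy of the query, optionally with inner_hits attached."""
    body = copy.deepcopy(BASE_QUERY)
    if with_inner_hits:
        must = body["query"]["constant_score"]["filter"]["bool"]["must"]
        must[1]["nested"]["inner_hits"] = dict(INNER_HITS)
    return body


def took_ms(base_url, index, body):
    """POST body to <base_url>/<index>/_search and return the reported 'took'.

    base_url and index are placeholders (assumptions), not values from the post.
    """
    req = urllib.request.Request(
        f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["took"]
```

Against a live cluster, something like took_ms("http://localhost:9200", "my_index", build_query(True)) versus the same call with build_query(False) would give the two timings. Running each variant a few times helps separate caching effects from the inner_hits cost itself.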

I know it will be hard to give me specifics, since I can't provide a reproducible test case here (it would probably require lots of data in an index). But where should I look, and what parameters should I consider tweaking? Should I expect that including inner_hits will make the query response 20x slower?

Any hints, tips, and suggestions will be appreciated.

-Jason


(Jason Cornez) #2

This same query, on the same data, is very fast on ES 5.6.9. I think there may be some problem here in the ES 6.x line.


(Jason Cornez) #3

I'm fairly sure there are some performance degradations in 6.x vs 5.6.x. This issue is certainly one. I've also seen that nested aggregations are much worse here. I'm not quite sure how to get anyone's attention. But this forum seems to not be the way...

-Jason

