Poor Query Performance using nested document structure

We are building a search feature for our existing product. The document structure looks like this:

{
	"_index" : "apr-2018-feed",
	"_type" : "product",
	"_id" : "8102637039f76d20a0adc1257c14ee08",
	"_source" : {
		"id" : "8102637039f76d20a0adc1257c14ee08",
		"field1" : "value",
		"field2" : 6495,
		"field3" : "",
		"field4" : "value",
		"field5" : 23922,
		"dateField" : "2018-03-02",
		"valueField" : 10000000,
		"clusters" : [{
				"clusterId" : 4919,
				"clusterName" : "XYZ",
				
				"innerClusters" : [{
						"innerClusterId" : 118760075,
						"field1" : "value",
						"field2" : 6495,
						"field3" : "",
						"field4" : "value",
						"field5" : 23922,
						"attributeStore1" : [{
								"name" : "attr1",
								"value" : "attrVal"
							}, {
								"name" : "attr2",
								"value" : "attrVal"
							}, {
								"name" : "attr3",
								"value" : "attrVal"
							}, {
								"name" : "attr4",
								"value" : "attrVal"
							}
						],
						"attributeStore2" : [{
								"name" : "attr5",
								"value" : "attrVal"
							}, {
								"name" : "attr6",
								"value" : "attrVal"
							}, {
								"name" : "attr7",
								"value" : "attrVal"
							}, {
								"name" : "attr8",
								"value" : "attrVal"
							}
						]
					},{
						"innerClusterId" : 118760076,
						"field1" : "value",
						"field2" : 6495,
						"field3" : "",
						"field4" : "value",
						"field5" : 23922,
						"attributeStore1" : [{
								"name" : "attr1",
								"value" : "attrVal"
							}, {
								"name" : "attr2",
								"value" : "attrVal"
							}, {
								"name" : "attr3",
								"value" : "attrVal"
							}, {
								"name" : "attr4",
								"value" : "attrVal"
							}
						],
						"attributeStore2" : [{
								"name" : "attr5",
								"value" : "attrVal"
							}, {
								"name" : "attr6",
								"value" : "attrVal"
							}, {
								"name" : "attr7",
								"value" : "attrVal"
							}, {
								"name" : "attr8",
								"value" : "attrVal"
							}
						]
					}
				]
			}
		]
	}
}

This is the document structure that I am using.

Document
|__ Clusters
    |__ InnerCLusters
        |__ AttrStore1
        |__ AttrStore2

We are clustering documents based on document similarity.

We have around 17 million grouped/clustered documents, and the index size is 105.3 GB. The total document count reported by ES is 298.8 million.
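The gap between the two counts is worth quantifying. The sketch below assumes the 298.8 million figure includes the hidden Lucene documents that Elasticsearch creates for each nested object, and it uses the rounded numbers quoted above. The function name is made up for illustration.

```python
def hidden_docs_per_top_level(total_docs: int, top_level_docs: int) -> float:
    """Average number of hidden nested documents per top-level document.

    Assumes total_docs counts both top-level and nested (hidden) documents.
    """
    return (total_docs - top_level_docs) / top_level_docs


# Rounded figures from the post: 298.8 million total, about 17 million top-level.
ratio = hidden_docs_per_top_level(298_800_000, 17_000_000)
print(f"about {ratio:.1f} nested documents per top-level document")
```

Under that assumption, every top-level hit carries somewhere between 16 and 17 nested documents that a query has to step over.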

We have configured 2 m5.large data nodes (2 vCPU * 2 = 4 vCPU) with SSD storage. The index has 4 shards (1 shard per CPU core) and 0 replicas, with segments merged down to 1 segment per shard.

ES configuration

bootstrap.memory_lock: true
indices.requests.cache.size: 30%
thread_pool.search.size: 50

Heap
-Xms4g
-Xmx4g
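A rough way to see how much of the index could sit in the OS filesystem cache with this setup is sketched below. It assumes 8 GiB of RAM per m5.large node (not stated in the post) and treats everything outside the JVM heap as available for caching, which is an upper bound. The function name is hypothetical.

```python
def page_cache_coverage(index_gb: float, nodes: int,
                        ram_per_node_gb: float, heap_per_node_gb: float) -> float:
    """Upper bound on the fraction of the index that fits in the OS page cache."""
    cache_gb = nodes * (ram_per_node_gb - heap_per_node_gb)
    return cache_gb / index_gb


# 105.3 GB index, 2 nodes, 4 GB heap each (from -Xmx4g); 8 GB RAM is an assumption.
coverage = page_cache_coverage(105.3, nodes=2, ram_per_node_gb=8, heap_per_node_gb=4)
print(f"at most {coverage:.0%} of the index fits in the filesystem cache")
```

With those numbers only a small fraction of the index can be cached, so most reads would go to disk.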

We also ran a match_all query, which takes around 5 seconds (with the cache cleared).

We also tried larger instances with a total of 16 vCPU and 120 GB of RAM for Elasticsearch, but the performance was similar.

How should we store the documents so that we can query them in under 500 ms?

Can you share your exact query?

Ours is a complex query which we can explain if required.

Even a simple match_all query takes around 5 seconds:

    GET apr-2018-feed1v4/_search
    {
      "query": {
        "match_all": {}
      }
    }

Response

    {
      "took": 4716,
      "timed_out": false,
      "_shards": {
        "total": 4,
        "successful": 4,
        "skipped": 0,
        "failed": 0
      },
      "hits": {
        "total": 16843022,
        "max_score": 1,
        ..
      }
    }

Can you run it with profiling on?

I ran the query with profiling on, but due to network restrictions I cannot attach a file, and the site does not allow posts longer than 7000 characters.

I am sharing the profiling info for 1 shard across 2 posts.

    "id": "[KRJZ692RR26VQU9YNDhS5w][apr-2018-di-feed1v4][0]",
    "searches": [
      {
        "query": [
          {
            "type": "ConstantScoreQuery",
            "description": "ConstantScore(#*:* -_type:__*)",
            "time": "277625.2845ms",
            "time_in_nanos": 277625284467,
            "breakdown": {
              "score": 1819332110,
              "build_scorer_count": 2,
              "match_count": 37359599,
              "create_weight": 1127914,
              "next_doc": 167896909321,
              "match": 107828448875,
              "create_weight_count": 1,
              "next_doc_count": 37359600,
              "score_count": 2105306,
              "build_scorer": 2641739,
              "advance": 0,
              "advance_count": 0
            },
            "children": [
              {
                "type": "BooleanQuery",
                "description": "#*:* -_type:__*",
                "time": "138158.4556ms",
                "time_in_nanos": 138158455619,
                "breakdown": {
                  "score": 0,
                  "build_scorer_count": 2,
                  "match_count": 37359599,
                  "create_weight": 467519,
                  "next_doc": 101389480737,
                  "match": 36691674377,
                  "create_weight_count": 1,
                  "next_doc_count": 37359600,
                  "score_count": 0,
                  "build_scorer": 2113784,
                  "advance": 0,
                  "advance_count": 0
                },
                "children": [
                  {
                    "type": "MatchAllDocsQuery",
                    "description": "*:*",
                    "time": "32474.96375ms",
                    "time_in_nanos": 32474963749,
                    "breakdown": {
                      "score": 0,
                      "build_scorer_count": 2,
                      "match_count": 0,
                      "create_weight": 2480,
                      "next_doc": 32437595592,
                      "match": 0,
                      "create_weight_count": 1,
                      "next_doc_count": 37359600,
                      "score_count": 0,
                      "build_scorer": 6074,
                      "advance": 0,
                      "advance_count": 0
                    }
                  },
                  {
                    "type": "MultiTermQueryConstantScoreWrapper",
                    "description": "_type:__*",
                    "time": "95148.66580ms",
                    "time_in_nanos": 95148665802,
                    "breakdown": {
                      "score": 0,
                      "build_scorer_count": 2,
                      "match_count": 0,
                      "create_weight": 2580,
                      "next_doc": 0,
                      "match": 0,
                      "create_weight_count": 1,
                      "next_doc_count": 0,
                      "score_count": 0,
                      "build_scorer": 1546845,
                      "advance": 95111862080,
                      "advance_count": 35254294
                    }
                  }
                ]
              }
            ]
          },
          {
            "type": "BooleanQuery",
            "description": "_type:__Clusters _type:__Clusters.InnerCLusters _type:__Clusters.InnerCLusters.AttrStore2 _type:__Clusters.InnerCLusters.AttrStore3 _type:__Clusters.InnerCLusters.AttrStore1",
            "time": "32783.88403ms",
            "time_in_nanos": 32783884034,
            "breakdown": {
              "score": 0,
              "build_scorer_count": 2,
              "match_count": 0,
              "create_weight": 65857,
              "next_doc": 0,
              "match": 0,
              "create_weight_count": 1,
              "next_doc_count": 0,
              "score_count": 0,
              "build_scorer": 1331208,
              "advance": 32747232672,
              "advance_count": 35254294
            },

Part 2

"children": [
                  {
                    "type": "TermQuery",
                    "description": "_type:__Clusters",
                    "time": "1850.491480ms",
                    "time_in_nanos": 1850491480,
                    "breakdown": {
                      "score": 0,
                      "build_scorer_count": 2,
                      "match_count": 0,
                      "create_weight": 4236,
                      "next_doc": 0,
                      "match": 0,
                      "create_weight_count": 1,
                      "next_doc_count": 0,
                      "score_count": 0,
                      "build_scorer": 53520,
                      "advance": 1848317232,
                      "advance_count": 2116489
                    }
                  },
                  {
                    "type": "TermQuery",
                    "description": "_type:__Clusters.InnerCLusters",
                    "time": "1857.463788ms",
                    "time_in_nanos": 1857463788,
                    "breakdown": {
                      "score": 0,
                      "build_scorer_count": 2,
                      "match_count": 0,
                      "create_weight": 2269,
                      "next_doc": 0,
                      "match": 0,
                      "create_weight_count": 1,
                      "next_doc_count": 0,
                      "score_count": 0,
                      "build_scorer": 8509,
                      "advance": 1855326035,
                      "advance_count": 2126972
                    }
                  },
                  {
                    "type": "TermQuery",
                    "description": "_type:__Clusters.InnerCLusters.AttrStore2",
                    "time": "17757.78522ms",
                    "time_in_nanos": 17757785215,
                    "breakdown": {
                      "score": 0,
                      "build_scorer_count": 2,
                      "match_count": 0,
                      "create_weight": 2201,
                      "next_doc": 0,
                      "match": 0,
                      "create_weight_count": 1,
                      "next_doc_count": 0,
                      "score_count": 0,
                      "build_scorer": 6878,
                      "advance": 17737491247,
                      "advance_count": 20284886
                    }
                  },
                  {
                    "type": "TermQuery",
                    "description": "_type:__Clusters.InnerCLusters.AttrStore3",
                    "time": "5027.549424ms",
                    "time_in_nanos": 5027549424,
                    "breakdown": {
                      "score": 0,
                      "build_scorer_count": 2,
                      "match_count": 0,
                      "create_weight": 2144,
                      "next_doc": 0,
                      "match": 0,
                      "create_weight_count": 1,
                      "next_doc_count": 0,
                      "score_count": 0,
                      "build_scorer": 7084,
                      "advance": 5021800219,
                      "advance_count": 5739974
                    }
                  },
                  {
                    "type": "TermQuery",
                    "description": "_type:__Clusters.InnerCLusters.AttrStore1",
                    "time": "4427.707954ms",
                    "time_in_nanos": 4427707954,
                    "breakdown": {
                      "score": 0,
                      "build_scorer_count": 2,
                      "match_count": 0,
                      "create_weight": 2279,
                      "next_doc": 0,
                      "match": 0,
                      "create_weight_count": 1,
                      "next_doc_count": 0,
                      "score_count": 0,
                      "build_scorer": 8043,
                      "advance": 4422711652,
                      "advance_count": 4985977
                    }
                  }
                ]
              }
            ],
            "rewrite_time": 84186,
            "collector": [
              {
                "name": "CancellableCollector",
                "reason": "search_cancelled",
                "time": "5838.014395ms",
                "time_in_nanos": 5838014395,
                "children": [
                  {
                    "name": "SimpleTopScoreDocCollector",
                    "reason": "search_top_hits",
                    "time": "2053.911836ms",
                    "time_in_nanos": 2053911836
                  }
                ]
              }
            ]
          }
        ],
        "aggregations": []
      }
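To compare the nodes of a profile like this one, it can help to flatten the tree into one row per query node. The sketch below is only an illustration: `walk_profile` is a made-up helper, and the sample data is a trimmed copy of the first query tree above, keeping just the `type`, `time_in_nanos` and `children` keys.

```python
def walk_profile(nodes, depth=0):
    """Yield (depth, query type, seconds) for every node in a profile query tree."""
    for node in nodes:
        yield depth, node["type"], node["time_in_nanos"] / 1e9
        yield from walk_profile(node.get("children", []), depth + 1)


# Trimmed copy of the first query tree from the profile output above.
profile = [{
    "type": "ConstantScoreQuery",
    "time_in_nanos": 277625284467,
    "children": [{
        "type": "BooleanQuery",
        "time_in_nanos": 138158455619,
        "children": [
            {"type": "MatchAllDocsQuery", "time_in_nanos": 32474963749},
            {"type": "MultiTermQueryConstantScoreWrapper",
             "time_in_nanos": 95148665802},
        ],
    }],
}]

rows = list(walk_profile(profile))
for depth, query_type, seconds in rows:
    print(f"{'  ' * depth}{query_type}: {seconds:.1f} s")
```

Read this way, the `_type:__*` clause (the `MultiTermQueryConstantScoreWrapper` node) accounts for more time than the `*:*` clause next to it. The absolute times are far above the 4716 ms `took` value, which is presumably due to profiling overhead, so they are more useful for comparing nodes than as real latencies.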

So you have a lot of nested docs here, which causes a lot of internal "joins" at search time.
A few things I can think of:

  • First, once the segments are loaded in memory (if you have enough memory left for the OS filesystem cache), this will hopefully be much faster.
  • Having more shards, with fewer documents per shard, might help reduce that time.
  • Depending on your use case, don't use nested when not absolutely necessary.
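One possible reading of the last point, sketched under assumptions: index each inner cluster as its own flat document, copy the parent fields onto it, and turn each name/value attribute list into a plain object keyed by attribute name. The helper names and the `parent` field are hypothetical, and whether this fits depends on the real queries, which are not shown in this thread.

```python
def flatten_attribute_store(pairs):
    """Turn [{"name": n, "value": v}, ...] into {n: v, ...}."""
    return {pair["name"]: pair["value"] for pair in pairs}


def denormalize(source):
    """Yield one flat document per inner cluster, with parent fields copied down."""
    parent = {k: v for k, v in source.items() if k != "clusters"}
    for cluster in source.get("clusters", []):
        for inner in cluster.get("innerClusters", []):
            doc = {
                "parent": parent,  # hypothetical field holding the top-level fields
                "clusterId": cluster["clusterId"],
                "clusterName": cluster["clusterName"],
            }
            for key, value in inner.items():
                if key.startswith("attributeStore"):
                    doc[key] = flatten_attribute_store(value)
                else:
                    doc[key] = value
            yield doc


# Cut-down version of the document from the original post.
sample = {
    "id": "8102637039f76d20a0adc1257c14ee08",
    "dateField": "2018-03-02",
    "clusters": [{
        "clusterId": 4919,
        "clusterName": "XYZ",
        "innerClusters": [
            {"innerClusterId": 118760075,
             "attributeStore1": [{"name": "attr1", "value": "attrVal"}]},
            {"innerClusterId": 118760076,
             "attributeStore2": [{"name": "attr5", "value": "attrVal"}]},
        ],
    }],
}

flat_docs = list(denormalize(sample))
```

The trade-offs are duplicated parent data and one mapped field per distinct attribute name, so this is probably only reasonable when the set of attribute names is small and stable.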

But maybe @jpountz has other ideas?
