Profiling kNN search

I'm trying to profile a slow kNN search as discussed here.

So I read the documentation here and set up this query in Postman (where {{SEARCH_TEXT}} is a Postman variable that resolves to the query vector):

{
  "profile": true,
  "knn": {
    "field": "title_vector",
    "query_vector": {{SEARCH_TEXT}},
    "k": 40,
    "num_candidates": 500,
    "boost": 0.8,
    "filter": {
        "term": {
          "_meta.seo_friendly": false
        }
      }
  }
}

After going through the profiling results, I have two questions:

  1. All of the timing results are in nanoseconds or microseconds, yet the whole query takes about 30 seconds, and the reported timings don't come anywhere close to adding up to that. Why am I not seeing anything even remotely close to 30 seconds in the profiler results?
  2. The profiler documentation mentions DFS statistics that can show how much time the actual kNN search takes, but I'm not seeing these statistics in the results. How can I enable them?

@ruslaniv what version of Elasticsearch? The DFS statistics weren't added until recently (8.6), and to profile KNN you will probably need that information.
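If you're not sure which version a node is running, the root endpoint reports it:

# Returns basic node info, including version.number
GET /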

Ah, yes, we had 8.4. So we upgraded the test server to 8.6 and here are the results:

"took": 32146,
  "timed_out": false,
  "_shards": {
	  "total": 1,
	  "successful": 1,
	  "skipped": 0,
	  "failed": 0
  },
  "hits": {...}
"profile": {
	"shards": [
		{
		"id": "[VyKbexZxTP24nHpcsNtS6w][proposals.proposals.vector_20221216][0]",
		"dfs": {
			"statistics": {
				"type": "statistics",
				"description": "collect term statistics",
				"time": "95.3micros",
				"time_in_nanos": 95396,
				"breakdown": {
					"term_statistics": 0,
					"collection_statistics": 0,
					"collection_statistics_count": 0,
					"create_weight": 58698,
					"term_statistics_count": 0,
					"rewrite_count": 0,
					"create_weight_count": 1,
					"rewrite": 0
				}
			},
			"knn": {
				"query": [
					{
						"type": "BoostQuery",
						"description": "(DocAndScore[500])^0.8",
						"time": "2ms",
						"time_in_nanos": 2094416,
						"breakdown": {
							"set_min_competitive_score_count": 0,
							"match_count": 0,
							"shallow_advance_count": 0,
							"set_min_competitive_score": 0,
							"next_doc": 597276,
							"match": 0,
							"next_doc_count": 500,
							"score_count": 500,
							"compute_max_score_count": 0,
							"compute_max_score": 0,
							"advance": 40897,
							"advance_count": 49,
							"score": 289088,
							"build_scorer_count": 98,
							"create_weight": 345387,
							"shallow_advance": 0,
							"create_weight_count": 1,
							"build_scorer": 821768
						}
					}
				],
				"rewrite_time": 32019008501,
				"collector": [
					{
						"name": "SimpleTopScoreDocCollector",
						"reason": "search_top_hits",
						"time": "1.3ms",
						"time_in_nanos": 1308447
					}
				]
			}
		},
		"searches": [
			{
				"query": [
					{
						"type": "KnnScoreDocQuery",
						"description": "ScoreAndDocQuery",
						"time": "843.2micros",
						"time_in_nanos": 843270,
						"breakdown": {
							"set_min_competitive_score_count": 0,
							"match_count": 0,
							"shallow_advance_count": 0,
							"set_min_competitive_score": 0,
							"next_doc": 153495,
							"match": 0,
							"next_doc_count": 40,
							"score_count": 40,
							"compute_max_score_count": 0,
							"compute_max_score": 0,
							"advance": 38398,
							"advance_count": 49,
							"score": 80297,
							"build_scorer_count": 98,
							"create_weight": 284389,
							"shallow_advance": 0,
							"create_weight_count": 1,
							"build_scorer": 286691
						}
					}
				],
				"rewrite_time": 6000,
				"collector": [
					{
						"name": "SimpleTopScoreDocCollector",
						"reason": "search_top_hits",
						"time": "591.1micros",
						"time_in_nanos": 591179
					}
				]
			}
		],
		"aggregations": [],
		"fetch": {
			"type": "fetch",
			"description": "",
			"time": "24ms",
			"time_in_nanos": 24071502,
			"breakdown": {
				"load_stored_fields": 18872699,
				"load_source": 290889,
				"load_stored_fields_count": 10,
				"next_reader_count": 9,
				"load_source_count": 10,
				"next_reader": 1781233
			},
			"debug": {
				"stored_fields": [
					"_id",
					"_routing",
					"_source"
				]
			},
			"children": [
				{
					"type": "FetchSourcePhase",
					"description": "",
					"time": "24.1micros",
					"time_in_nanos": 24100,
					"breakdown": {
						"process_count": 10,
						"process": 21500,
						"next_reader": 2600,
						"next_reader_count": 9
					},
					"debug": {
						"fast_path": 10
					}
				},
				{
					"type": "StoredFieldsPhase",
					"description": "",
					"time": "13.5micros",
					"time_in_nanos": 13599,
					"breakdown": {
						"process_count": 10,
						"process": 8099,
						"next_reader": 5500,
						"next_reader_count": 9
					}
				}
			]
		}
	}
]
}
}

So I guess this is the hotspot where ES spends most of its time? The rewrite_time of 32019008501 ns in the dfs.knn section is roughly 32 seconds, which matches the overall took of 32146 ms.

KNN spends most of its time in the rewrite, so that is indeed the hot spot. It's where the KNN search occurs.

We do segment searches serially. So, comparing your two open KNN search tickets, this is what I think is happening:

You are on a single node with a single shard. That single shard has 49 segments, each of which seems to be an OK size (at least a GB or so).

But this then means that, on a single node, you are exploring 49 different HNSW graphs, one after another.
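You can confirm the per-shard segment count with the cat segments API. A minimal sketch (the index name is taken from the profile output above):

# One row per segment; 49 rows for shard 0 would confirm the 49 graphs
GET /_cat/segments/proposals.proposals.vector_20221216?v&h=index,shard,segment,docs.count,size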

In the future, we want KNN search to work in parallel across different segments of the same shard, but right now that doesn't happen.

I think you should try force-merging the index on your test node down to fewer segments. It doesn't have to be 1; 1 would be best, but it could take a while to complete.

So, I think:

Run it asynchronously (it will take a while), and set max_num_segments to 10 or below. Again, 1 is best, but I think you would see a significant improvement even with 10.
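A minimal sketch of the call, assuming Elasticsearch 8.3+ where the force merge API accepts wait_for_completion=false and returns a task ID you can poll (the index name is taken from the profile output above):

# Kick off the merge in the background; the response contains a task ID
POST /proposals.proposals.vector_20221216/_forcemerge?max_num_segments=10&wait_for_completion=false

# Poll the task until "completed": true (replace <task_id> with the returned ID)
GET /_tasks/<task_id>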
