Profiling kNN search

I'm trying to profile a slow kNN search as discussed here.

So I read the documentation here and set up this query in Postman (where {{SEARCH_TEXT}} is a Postman variable that resolves to the query vector):

{
  "profile": true,
  "knn": {
    "field": "title_vector",
    "query_vector": {{SEARCH_TEXT}},
    "k": 40,
    "num_candidates": 500,
    "boost": 0.8,
    "filter": {
        "term": {
          "_meta.seo_friendly": false
        }
      }
  }
}

After going through the profiling results, I have two questions:

  1. All of the timing results are in nanoseconds or microseconds, yet the whole query takes about 30 seconds, and the reported timings don't come anywhere close to adding up to that. Why am I not seeing anything even remotely close to 30 seconds in the profiler results?
  2. The profiler documentation mentions DFS statistics that can show how much time the actual kNN search takes, but I'm not seeing these statistics in the results. How can I enable them?

@ruslaniv what version of Elasticsearch? The DFS statistics weren't added until recently (8.6), and to profile KNN you will probably need that information.
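If you're not sure which version a node is running, the root endpoint reports it:

# Returns basic node info, including version.number
GET /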

Ah, yes, we had 8.4. So we upgraded the test server to 8.6 and here are the results:

"took": 32146,
  "timed_out": false,
  "_shards": {
	  "total": 1,
	  "successful": 1,
	  "skipped": 0,
	  "failed": 0
  },
  "hits": {...}
"profile": {
	"shards": [
		{
		"id": "[VyKbexZxTP24nHpcsNtS6w][proposals.proposals.vector_20221216][0]",
		"dfs": {
			"statistics": {
				"type": "statistics",
				"description": "collect term statistics",
				"time": "95.3micros",
				"time_in_nanos": 95396,
				"breakdown": {
					"term_statistics": 0,
					"collection_statistics": 0,
					"collection_statistics_count": 0,
					"create_weight": 58698,
					"term_statistics_count": 0,
					"rewrite_count": 0,
					"create_weight_count": 1,
					"rewrite": 0
				}
			},
			"knn": {
				"query": [
					{
						"type": "BoostQuery",
						"description": "(DocAndScore[500])^0.8",
						"time": "2ms",
						"time_in_nanos": 2094416,
						"breakdown": {
							"set_min_competitive_score_count": 0,
							"match_count": 0,
							"shallow_advance_count": 0,
							"set_min_competitive_score": 0,
							"next_doc": 597276,
							"match": 0,
							"next_doc_count": 500,
							"score_count": 500,
							"compute_max_score_count": 0,
							"compute_max_score": 0,
							"advance": 40897,
							"advance_count": 49,
							"score": 289088,
							"build_scorer_count": 98,
							"create_weight": 345387,
							"shallow_advance": 0,
							"create_weight_count": 1,
							"build_scorer": 821768
						}
					}
				],
				"rewrite_time": 32019008501,
				"collector": [
					{
						"name": "SimpleTopScoreDocCollector",
						"reason": "search_top_hits",
						"time": "1.3ms",
						"time_in_nanos": 1308447
					}
				]
			}
		},
		"searches": [
			{
				"query": [
					{
						"type": "KnnScoreDocQuery",
						"description": "ScoreAndDocQuery",
						"time": "843.2micros",
						"time_in_nanos": 843270,
						"breakdown": {
							"set_min_competitive_score_count": 0,
							"match_count": 0,
							"shallow_advance_count": 0,
							"set_min_competitive_score": 0,
							"next_doc": 153495,
							"match": 0,
							"next_doc_count": 40,
							"score_count": 40,
							"compute_max_score_count": 0,
							"compute_max_score": 0,
							"advance": 38398,
							"advance_count": 49,
							"score": 80297,
							"build_scorer_count": 98,
							"create_weight": 284389,
							"shallow_advance": 0,
							"create_weight_count": 1,
							"build_scorer": 286691
						}
					}
				],
				"rewrite_time": 6000,
				"collector": [
					{
						"name": "SimpleTopScoreDocCollector",
						"reason": "search_top_hits",
						"time": "591.1micros",
						"time_in_nanos": 591179
					}
				]
			}
		],
		"aggregations": [],
		"fetch": {
			"type": "fetch",
			"description": "",
			"time": "24ms",
			"time_in_nanos": 24071502,
			"breakdown": {
				"load_stored_fields": 18872699,
				"load_source": 290889,
				"load_stored_fields_count": 10,
				"next_reader_count": 9,
				"load_source_count": 10,
				"next_reader": 1781233
			},
			"debug": {
				"stored_fields": [
					"_id",
					"_routing",
					"_source"
				]
			},
			"children": [
				{
					"type": "FetchSourcePhase",
					"description": "",
					"time": "24.1micros",
					"time_in_nanos": 24100,
					"breakdown": {
						"process_count": 10,
						"process": 21500,
						"next_reader": 2600,
						"next_reader_count": 9
					},
					"debug": {
						"fast_path": 10
					}
				},
				{
					"type": "StoredFieldsPhase",
					"description": "",
					"time": "13.5micros",
					"time_in_nanos": 13599,
					"breakdown": {
						"process_count": 10,
						"process": 8099,
						"next_reader": 5500,
						"next_reader_count": 9
					}
				}
			]
		}
	}
]
}
}

So I guess this is the hotspot where ES spends most of its time? The rewrite_time of 32019008501 ns in the dfs.knn section is roughly 32 seconds, which matches the overall took of 32146 ms.

KNN spends most of its time in the rewrite, so that is indeed the hot spot. It's where the KNN search occurs.

We do segment searches serially. So, comparing your two open KNN search tickets, this is what I think is happening:

You are on a single node with a single shard. That single shard has 49 segments, each of which seems to be an OK size (at least a GB or so).

But this then means that, on a single node, you are exploring 49 different HNSW graphs, one after another.
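You can confirm the per-shard segment count with the cat segments API. A minimal sketch (the index name is taken from the profile output above):

# One row per segment; 49 rows for shard 0 would confirm the 49 graphs
GET /_cat/segments/proposals.proposals.vector_20221216?v&h=index,shard,segment,docs.count,size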

In the future, we want KNN search to work in parallel across different segments of the same shard, but right now that doesn't happen.

I think you should try force-merging the index on your test node down to fewer segments. It doesn't have to be 1; 1 would be best, but it could take a while to complete.

So, I think:

Run it asynchronously (it will take a while), and set max_num_segments to 10 or below. Again, 1 is best, but I think you would see a significant improvement even with 10.
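A minimal sketch of the call, assuming Elasticsearch 8.3+ where the force merge API accepts wait_for_completion=false and returns a task ID you can poll (the index name is taken from the profile output above):

# Kick off the merge in the background; the response contains a task ID
POST /proposals.proposals.vector_20221216/_forcemerge?max_num_segments=10&wait_for_completion=false

# Poll the task until "completed": true (replace <task_id> with the returned ID)
GET /_tasks/<task_id>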
