Hi there! I've been playing a bit with the dense_vector field recently. I have a collection of thousands of vectors, 100 dimensions each. I created the index with the following config:
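(The actual config isn't shown in the post; for context, a minimal dense_vector mapping for 100-dimensional vectors might look like the sketch below. The index name, field name, and everything else here are placeholder assumptions, not the original setup.)

```python
# Illustrative only: a plausible dense_vector index config for
# 100-dimensional vectors. "my_vectors" and "embedding" are placeholders.
index_config = {
    "mappings": {
        "properties": {
            "embedding": {
                "type": "dense_vector",  # the field type discussed in this thread
                "dims": 100,             # matches the 100-dimensional vectors
            }
        }
    }
}

# With the Python client this would be passed to indices.create, roughly:
# es.indices.create(index="my_vectors", mappings=index_config["mappings"])
```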
I use bulk upload, with a batch size of 64 vectors at a time, but I also tried uploading the vectors one by one. The problem that I face is that somewhere around batches 4040, 8080, etc. there is a massive slowdown and the query takes more than a minute to finish, but for the rest of the calls, the standard 10s timeout of the Python client is enough.
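For reference, a minimal sketch of the batching described above. It's pure Python with no Elasticsearch connection; the action shape mirrors what `elasticsearch.helpers.bulk` expects, and the index name, field name, and ID scheme are assumptions, not taken from the original code:

```python
# Chunk vectors into batches of 64, as described in the post.
def batched_actions(vectors, index="my_vectors", batch_size=64):
    """Yield lists of bulk-index actions, batch_size vectors per list."""
    batch = []
    for i, vec in enumerate(vectors):
        batch.append({"_index": index, "_id": i, "embedding": vec})
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final partial batch
        yield batch

# Example: 10,000 vectors of 100 dims each (zeros as stand-ins).
vectors = [[0.0] * 100 for _ in range(10_000)]
batches = list(batched_actions(vectors))
print(len(batches))      # 157 batches: 156 full + 1 partial
print(len(batches[-1]))  # 16 vectors left over (10_000 - 156 * 64)
```

Each batch would then go to the client's bulk helper; the slowdown described above would show up as one of these calls taking far longer than its neighbors.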
My question is: What may be the root cause of that issue? Is that a garbage collector process, or maybe the index being rebuilt? Or maybe I'm hitting some kind of segment size limit?
The way we index vectors is that we don't build a graph on the fly; we just buffer vectors.
But once enough vectors are buffered to create a segment (or a refresh is triggered), we create a segment, and that's where the main work of building the graph starts, which may take time. So the indexing itself is very fast, but creating a segment or refreshing takes time.
So we would recommend creating segments less often. By default, if there are no searches, a shard switches to the "search_idle" state and no refreshes happen, so segments are created only when the memory buffer is full (or the limit on the translog is reached).
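One common way to create segments less often during a bulk load is to disable periodic refreshes while indexing and refresh once at the end. A minimal sketch, assuming the index name "my_vectors" (the settings values are real Elasticsearch settings; the client calls are shown as comments since they need a live cluster):

```python
# Disable periodic refreshes during ingest; "-1" turns them off entirely.
ingest_settings = {"index": {"refresh_interval": "-1"}}
# Restore a normal interval afterwards (1s is the usual default).
restore_settings = {"index": {"refresh_interval": "1s"}}

# With the Python client, roughly:
# es.indices.put_settings(index="my_vectors", settings=ingest_settings)
# ... bulk index all vectors ...
# es.indices.refresh(index="my_vectors")      # one refresh, one graph build
# es.indices.put_settings(index="my_vectors", settings=restore_settings)
```

This concentrates the expensive graph-building work into fewer, deliberate refreshes instead of having it interrupt bulk requests at unpredictable points.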
I am wondering whether this current behaviour presents an issue for you, or whether setting a large enough timeout in your client would be sufficient.
query takes more than a minute to finish, but for the rest of the calls, the standard 10s timeout of the Python client is enough
By "query" here do you mean a search query or indexing request?
Thanks for the reply. By "query" I meant the indexing request, but everything is clear now.
That behaviour is not an issue, just quite surprising while I was monitoring the latency; I only wanted to confirm that it's normal.