Increase Elasticsearch maximum dimensions for sparse vectors

Using sparse vectors in Elasticsearch runs into two limits related to dimensions. The first is that a vector should not have more than 1024 elements.

That first limit can be worked around, as seen in this question.

The second limit is not on the number of elements in one sparse vector, but on the dimension indexes of those elements. For example, if we have 20 dimensions, we could have these two vectors:

v1 = {"1": 0.01, "7": 0.2, "0": 0.4}
v2 = {"19": 0.02, "11": 0.7}

with only 3 and 2 elements respectively. Note that the keys range from 0 to 19 and are given as strings.
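
To make this concrete, here is a minimal sketch of how such vectors are mapped and indexed, assuming Elasticsearch 7.x (where `sparse_vector` is still available) and the official Python client; the index and field names are placeholders:

```python
from elasticsearch import Elasticsearch

# Defaults to http://localhost:9200; adjust for your cluster.
es = Elasticsearch()

# Hypothetical index with a single sparse_vector field (7.x mapping syntax).
es.indices.create(
    index="docs",
    body={"mappings": {"properties": {"embedding": {"type": "sparse_vector"}}}},
)

# Keys are dimension indexes encoded as strings, values are the weights.
v1 = {"1": 0.01, "7": 0.2, "0": 0.4}
v2 = {"19": 0.02, "11": 0.7}

es.index(index="docs", id=1, body={"embedding": v1})
es.index(index="docs", id=2, body={"embedding": v2})
```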

These dictionary keys (sparse vectors are passed to Elasticsearch as JSON dictionaries) are integers encoded as strings, and they cannot go beyond 65535 (2^16 - 1), which looked like a suspiciously familiar number.
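
Before indexing, one can at least detect vectors that will hit that ceiling. This is just a sketch; the constant and helper below are mine, not something provided by Elasticsearch:

```python
# The limit discussed in this thread: dictionary keys (dimension indexes)
# may not exceed 65535.
MAX_SPARSE_KEY = 65535

def check_sparse_vector(vec: dict) -> None:
    """Raise if any key of a sparse vector exceeds the maximum dimension."""
    too_big = [k for k in vec if int(k) > MAX_SPARSE_KEY]
    if too_big:
        raise ValueError(f"keys exceed {MAX_SPARSE_KEY}: {too_big}")

check_sparse_vector({"1": 0.01, "7": 0.2, "0": 0.4})  # passes silently
check_sparse_vector({"80000": 0.02, "11": 0.7})       # raises ValueError
```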

I tried increasing the number of file descriptors, whose limit is also 65535 and looked like too much of a coincidence, but it didn't help.

Is it possible to bypass this limitation for sparse vectors? In my case the dimension of the sparse vectors is given by a vocabulary, so reducing it would harm results (I am not so worried about query performance, though).

I am confused about what you are trying to do, because you linked articles on the number of fields and on file descriptors, and these are not related to the sparse_vector field.
Can you please describe your use case in more detail?
Is it that you have so many fields in your JSON documents that you exceed the limit on the number of fields?
Or is it that you really need to use sparse vectors (for example, to index machine learning features)?

We have made a decision to deprecate and remove the sparse_vector field type, with the possibility of reintroducing it later if we see more use cases and a need for it.

Yes, I am using sparse vectors (they are still available in my current version), but the problem is the range of dimensions the keys can point to, which in my case goes beyond 65535. Any one vector contains relatively few elements (fewer than 1000). One vector might be {"80000": 0.02, "11": 0.7}, and this is not allowed.
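
To make the use case concrete, here is a toy sketch of how such keys arise when they are indexes into a large vocabulary (the vocabulary and helper below are made up for illustration):

```python
# Toy vocabulary mapping terms to column indexes. With a vocabulary of
# ~100k terms, indexes above 65535 are unavoidable, even though any one
# document only touches a handful of them.
vocabulary = {"elasticsearch": 11, "sparse": 80000}

def to_sparse_vector(term_weights: dict) -> dict:
    """Convert {term: weight} into the {"index-as-string": weight} format."""
    return {str(vocabulary[term]): weight for term, weight in term_weights.items()}

print(to_sparse_vector({"sparse": 0.02, "elasticsearch": 0.7}))
# {'80000': 0.02, '11': 0.7}  -- exactly the kind of vector that is rejected
```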

I know file descriptors are unrelated; I only looked into them because the same numeric limit (65535) hinted that the two problems might somehow be related (it seems they are not).

