@Sean_Story Thanks for your response.
I too expect ELSER to have a token limit of 512 when embedding, but I see it getting cut off before 512 tokens.
For example, see my sample code below, where it gets truncated at 276 tokens. For different strings it gets truncated at different values. If you can help me understand what is going on here, I would be thankful.
Here is my sample code:
# ELSER 2 embedding token size check
chunk="""
The tokens generated by ELSER must be indexed for use in the text_expansion query. However, it is not necessary to retain those terms in the document source. You can save disk space by using the source exclude mapping to remove the ELSER terms from the document source.
Reindex uses the document source to populate the destination index. Once the ELSER terms have been excluded from the source, they cannot be recovered through reindexing. Excluding the tokens from the source is a space-saving optimsation that should only be applied if you are certain that reindexing will not be required in the future! It’s important to carefully consider this trade-off and make sure that excluding the ELSER terms from the source aligns with your specific requirements and use case. Review the Disabling the _source field and Including / Excluding fields from _source sections carefully to learn more about the possible consequences of excluding the tokens from the _source.
Get the foundation for a full vector search experience and generative AI integration. Use a single platform to create, store, and search embeddings for dense retrieval and capture your unstructured data’s meaning and context — across text, images, videos, audio, geo-location, or other data. Elasticsearch goes further than other vector databases with a full suite of search capabilities: filters and faceting, document level security, on-prem or cloud deployment, and more.Get relevant semantic search out of the box across domains with the Elastic Learned Sparse Encoder model. Implement it easily with a single click when setting up your new search application. Query expansions with related keywords and relevance scores make the model easily understood and ready for prime time on any dataset — no fine-tuning required.Incorporate your proprietary, business-specific information with LLMs so that generative AI applications don’t have to simply rely on publicly trained data. Elasticsearch is your data source for highly relevant search results that enhances the quality of LLM output via context window. Integrate with generative AI or your preferred LLM using Elasticsearch’s APIs and plugins.Deliver generative AI experiences with better context for customers and employees. Elastic provides generative AI models with relevant search results from your data using retrieval augmented generation (RAG).When users query your application, Elastic provides relevant search results pulled from the data you have stored in Elasticsearch. These secure results, which contain proprietary context from your organization, get passed to the generative AI model to create more accurate responses for end-users.Create a generative AI experience that's tailored to your own business and end-user needs. Elastic connects your datastore — whether it's a database, knowledge base, or case history — with large language models like OpenAI ChatGPT, Google Bard, and Hugging Face. Have your own transformer model? Bring it and manage it within Elastic. Using Langchain to build your app? We can integrate with your preferred open source frameworks too.Use Elasticsearch with large language models (LLMs) to create powerful, new applications for your customers and employees. Tailor generative AI experiences to your business using real-time, proprietary data. Build cost-effective and secure AI apps that are accurate and relevant using Elastic’s vector database, out of the box semantic search, and transformer model flexibility. The future is possible today with Elastic.Review queues show you posts one at a time so that you can evaluate what, if any, action is needed.
"""
print ("The length of the paragraph is %s characters" % len (chunk))
docs2 = [{"text_field": chunk}]
### Check token size limit for embedding a string
ml_model=".elser_model_2"
chunk_vector = client.ml.infer_trained_model(model_id=ml_model, docs=docs2, )
print(chunk_vector['inference_results'][0])
print("Embedding size : {}".format(len(chunk_vector['inference_results'][0]['predicted_value'])))
if chunk_vector['inference_results'][0]['is_truncated']:
print(" **** We exceeded the model token limit ******* ")
else:
print(" **** We NOT exceeded the model token limit ******* ")
Here is the output:
The length of the paragraph is 3625 characters
{'predicted_value': {'rein': 2.1909235, 'elastic': 2.0344632, 'token': 1.8880011, 'else': 1.7904589, 'expansion': 1.7321781, '##de': 1.6252898, 'rag': 1.6193763, '##r': 1.5961617, 'genera': 1.567829, 'document': 1.5452492, '##xing': 1.537078, 'll': 1.4878062, 'sparse': 1.4574037, '##x': 1.3799739, 'exclude': 1.3786075, 'source': 1.3693296, 'text': 1.3683709, '##code': 1.3578047, '##sea': 1
....
'certification': 0.026038108, '##d': 0.025774192, 'elimination': 0.025767686, 'html': 0.024397722, 'clicking': 0.024229601, 'scope': 0.021584367, 'rights': 0.018045416, 'managed': 0.017101327, 'log': 0.015965834, 'class': 0.008823808, 'knowledge': 0.006527294, '##ima': 0.0056298743, 'd': 0.004097282, '##ulation': 0.0025947972, 'future': 0.001865791}, 'is_truncated': True}
Embedding size : 276
 **** We exceeded the model token limit *******
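For what it's worth, I also tried estimating the raw input token count on my side, to compare it against the 276 entries in predicted_value. This is only a rough sketch and assumes the Hugging Face bert-base-uncased WordPiece tokenizer approximates ELSER's tokenization, which may not be exact:

# Rough input-token count (assumption: bert-base-uncased WordPiece is close to ELSER's tokenizer)
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
input_tokens = tokenizer.encode(chunk, add_special_tokens=True)
print("Approximate input token count: %s" % len(input_tokens))

# Number of weighted tokens in the sparse embedding returned by ELSER
result = chunk_vector['inference_results'][0]
print("Tokens in predicted_value: %s" % len(result['predicted_value']))
print("is_truncated: %s" % result['is_truncated'])

I may be mixing up the number of input tokens with the number of weighted tokens in the expanded output, so please correct me if that comparison is not meaningful.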