Hi everyone,
I had two quick questions about ELSER’s tokenization behavior:
- According to this Elasticsearch doc, both documents and queries have a 512-token limit for the ELSER model. Does this tokenization step happen every time a query is executed, even if I’m querying the same index on which I previously ran a semantic search?
- I’m also trying to understand how ELSER tokenizes natural language. Would using this Elasticsearch API give an accurate estimate of the token count? And does ELSER’s tokenization resemble LLM-style tokenization (e.g., ~4 characters ≈ 1 token), or is it fundamentally different?
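For context, here is the rough character-based heuristic I mean, as a small sketch. This is just the common "~4 characters per token" rule of thumb for LLM tokenizers, and I have no idea whether it approximates ELSER's tokenizer at all; the function name and sample text are my own:

```python
def estimate_tokens_llm_style(text: str, chars_per_token: float = 4.0) -> int:
    """Rough LLM-style token estimate using the common ~4-characters-per-token
    rule of thumb. Not based on ELSER's actual tokenizer."""
    return max(1, round(len(text) / chars_per_token))

sample = "Semantic search with ELSER uses learned sparse vectors."
print(estimate_tokens_llm_style(sample))
```

If ELSER instead uses a BERT-style WordPiece tokenizer, I'd expect the count to track words/subwords rather than raw characters, which is part of what I'm hoping someone can confirm.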
Thanks so much in advance! Would greatly appreciate any thoughts/insights.