How are integer numbers treated in Elasticsearch?

jordane95 · February 23, 2022, 4:12am

Thank you for your reply! Sorry for misusing the terminology; I'm new to NLP and IR.

To be more precise, here is a toy example. Suppose I want to index the document "I really like Elasticsearch" as a string. The tokenizer maps each token in the document to its ID in the vocabulary, giving, say, "1 39 32 380188802", where "I" has ID 1.
If I query "Elasticsearch", it is similarly mapped to "380188802".
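The mapping described above can be sketched in Python. This is a minimal illustration, not any particular tokenizer's API: the vocabulary `VOCAB` (with the IDs from the example) and the helper `encode` are hypothetical, and real NLP tokenizers usually split into subwords rather than whole words.

```python
# Hypothetical toy vocabulary: word -> integer ID (IDs taken from the example above).
VOCAB = {"I": 1, "really": 39, "like": 32, "Elasticsearch": 380188802}


def encode(text: str, vocab: dict[str, int]) -> str:
    """Map each whitespace-separated word to its vocabulary ID and join the IDs
    back into a space-separated string, which is what gets sent to Elasticsearch."""
    return " ".join(str(vocab[word]) for word in text.split())


doc = encode("I really like Elasticsearch", VOCAB)
query = encode("Elasticsearch", VOCAB)
print(doc)    # 1 39 32 380188802
print(query)  # 380188802
```

Note that the output is still a plain string, so from Elasticsearch's point of view it is ordinary text that happens to contain digits.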

So both the query and the document are now represented by their IDs, but they are still strings when they are fed into Elasticsearch. What I want to know is: how is a document in this format ("1 39 32 380188802") indexed? Is it split on spaces and tokenized into ["1", "39", "32", "380188802"], with each integer string treated as a word? Or are there additional heuristics for this type of input?
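The first hypothesis in the question (split on spaces, one term per ID) can be modeled with a tiny inverted index. This is a simplified sketch, assuming the IDs are stored in a `text` field whose analyzer splits on whitespace. As far as I know, Elasticsearch's default `standard` analyzer also emits each space-separated run of digits as its own term, which can be checked against a real cluster with the `_analyze` API. The names `DOCS`, `analyze`, `build_index`, and `search` are hypothetical and are not Elasticsearch APIs; relevance scoring is ignored.

```python
from collections import defaultdict

# Hypothetical ID-encoded documents: doc 0 is the toy sentence, doc 1 is an invented extra.
DOCS = {0: "1 39 32 380188802", 1: "1 32 77"}


def analyze(text: str) -> list[str]:
    """Simplified analyzer: split on whitespace; each ID stays a string term."""
    return text.split()


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Inverted index mapping each term to the set of document IDs containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index


def search(index: dict[str, set[int]], query: str) -> set[int]:
    """Return documents containing ANY query term (OR semantics, no scoring)."""
    hits: set[int] = set()
    for term in analyze(query):
        hits |= index.get(term, set())
    return hits


index = build_index(DOCS)
print(analyze(DOCS[0]))            # ['1', '39', '32', '380188802']
print(search(index, "380188802"))  # {0}
print(search(index, "039"))        # set(): terms are compared as strings, not numbers
```

In this model the IDs behave exactly like words: a match requires the identical string, so "039" does not match "39", and no numeric comparison or range logic applies to them.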

Related topics

Topic	Category	Replies	Views	Activity
How does Elasticsearch indexes non-text fields	Elasticsearch	5	732	September 25, 2022
Why integer 1 converted to 1.0 in a string field	Elasticsearch	4	372	June 26, 2018
[Resolved]Interger field defined in index template is treated as string, it looks like the interger mapping doesn't take effect	Elasticsearch	12	767	August 20, 2019
Changing the datatype of Tokens using analyzers	Elasticsearch	1	359	June 20, 2018
Any issue store integer values in text field	Elasticsearch	7	2321	March 10, 2022