How are integer numbers treated in Elasticsearch?

jordane95 · February 21, 2022, 10:52am

Hey, I'm new to this. I have some questions about Elasticsearch.
How are integer numbers treated in this library? For example, year, age, month, etc.
Can I build the index from documents tokenized with a BERT tokenizer, where each document is represented by its token IDs (integers ranging from 0 to 3000), and convert the queries correspondingly to do retrieval?

warkolm · February 21, 2022, 11:26pm

Welcome to our community!

Elasticsearch is not a library.

I guess that depends on the mapping (see Field data types in the Elasticsearch Guide [8.0]) and then on how the fields are queried.
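As a minimal sketch, assuming a hypothetical index with fields named `year`, `age` and `month`, the mapping below declares them with numeric types so they are indexed as numbers; a `range` query then compares them numerically rather than as strings. The field names and the chosen sizes are illustrative assumptions; the bodies follow the documented mapping and `range` query formats and are built as plain Python dicts so they can be sent with any HTTP client.

```python
import json

# Hypothetical index mapping: explicit numeric types for integer-like fields.
numeric_mapping = {
    "mappings": {
        "properties": {
            "year": {"type": "short"},   # 16-bit signed; 2022 fits
            "age": {"type": "byte"},     # 8-bit signed; -128..127
            "month": {"type": "byte"},   # 1..12
        }
    }
}

# Body for a numeric range query: ages 18 to 30 inclusive.
range_query = {
    "query": {
        "range": {
            "age": {"gte": 18, "lte": 30}
        }
    }
}

# A document that the query above would match.
doc = {"year": 2022, "age": 25, "month": 2}

if __name__ == "__main__":
    # Send with PUT /<index> (mapping) and GET /<index>/_search (query);
    # <index> is a placeholder for your own index name.
    print(json.dumps(numeric_mapping, indent=2))
    print(json.dumps(range_query, indent=2))
```

Picking the smallest numeric type that fits is a choice, not a requirement; `integer` for all three fields would also work.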

I'm not familiar with this approach, and that terminology isn't native to Elasticsearch.

jordane95 · February 23, 2022, 4:12am

Thank you for your reply! Sorry for misusing the terminology; I'm new to NLP and IR.

To be more precise, let me illustrate with a toy example. Suppose I want to index the document "I really like Elasticsearch" as a string. The tokenizer maps each token in the document to its ID in the vocabulary, say "1 39 32 380188802", where "I" has ID 1.
If I query "Elasticsearch", it is similarly mapped to "380188802".
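The conversion described above happens entirely outside Elasticsearch. A minimal sketch, assuming a hypothetical toy vocabulary that reproduces the IDs in the example (a real BERT tokenizer uses its own WordPiece vocabulary and IDs), shows how both the document and the query become space-separated ID strings before indexing:

```python
# Hypothetical toy vocabulary; the IDs match the example above,
# not any real BERT vocabulary.
TOY_VOCAB = {"I": 1, "really": 39, "like": 32, "Elasticsearch": 380188802}

def to_id_string(text: str, vocab: dict[str, int]) -> str:
    """Map whitespace-separated words to IDs and join them with spaces.

    Raises KeyError for words missing from the toy vocabulary.
    """
    return " ".join(str(vocab[word]) for word in text.split())

doc_ids = to_id_string("I really like Elasticsearch", TOY_VOCAB)
query_ids = to_id_string("Elasticsearch", TOY_VOCAB)
print(doc_ids)    # 1 39 32 380188802
print(query_ids)  # 380188802
```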

So now both the query and the document are mapped to their ID representations, but they are still strings when fed into Elasticsearch. What I want to know is how a document in this format ("1 39 32 380188802") is indexed. Is it split on spaces and tokenized into ["1", "39", "32", "380188802"], with each integer string treated as a word? Or are there other heuristics for handling this kind of input?

warkolm · February 23, 2022, 6:07am

It depends on how the field is mapped.
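One way to check what a given analyzer does to such a string is the `_analyze` API, which returns the terms produced for a piece of text. The request bodies below are sketches for `POST /_analyze`; the local functions are only a rough approximation, assumed here for illustration, of the built-in `standard` and `keyword` analyzers on simple ASCII input. They are not Elasticsearch's implementation (the real standard tokenizer follows Unicode text segmentation rules).

```python
import json
import re

text = "1 39 32 380188802"

# Bodies for POST /_analyze comparing two built-in analyzers.
standard_request = {"analyzer": "standard", "text": text}
keyword_request = {"analyzer": "keyword", "text": text}

def approx_standard_terms(s: str) -> list[str]:
    """Rough stand-in for the standard analyzer on simple ASCII input:
    split on non-word characters and lowercase each piece."""
    return [t.lower() for t in re.findall(r"\w+", s)]

def keyword_terms(s: str) -> list[str]:
    """The keyword analyzer emits the whole input as a single term."""
    return [s]

print(approx_standard_terms(text))  # ['1', '39', '32', '380188802']
print(keyword_terms(text))          # ['1 39 32 380188802']
print(json.dumps(standard_request))
print(json.dumps(keyword_request))
```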

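If it's a text field, it'll be analyzed: the standard analyzer splits it into the terms 1, 39, 32 and 380188802. If it's a keyword field, the whole string is indexed as a single term. It could even be mapped as an array of integers, given it's a bunch of numbers.
It's up to you to tell Elasticsearch how to handle it through the mapping.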
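To make the options concrete, here is a sketch assuming a hypothetical index with a field named `token_ids`, mapped three different ways. The mapping, document and query bodies use documented Elasticsearch syntax; which one fits depends on whether you want scored full-text matching over individual IDs (`text` with the `whitespace` analyzer), an exact match on the whole sequence (`keyword` with a `term` query), or exact matching of individual IDs as numbers (`integer` array with a `terms` query).

```python
import json

ids = "1 39 32 380188802"

# Option 1: analyzed text. The whitespace analyzer splits on spaces only,
# so each ID becomes its own term; match queries are scored (BM25 by default).
text_mapping = {"mappings": {"properties": {
    "token_ids": {"type": "text", "analyzer": "whitespace"}}}}
text_doc = {"token_ids": ids}
text_query = {"query": {"match": {"token_ids": "380188802"}}}

# Option 2: keyword. The whole string is one term, so only an exact
# match on the full sequence finds the document.
keyword_mapping = {"mappings": {"properties": {
    "token_ids": {"type": "keyword"}}}}
keyword_doc = {"token_ids": ids}
keyword_query = {"query": {"term": {"token_ids": ids}}}

# Option 3: integer array. Any field can hold an array of values of its type;
# a terms query matches documents containing any of the listed IDs.
# 380188802 fits in a 32-bit signed integer (max 2147483647).
int_mapping = {"mappings": {"properties": {
    "token_ids": {"type": "integer"}}}}
int_doc = {"token_ids": [int(t) for t in ids.split()]}
int_query = {"query": {"terms": {"token_ids": [380188802]}}}

if __name__ == "__main__":
    for body in (text_mapping, text_doc, text_query):
        print(json.dumps(body))
```

For retrieval over token IDs, option 1 is the closest to ordinary keyword search, since each ID behaves like a word.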

system · March 23, 2022, 6:07am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Related topics

Topic	Category	Replies	Views	Activity
How does Elasticsearch indexes non-text fields	Elasticsearch	5	874	September 25, 2022
Documents that only contain integers	Elasticsearch	3	281	July 6, 2017
Any issue store integer values in text field	Elasticsearch	7	2501	March 10, 2022
[Resolved]Interger field defined in index template is treated as string, it looks like the interger mapping doesn't take effect	Elasticsearch	12	811	August 20, 2019
Some values in the field is String, some is Integer, though Mapping is Integer	Elasticsearch	3	166	October 11, 2022